Energy & Fuels, Vol.31, No.1, 170-178, 2017
Discriminating Lacustrine and Marine Organic Matter Depositional Paleoenvironments of Brazilian Crude Oils Using Comprehensive Two-Dimensional Gas Chromatography-Quadrupole Mass Spectrometry and Supervised Classification Chemometric Approaches
Knowing the predominant depositional paleoenvironment of the organic matter from which a crude oil originated is essential for a thorough understanding of its geochemical features. Determining it is laborious and is conventionally done by searching extensively for classical biomarkers in specific saturate, aromatic, resin, and asphaltene (SARA) fractions of the crude oil. In this work, the well-established analytical technique of comprehensive two-dimensional gas chromatography-quadrupole mass spectrometry (GC×GC-QMS) was used to analyze the first two fractions (maltenes) of crude oils, and the chromatographic data were treated with chemometrics to evaluate whether the organic matter depositional paleoenvironment of each crude oil was predominantly lacustrine or marine. The extraction of the target information contained in the GC×GC-QMS data for discriminating between crude oils derived from lacustrine or marine organic matter environments was evaluated using the supervised classification methods k-nearest neighbors (k-NN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), partial least squares discriminant analysis (PLS-DA), and support vector machine discriminant analysis (SVM-DA). The methods were compared on their predictions for external samples using double cross-validation, which is a more appropriate way to assess the performance of different methods. Additionally, the main advantages and pitfalls of the linear and nonlinear classification approaches for handling the GC×GC-QMS data when classifying the crude oils are discussed in detail, considering the individual sample prediction uncertainties provided by the double cross-validation results. The most important variable for discrimination was obtained by interpreting the Y-correlated loadings from a representative orthogonal PLS-DA model built with the data set.
SVM-DA outperformed the remaining methods in correctly classifying the crude oils (performance rank: SVM-DA > PLS-DA > QDA > LDA > k-NN), because the variance in the GC×GC-QMS data that is relevant for discriminating the crude oils according to their predominant organic matter paleoenvironments could not be effectively explained by multilinear approaches alone.
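
The double cross-validation scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only k-NN, written in plain Python, and runs on synthetic three-variable data rather than GC×GC-QMS measurements. The function names (`knn_predict`, `split_folds`, `fold_accuracy`, `double_cv`), the grid of k values, and the fold counts are all assumptions made for illustration. The point it shows is that the model parameter is tuned in an inner loop using calibration samples only, so every prediction in the outer loop is made for a sample that played no part in model selection.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training samples closest to x."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=sq_dist)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def split_folds(indices, n_folds):
    """Deal the indices into n_folds interleaved folds."""
    return [indices[i::n_folds] for i in range(n_folds)]


def fold_accuracy(X, y, train_idx, test_idx, k):
    """Fraction of test_idx samples predicted correctly from train_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def double_cv(X, y, k_grid=(1, 3, 5), n_outer=5, n_inner=4, seed=0):
    """Return one external prediction per sample (hypothetical helper)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    predictions = {}
    for test_idx in split_folds(idx, n_outer):
        held_out = set(test_idx)
        calib = [i for i in idx if i not in held_out]

        # Inner loop: score each k using the calibration samples only.
        def inner_score(k):
            scores = []
            for val_idx in split_folds(calib, n_inner):
                val = set(val_idx)
                train_idx = [i for i in calib if i not in val]
                scores.append(fold_accuracy(X, y, train_idx, val_idx, k))
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)

        # Outer loop: predict the held-out samples with the selected k.
        train_X = [X[i] for i in calib]
        train_y = [y[i] for i in calib]
        for i in test_idx:
            predictions[i] = knn_predict(train_X, train_y, X[i], best_k)
    return predictions


# Synthetic stand-in data: two well-separated classes, 20 samples each.
data_rng = random.Random(42)
X, y = [], []
for label, centre in (("lacustrine", 0.0), ("marine", 4.0)):
    for _ in range(20):
        X.append([data_rng.gauss(centre, 1.0) for _ in range(3)])
        y.append(label)

predictions = double_cv(X, y)
accuracy = sum(predictions[i] == y[i] for i in range(len(X))) / len(X)
print(f"double cross-validated accuracy: {accuracy:.2f}")
```

Repeating `double_cv` with different `seed` values and counting how often each sample receives each label would give a rough per-sample indication of prediction stability, in the spirit of the individual prediction uncertainties discussed in the abstract. Comparing several classifiers would mean running the same outer and inner splits for each method.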