Energy & Fuels, Vol.34, No.7, 8195-8205, 2020
Evaluating the Benefits of Data Fusion and PARAFAC for the Chemometric Analysis of FT-ICR MS Data Sets from Gas Oil Samples
Advanced characterization of the products of gas oil hydrotreatment is of high interest to refiners and can be achieved using ultrahigh-resolution mass spectrometry (FT-ICR MS). However, the analysis of gas oil samples by FT-ICR MS generates complex data sets with numerous variables, whose exhaustive analysis requires multivariate methods. Relevant information about the nitrogen and sulfur compounds contained in several industrial gas oils is obtained using three ionization modes: electrospray ionization (ESI) in positive and negative polarities and atmospheric pressure photoionization (APPI) in positive polarity. For data sets generated with a single ionization mode, classical multivariate methods such as Principal Component Analysis (PCA) are commonly used. When the key information is spread across several ionization modes, and thus across several data sets, a data fusion approach makes it possible to explore these data sets simultaneously and can be followed by Parallel Factor Analysis (PARAFAC). Nevertheless, many more variables are considered at once when data fusion is performed, and the sensitivity of PARAFAC and its ability to extract the most relevant variables, compared to classical multivariate methods, have not yet been assessed in the framework of FT-ICR MS. In this paper, the classical data analysis approach (PCA) is compared with data fusion combined with PARAFAC. The results show that applying PARAFAC to fused data sets is highly sensitive and brings out the features and variables that are individually identified through classical data analysis, with greater ease of implementation and interpretation. As an example, the dibenzothiophene and carbazole families (DBE 9) explain most of the variance between samples and remain the most refractory compounds in hydrotreated samples.
A significant difference in alkylation between the different types of gas oils was also observed. This paper validates the power and efficiency of this approach for exploring complex data sets simultaneously without any loss of significant information.
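
The data fusion and PARAFAC step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the fused data are arranged as a three-way array (samples × variables × ionization modes), it uses synthetic random data rather than FT-ICR MS measurements, and it fits the model with a plain alternating least squares loop in NumPy. The names `parafac_als`, `khatri_rao`, `unfold`, and `cube` are hypothetical and introduced only for this example.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R) -> (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Mode-n unfolding: chosen axis first, remaining axes flattened (C order)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def parafac_als(x, rank, n_iter=500, tol=1e-10, seed=0):
    """Fit a rank-`rank` PARAFAC model to a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    norm_x = np.linalg.norm(x)
    prev_err = np.inf
    err = np.inf
    for _ in range(n_iter):
        for mode in range(3):
            # Update one factor matrix while the other two are held fixed.
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
        recon = np.einsum("ir,jr,kr->ijk", *factors)
        err = np.linalg.norm(x - recon) / norm_x
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return factors, err


# Synthetic stand-in for fused data (assumption): 6 samples, 40 variables,
# 3 ionization modes, generated from 2 underlying components plus small noise.
rng = np.random.default_rng(42)
true_a = rng.standard_normal((6, 2))   # sample scores
true_b = rng.standard_normal((40, 2))  # variable loadings
true_c = rng.standard_normal((3, 2))   # ionization-mode loadings
cube = np.einsum("ir,jr,kr->ijk", true_a, true_b, true_c)
cube += 0.01 * rng.standard_normal(cube.shape)

factors, rel_err = parafac_als(cube, rank=2)
print("relative reconstruction error:", rel_err)
print("factor shapes:", [f.shape for f in factors])
```

In this layout, the first factor matrix plays the role of sample scores, the second holds the variable loadings, and the third shows how strongly each ionization mode contributes to each component, which is what lets a single model be read across all modes at once.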