Energy & Fuels, Vol.31, No.6, 5828-5839, 2017
Multiobjective Feature Selection Approach to Quantitative Structure Property Relationship Models for Predicting the Octane Number of Compounds Found in Gasoline
Octane number is one of the most important factors for determining the price of gasoline. The increasing popularity of molecular models in petroleum refining has made predicting key properties for pure components more important. In this paper, quantitative structure property relationship (QSPR) models are developed to predict the research octane number (RON) and motor octane number (MON) of pure components using two databases. The databases include oxygenated and nitrogen-containing compounds as well as hydrocarbons collected from published data. QSPR models are widely used because they effectively characterize molecular structures, especially different isomeric structures, with a variety of descriptors. Feature subset selection is an important step for increasing the performance and reducing the complexity of a QSPR model by removing redundant and irrelevant descriptors. A two-step feature selection method is developed to identify appropriate subsets of descriptors from a multiobjective perspective: (1) a filter using the Boruta algorithm to remove noise features and (2) a multiobjective wrapper to simultaneously minimize the number of features and maximize the model accuracy. The multiobjective wrapper accounts for both the complexity and the generalizability of the models, which helps resist the overfitting that commonly occurs with single-objective feature selection methods. In the proposed procedure, the optimized subsets of descriptors are used to build the final QSPR models, which predict the RON and MON of pure components via support vector machine regression. The proposed models are competitive with other models found in the literature.
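The trade-off behind the multiobjective wrapper step can be illustrated with a minimal sketch. The abstract does not state which search algorithm or error metric the authors use, so the code below only shows the underlying notion of Pareto dominance over the two objectives (number of descriptors and prediction error). The `Candidate` type, the descriptor names `d1` to `d5`, and all error values are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A descriptor subset scored on both objectives (all values hypothetical)."""
    descriptors: frozenset
    cv_error: float  # cross-validated prediction error, to be minimized

    @property
    def size(self) -> int:
        # Number of descriptors in the subset, also to be minimized.
        return len(self.descriptors)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.size <= b.size and a.cv_error <= b.cv_error
    strictly_better = a.size < b.size or a.cv_error < b.cv_error
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates, sorted by size."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: (c.size, c.cv_error))


# Hypothetical subsets of descriptors that survived a filter step, with made-up errors.
candidates = [
    Candidate(frozenset({"d1"}), 4.0),
    Candidate(frozenset({"d1", "d2"}), 2.5),
    Candidate(frozenset({"d1", "d3"}), 3.0),              # same size, higher error
    Candidate(frozenset({"d1", "d2", "d3"}), 2.5),        # more descriptors, no gain
    Candidate(frozenset({"d1", "d2", "d4", "d5"}), 1.8),
]

front = pareto_front(candidates)
for c in front:
    print(sorted(c.descriptors), c.cv_error)
```

A single-objective selection would pick only the lowest-error subset, whereas the non-dominated set keeps every subset for which lowering the error requires adding descriptors, which leaves the final choice between simplicity and accuracy explicit.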