Journal of the American Chemical Society, Vol. 142, No. 8, 3814-3822, 2020
A Universal Machine Learning Algorithm for Large-Scale Screening of Materials
The application of machine learning (ML) methods to determine the gas adsorption capacities of nanomaterials, such as metal-organic frameworks (MOFs), has been extensively investigated over the past few years as a computationally efficient alternative to time-consuming and computationally demanding molecular simulations. Depending on the thermodynamic conditions and the adsorbed gas, ML has been found to provide very accurate results. In this work, we go one step further and introduce chemical intuition into our descriptors by using the "type" of the atoms in the structure, instead of the previously used building blocks, to account for the chemical character of the MOF. ML predictions of the methane and carbon dioxide adsorption capacities of several tens of thousands of hypothetical MOFs are evaluated at various thermodynamic conditions using the random forest algorithm. In all cases examined, the use of atom types instead of building blocks leads to significantly more accurate predictions, and the number of MOFs needed to train the ML algorithm to a specified accuracy can be reduced by an order of magnitude. More importantly, since materials can be made from a practically unlimited number of building blocks but only a limited number of atom types, the proposed approach is more general and can be considered universal. The universality and transferability were demonstrated by predicting the adsorption properties of a completely different family of materials after training the ML algorithm on MOFs.
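The contrast between atom-type and building-block descriptors can be illustrated with a minimal sketch. It is not the descriptor set used in the paper, which the abstract does not specify: the atom-type labels in `ATOM_TYPES`, the toy compositions, the building-block names, and both helper functions are hypothetical. It only shows that a fixed atom-type vocabulary yields a feature vector of the same length for any material, whereas a building-block encoding has no slot for a block that was absent from the training set.

```python
from collections import Counter

# Hypothetical atom-type vocabulary (element plus a simple hybridization tag).
# These labels are illustrative assumptions, not the types used in the paper.
ATOM_TYPES = ["C_sp2", "C_sp3", "H", "N_sp2", "O_sp2", "O_sp3", "Zn", "Cu", "B"]


def atom_type_descriptor(atom_labels, vocabulary=ATOM_TYPES):
    """Return the fraction of atoms of each type, in vocabulary order."""
    counts = Counter(atom_labels)
    unknown = set(counts) - set(vocabulary)
    if unknown:
        raise ValueError(f"atom types outside vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("structure has no atoms")
    return [counts[t] / total for t in vocabulary]


def building_block_descriptor(blocks, known_blocks):
    """Presence/absence encoding over the blocks seen during training.

    Raises KeyError for a block that was not in the training vocabulary.
    """
    index = {name: i for i, name in enumerate(known_blocks)}
    vector = [0] * len(known_blocks)
    for block in blocks:
        vector[index[block]] = 1
    return vector


# Toy compositions (made-up atom counts, for illustration only).
toy_mof = ["Zn"] * 2 + ["O_sp2"] * 4 + ["C_sp2"] * 10 + ["H"] * 4
toy_other_family = ["B"] * 2 + ["O_sp2"] * 4 + ["C_sp2"] * 10 + ["H"] * 4

mof_features = atom_type_descriptor(toy_mof)
other_features = atom_type_descriptor(toy_other_family)

# Both vectors live in the same feature space, so a model trained on one
# family can at least be evaluated on the other.
print(len(mof_features) == len(other_features))
```

In a full workflow, vectors like these would be combined with whatever other features are available and passed to a regressor such as a random forest. That step is omitted here to keep the sketch dependency-free.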