Journal of Physical Chemistry A, Vol. 124, No. 49, 10330-10345, 2020
Machine Learned Model for Solid Form Volume Estimation Based on Packing-Accessible Surface and Molecular Topological Fragments
We present a machine learned model for predicting the volume of a homomolecular crystal based on the single-molecule structure, implemented in the open-source Python package for Molecular Volume Estimation (PyMoVE). The model is based on two descriptors: the volume enclosed by the packing-accessible surface and molecular topological fragments. To calculate the volume enclosed by the molecular surface, we have developed a new "projected marching cubes" algorithm. The new algorithm achieves higher accuracy with fewer elements than the traditional marching cubes algorithm, the marching tetrahedron variant, and Monte Carlo methods. The packing-accessible surface is then calculated using an optimized probe radius. The molecular topological fragments are used to construct a representation that captures the bonding environments of the atoms in the molecule. Feature selection is used to determine which fragments to include in the model. The accuracy and robustness of the model may be attributed to including both geometric and chemical features. The volume enclosed by the packing-accessible surface accounts for the presence of voids and sterically hindered regions as well as for the effect of conformational changes. The molecular topological fragments account for the effect of intermolecular interactions on the packing density. The model is trained on a dataset of structures extracted from the Cambridge Structural Database. Excellent performance is demonstrated for three validation sets of unseen data.
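To make the Monte Carlo baseline mentioned in the abstract concrete, the following is a minimal sketch of how the volume enclosed by a union of atom-centered spheres might be estimated by random sampling. It is not the projected marching cubes algorithm, it does not construct the packing-accessible surface, and it does not use the PyMoVE API. The function name `monte_carlo_volume`, its parameters, and the radii in the usage note are assumptions introduced only for illustration.

```python
import numpy as np


def monte_carlo_volume(centers, radii, n_samples=200_000, seed=0):
    """Estimate the volume of a union of spheres by uniform random sampling.

    Hypothetical helper for illustration only; not part of PyMoVE.
    centers: (n_atoms, 3) array of sphere centers.
    radii:   (n_atoms,) array of sphere radii, in the same length unit.
    """
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    rng = np.random.default_rng(seed)

    # Axis-aligned bounding box that contains every sphere.
    lower = (centers - radii[:, None]).min(axis=0)
    upper = (centers + radii[:, None]).max(axis=0)

    # Draw sample points uniformly inside the bounding box.
    points = rng.uniform(lower, upper, size=(n_samples, 3))

    # Squared distance from each sample point to each sphere center.
    diff = points[:, None, :] - centers[None, :, :]
    dist_sq = (diff ** 2).sum(axis=2)

    # A point lies inside the union if it falls within at least one sphere.
    inside = (dist_sq <= radii[None, :] ** 2).any(axis=1)

    # Volume estimate: box volume times the fraction of points inside.
    box_volume = float(np.prod(upper - lower))
    return box_volume * inside.mean()
```

A call such as `monte_carlo_volume([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]], [1.70, 1.20])` would estimate the volume of two overlapping spheres, with the coordinates and radii (in angstroms) chosen purely as illustrative inputs. The statistical error of such an estimate shrinks only as the inverse square root of the number of samples, which is one reason a surface-based method can reach a given accuracy with fewer elements.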