Chemical Physics Letters, Vol. 542, pp. 134-137, 2012
The average numbers of outliers over groups of various splits into training and test sets: A criterion of the reliability of a QSPR? A case of water solubility
The validation of quantitative structure-property/activity relationships (QSPR/QSAR) is an important challenge in modern theoretical chemistry. Analysing QSPRs obtained from different splits of the available data into training and test subsets can be a useful way to estimate the reliability of QSPR predictions. The balance of correlation is an approach to building a QSPR that uses three subsets of the available data: (a) a sub-training set (developer), (b) a calibration set (critic), and (c) a test set (estimator). Computational experiments have shown that there is a probabilistic interdependence between how the available data are distributed into the sub-training, calibration, and test sets and the average number of outliers in the test set. © 2012 Elsevier B.V. All rights reserved.
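The abstract does not specify the descriptors, the model, or how an outlier is defined, so the following is only a toy sketch of the general idea: repeat many random three-way splits and average the number of test-set outliers. Everything here is an assumption for illustration. A one-variable least-squares line stands in for the QSPR. The model is fitted on the sub-training set (developer), the calibration set (critic) supplies the error scale, and a test compound (estimator) counts as an outlier when its absolute residual exceeds `k` times the calibration RMSE. The function names `fit_line`, `split_three`, `count_test_outliers`, and `average_outliers` are hypothetical, and the sketch does not reproduce the balance-of-correlation optimization itself.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; a toy stand-in for a QSPR model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def split_three(n, fractions, rng):
    """Shuffle indices 0..n-1 and cut them into sub-training, calibration, test.

    fractions = (f_sub, f_cal); the test set receives the remainder.
    """
    idx = list(range(n))
    rng.shuffle(idx)
    n_sub = round(fractions[0] * n)
    n_cal = round(fractions[1] * n)
    return idx[:n_sub], idx[n_sub:n_sub + n_cal], idx[n_sub + n_cal:]


def count_test_outliers(xs, ys, split, k=2.0):
    """Count test points with |residual| > k * RMSE on the calibration set.

    Assumed scheme (not taken from the paper): fit on sub-training,
    take the error scale from calibration, count outliers in test.
    """
    sub, cal, test = split
    a, b = fit_line([xs[i] for i in sub], [ys[i] for i in sub])

    def resid(i):
        return ys[i] - (a + b * xs[i])

    rmse_cal = (sum(resid(i) ** 2 for i in cal) / len(cal)) ** 0.5
    return sum(1 for i in test if abs(resid(i)) > k * rmse_cal)


def average_outliers(xs, ys, fractions, n_splits=100, seed=0):
    """Average the number of test-set outliers over n_splits random splits."""
    rng = random.Random(seed)
    counts = [
        count_test_outliers(xs, ys, split_three(len(xs), fractions, rng))
        for _ in range(n_splits)
    ]
    return statistics.fmean(counts)


if __name__ == "__main__":
    # Synthetic data: a noisy linear "property" of a single descriptor.
    data_rng = random.Random(42)
    xs = [data_rng.uniform(0, 10) for _ in range(60)]
    ys = [1.5 * x - 2.0 + data_rng.gauss(0, 0.5) for x in xs]
    # Compare two ways of distributing the data among the three sets.
    for fractions in [(0.5, 0.25), (0.34, 0.33)]:
        print(fractions, average_outliers(xs, ys, fractions))
```

Comparing the averages printed for different `fractions` is the kind of split-versus-outlier relationship the abstract refers to. Fixing `seed` makes each average reproducible, so different distributions can be compared on equal footing.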