Industrial & Engineering Chemistry Research, Vol.59, No.25, 11582-11595, 2020
Chemoinformatic Investigation of the Chemistry of Cellulose and Lignin Derivatives in Hydrous Pyrolysis
We present a data-driven approach to identifying the reaction network of the dominant chemistry in complex mixtures, using model compounds representative of cellulose and lignin chemistry that are processed by hydrous pyrolysis. Two methods are used to identify pseudocomponents: self-modeling multivariate curve resolution, which is a nonnegative matrix factorization method, and Bayesian hierarchical clustering. The pseudocomponents are identified from spectroscopic data from two sources: Fourier transform infrared (FTIR) spectroscopy and ¹H NMR spectroscopy. The data from the two sources are combined using a simple data combination method. Once pseudocomponents have been identified, Bayesian networks are used to identify directed pathways between them, resulting in a proposed hypothesis for the reaction network or mechanism. We validate the methods by showing that the derived reaction networks are consistent with the known chemistry of cellulose, lignin, and their derivatives, and we demonstrate the importance of data fusion in developing believable reaction networks.
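As a rough illustration of the pseudocomponent step described in the abstract, the following minimal sketch factors a fused spectral matrix into nonnegative concentration profiles and pseudocomponent spectra. It is not the authors' implementation. The abstract does not specify the fusion rule or the solver, so the block scaling in `fuse`, the Lee-Seung multiplicative updates in `nmf`, and the toy data are all assumptions made for illustration, and every function name is hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]

def fuse(blocks):
    """Assumed fusion rule: scale each block to unit Frobenius norm,
    then concatenate the blocks column-wise (samples stay as rows)."""
    scaled = []
    for X in blocks:
        norm = sum(v * v for row in X for v in row) ** 0.5
        scaled.append([[v / norm for v in row] for row in X])
    return [sum((blk[i] for blk in scaled), []) for i in range(len(blocks[0]))]

def nmf(X, k, iters=500, seed=0, eps=1e-12):
    """Approximate X by C @ S with C, S >= 0 using Lee-Seung
    multiplicative updates for the Frobenius-norm objective."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    C = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    S = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update pseudocomponent spectra S with C held fixed.
        Ct = transpose(C)
        num, den = matmul(Ct, X), matmul(matmul(Ct, C), S)
        S = [[S[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update concentration profiles C with S held fixed.
        St = transpose(S)
        num, den = matmul(X, St), matmul(C, matmul(S, St))
        C = [[C[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return C, S

def rel_error(X, C, S):
    """Relative Frobenius reconstruction error of C @ S against X."""
    R = matmul(C, S)
    num = sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))
    den = sum(x * x for row in X for x in row)
    return (num / den) ** 0.5

# Toy data (made up): 5 samples mixing 2 pseudocomponents.
conc = [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
ftir_pure = [[1.0, 0.2, 0.0, 0.5], [0.0, 0.8, 1.0, 0.1]]  # 4 FTIR channels
nmr_pure = [[0.9, 0.1, 0.0], [0.1, 0.3, 1.0]]             # 3 NMR channels
ftir = matmul(conc, ftir_pure)
nmr = matmul(conc, nmr_pure)

X = fuse([ftir, nmr])   # 5 samples x 7 fused channels
C, S = nmf(X, k=2)      # C: 5 x 2 profiles, S: 2 x 7 fused spectra
err = rel_error(X, C, S)
```

In this sketch, the rows of `S` play the role of pseudocomponent spectra spanning both instruments, and the columns of `C` give their relative amounts per sample. Profiles of this kind are what a subsequent Bayesian network step would take as input. Scaling each block before concatenation is one simple way to keep either instrument from dominating the fit.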