Extraction of Protein Conformational Modes from Distance Distributions Using Structurally Imputed Bayesian Data Augmentation

Sun X; Morrell TE; Yang H

Journal of Physical Chemistry B, Vol.120, No.40, 10469-10482, 2016

DOI: 10.1021/acs.jpcb.6b07767

Protein conformational changes are known to play important roles in assorted biochemical and biological processes. Driven by thermal motions of surrounding solvent molecules, such structural remodeling often occurs stochastically. Yet, regardless of how random the conformational reconfiguration may appear, it could, in principle, be described by a linear combination of a set of orthogonal modes which, in turn, are contained in the intramolecular distance distributions. The central challenge is how to obtain the distribution. This contribution proposes a Bayesian data-augmentation scheme to extract the predominant modes from only a few distance distributions, be they from computational sampling or directly from experiments such as single-molecule Förster-type resonance energy transfer (smFRET). The inference of the complete protein structure from insufficient data was recognized as isomorphic to the missing-data problem in Bayesian statistical learning. Using smFRET data as an example, the missing coordinates were deduced, given protein structural constraints and multiple (but a limited number of) smFRET distances; the Boltzmann weighting of each inferred protein structure was then evaluated using computational modeling to numerically construct the posterior density for the global protein conformation. The conformational modes were then determined from the iteratively converged overall conformational distribution using principal component analysis. Two examples were presented to illustrate these basic ideas as well as their practical implementation. The scheme described herein was based on the theory behind the powerful Tanner-Wong algorithm that guarantees convergence to the true posterior density. However, instead of assuming a mathematical model to calculate the likelihood as in conventional statistical inference, here the protein structure was treated as a statistical parameter and was imputed from the numerical likelihood function based on structural information, a probability-model-free method. The framework put forth here is anticipated to be generally applicable, offering a new way to articulate protein conformational changes in a quantifiable manner.
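
The final step named in the abstract, determining conformational modes from a converged conformational distribution by principal component analysis, can be illustrated with a minimal sketch. This is not the authors' implementation: the ensemble below is synthetic, the function names (`covariance`, `dominant_mode`, `synthetic_ensemble`) are hypothetical, and power iteration is used in place of a full eigendecomposition so that the example needs only the Python standard library.

```python
import math
import random


def covariance(samples):
    """Covariance matrix of coordinate vectors (one row per conformation)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_mode(cov, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along the mode.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


def synthetic_ensemble(true_mode, n=2000, amplitude=2.0, noise=0.1, seed=0):
    """Assumed toy data: large motion along one direction plus small isotropic noise."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n):
        a = rng.gauss(0.0, amplitude)
        ensemble.append([a * m + rng.gauss(0.0, noise) for m in true_mode])
    return ensemble


if __name__ == "__main__":
    true_mode = [0.6, 0.8, 0.0]  # unit vector chosen for illustration only
    mode, variance = dominant_mode(covariance(synthetic_ensemble(true_mode)))
    overlap = abs(sum(a * b for a, b in zip(mode, true_mode)))
    print("recovered mode:", [round(x, 3) for x in mode])
    print("variance along mode:", round(variance, 3))
    print("overlap with assumed mode:", round(overlap, 4))
```

In a real application the rows would be the imputed protein coordinates drawn from the converged posterior rather than synthetic vectors, and all leading eigenvectors (not only the first) would typically be examined.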