Canadian Journal of Chemical Engineering, Vol.86, No.5, 838-858, 2008
Treatment of missing values in process data analysis
Process data suffer from many types of imperfections, for example, bad data due to sensor problems, multi-rate sampling, outliers and compressed data. Since most modelling and data analysis methods are developed for regularly sampled, well-conditioned data sets, the data must first be pre-treated. Traditionally, data conditioning or pre-treatment has been done without taking the end use of the data into account; for example, univariate methods have been used to interpolate bad data even when the data are intended for multivariate analysis. In this paper we treat pre-treatment and data analysis as a single joint problem and propose data conditioning methods in a multivariate framework. We first review classical process data analysis methods and well-established missing data handling techniques used in statistical surveys and biostatistics. The application of these techniques is demonstrated in three different instances: (i) principal components analysis (PCA) is extended within a data augmentation (DA) framework to deal with missing values, (ii) an iterative missing data technique is used to synchronize batch process data of uneven length, and (iii) a PCA-based iterative missing data technique is used to restore the correlation structure of compressed data.
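To make the idea of a PCA-based iterative missing data technique concrete, the following is a minimal sketch of generic iterative PCA imputation (often called iterative SVD imputation), not the authors' exact algorithm, and it does not implement the data augmentation (DA) framework. It assumes missing entries are coded as NaN; the function name `iterative_pca_impute` and its parameters (`n_components`, `max_iter`, `tol`) are hypothetical. Missing entries are first filled with column means; the algorithm then alternates between fitting a rank-k PCA model (via SVD of the mean-centred data) and replacing only the missing entries with their model reconstruction, while observed values stay fixed.

```python
import numpy as np

def iterative_pca_impute(X, n_components=1, max_iter=1000, tol=1e-10):
    """Hypothetical helper: fill NaN entries of X by alternating PCA fit and reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace each missing entry with its column's observed mean
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(missing, col_means, X)
    for _ in range(max_iter):
        # Mean-centre the current completed data set
        mu = X_filled.mean(axis=0)
        Xc = X_filled - mu
        # PCA loadings from the SVD of the centred data
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:n_components].T          # loadings, shape (m, k)
        T = Xc @ P                       # scores, shape (n, k)
        X_hat = T @ P.T + mu             # rank-k PCA reconstruction
        # Update only the missing entries; observed entries are never changed
        X_new = np.where(missing, X_hat, X)
        change = np.sum((X_new - X_filled) ** 2)
        X_filled = X_new
        if change < tol:
            break
    return X_filled
```

As a usage example, for exactly rank-one data such as `np.outer(np.arange(1.0, 6.0), [1.0, 2.0, 3.0])` with the entry at row 3, column 2 (true value 12) set to NaN, a one-component model should recover a value close to 12, because the correlation structure in the observed entries fully determines the missing one. The number of retained components is a modelling choice: too few components ignore real correlation, while too many let the reconstruction follow noise.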