Biochemical and Biophysical Research Communications, Vol.471, No.1, 226-232, 2016
Identification of moisture content in tobacco plant leaves using outlier sample eliminating algorithms and hyperspectral data
Fast identification of moisture content in tobacco plant leaves plays a key role in the tobacco cultivation industry and benefits the management of tobacco plants on the farm. To identify the moisture content of tobacco plant leaves in a fast and nondestructive way, this study proposes a method that couples Mahalanobis distance with Monte Carlo cross validation (MD-MCCV) to eliminate outlier samples. Hyperspectral data of 200 tobacco plant leaf samples covering 20 moisture gradients were acquired with a FieldSpec® 3 spectrometer. Savitzky-Golay smoothing (SG), roughness penalty smoothing (RPS), kernel smoothing (KS) and median smoothing (MS) were used to preprocess the raw spectra. Mahalanobis distance (MD), Monte Carlo cross validation (MCCV) and MD-MCCV were then applied to identify outlier samples in the raw spectra and in the four sets of smoothed spectra. The successive projections algorithm (SPA) was used to extract the most influential wavelengths, and multiple linear regression (MLR) was applied to build prediction models from the preprocessed spectral features at these characteristic wavelengths. The four best prediction models were MD-MCCV-SG ($R_p^2$ = 0.8401, RMSEP = 0.1355), MD-MCCV-RPS ($R_p^2$ = 0.8030, RMSEP = 0.1274), MD-MCCV-KS ($R_p^2$ = 0.8117, RMSEP = 0.1433) and MD-MCCV-MS ($R_p^2$ = 0.9132, RMSEP = 0.1162). In eliminating outlier samples across the 20 moisture gradients of tobacco plant leaves, MD-MCCV performed better than MD alone, MCCV alone and no outlier elimination, so MD-MCCV can be used to eliminate outlier samples during spectral preprocessing. (C) 2016 Elsevier Inc. All rights reserved.
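The abstract does not say how MD and MCCV are coupled, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that MD is computed on PCA scores of the spectra, that MCCV scores each sample by its mean absolute prediction error from an ordinary least-squares (MLR) fit over repeated random splits, that both steps flag samples above mean + k·std, and that the two steps run in sequence (MD first, then MCCV on the retained samples). All function names, parameters and thresholds are hypothetical.

```python
import numpy as np

def md_outliers(X, n_components=3, k=3.0):
    """Flag samples with a large Mahalanobis distance in PCA score space.
    n_components and the mean + k*std cutoff are illustrative assumptions."""
    Xc = X - X.mean(axis=0)
    # PCA via SVD: scores are the projections onto the leading components.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(scores, rowvar=False)))
    # Scores are already mean-centered because Xc is centered.
    d = np.sqrt(np.einsum("ij,jk,ik->i", scores, inv_cov, scores))
    return d > d.mean() + k * d.std()

def mccv_outliers(X, y, n_runs=200, test_frac=0.3, k=3.0, seed=0):
    """Flag samples whose mean held-out prediction error over many random
    train/test splits is unusually large (least-squares model, assumed)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    err_sum = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_runs):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        err_sum[test] += np.abs(A[test] @ coef - y[test])
        counts[test] += 1
    mean_err = err_sum / np.maximum(counts, 1)  # guard against untested samples
    return mean_err > mean_err.mean() + k * mean_err.std()

def md_mccv_outliers(X, y):
    """Sequential coupling (an assumption): MD first, then MCCV on the rest."""
    mask = md_outliers(X)
    keep = np.flatnonzero(~mask)
    mask[keep] = mccv_outliers(X[keep], y[keep])
    return mask

# Synthetic demo data, not taken from the study.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = X @ np.array([0.5, -0.2, 0.3, 0.1]) + rng.normal(scale=0.05, size=60)
X[0] += 10.0   # spectral outlier, targeted by the MD step
y[1] += 5.0    # reference-value outlier, targeted by the MCCV step
flags = md_mccv_outliers(X, y)
print(np.flatnonzero(flags))
```

Running the two detectors in sequence lets the MD step remove samples with atypical spectra before they distort the regression fits that the MCCV step relies on. A union or intersection of the two flag sets would be an equally defensible coupling under the same assumptions.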