Applied Energy, Vol.205, 116-129, 2017
Comparative analysis of data-driven methods online and offline trained to the forecasting of grid-connected photovoltaic plant production
Current technological improvements are contributing to the diffusion of DC and AC microgrids that combine renewable photovoltaic generation with storage systems. Photovoltaic technologies have the advantage of capillary distribution, but their output is intrinsically variable because of continuously changing weather conditions. This drawback can be overcome by appropriately matching photovoltaic generation and storage capacity in both time and energy, thereby increasing microgrid reliability and efficiency. An accurate forecast of photovoltaic production can help smooth the intermittency of photovoltaic systems and so support the balance between generation and storage. Many forecasting algorithms have been proposed in the literature to provide long-, medium- and short-term predictions of photovoltaic production. No a priori criterion exists for selecting among them, and no unique criterion can be identified. This paper focuses on eleven data-driven models used to obtain 12 h ahead forecasts, including some models that are not commonly used in the solar field but have been successfully employed in other fields. The study addresses simple linear models, such as Multiple Linear Regression; nonlinear models, such as Classification And Regression Tree, Model Tree M5, Extreme Learning Machines, weighted k-Nearest Neighbors, Multivariate Adaptive Regression Spline, Support Vector Machines and Bayesian Regularized Neural Networks; and ensemble methods, such as Random Forests, Cubist and Extreme Gradient Boosting. The goal is to compare methods of different complexity levels to understand whether a more complex model provides better performance. Furthermore, the forecasting methods are compared under two training modes (online and offline) to identify the better-performing one. Optimization algorithms are applied to identify the optimal parameters and the optimal training dataset length for each model.
After the optimization step, a statistical analysis is carried out to compare the forecasting performance of the methods for the production of a 1 kWp photovoltaic plant installed at the ENEA Research Center of Portici. The case study results show promising forecasting performance with the online training mode. Among all the studied methods, Support Vector Machines, M5 and Cubist achieve the lowest prediction errors and satisfactory accuracy with optimal dataset lengths. Cubist and M5 are the best-performing models, since they minimize prediction errors with both optimal and minimal training datasets.
Keywords: Photovoltaic generation; Machine learning; Performance evaluation; Prediction error; Training dataset optimization
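The difference between the two training modes named in the abstract can be illustrated with a minimal sketch. It assumes that "offline" means fitting a model once on an initial block of samples, and that "online" means refitting on a sliding window of the most recent samples before each forecast; the abstract does not spell out either procedure, so both readings are assumptions. The model is a single-predictor linear regression standing in for Multiple Linear Regression. The data are synthetic, with a deliberate change in plant behavior halfway through, and are not taken from the ENEA plant. All names (`fit_line`, `offline_forecast`, `online_forecast`, `train_len`) are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return a, b

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def offline_forecast(xs, ys, train_len):
    """Assumed offline mode: fit once on the first train_len samples."""
    a, b = fit_line(xs[:train_len], ys[:train_len])
    return [a + b * x for x in xs[train_len:]]

def online_forecast(xs, ys, train_len):
    """Assumed online mode: refit on the latest train_len samples each step."""
    preds = []
    for t in range(train_len, len(xs)):
        a, b = fit_line(xs[t - train_len:t], ys[t - train_len:t])
        preds.append(a + b * xs[t])
    return preds

# Synthetic data (illustrative only): irradiance-like predictor in W/m^2 and
# plant output in W. The conversion gain drops from 0.8 to 0.6 at sample 30.
xs = [200.0 + 50.0 * ((i * 7) % 11) for i in range(60)]
ys = [(0.8 if i < 30 else 0.6) * x for i, x in enumerate(xs)]

train_len = 20  # training dataset length, a tunable quantity in the study
actual = ys[train_len:]
offline_pred = offline_forecast(xs, ys, train_len)
online_pred = online_forecast(xs, ys, train_len)

off_rmse = rmse(actual, offline_pred)
on_rmse = rmse(actual, online_pred)
print(f"offline RMSE: {off_rmse:.1f} W, online RMSE: {on_rmse:.1f} W")
```

On this synthetic series the sliding-window refit adapts after the change and the one-time fit does not, which is why the online error comes out lower. That outcome follows from how the data were constructed and says nothing about the relative ranking of methods reported in the paper. The sketch also predicts from a concurrent predictor value rather than 12 h ahead, a simplification made to keep it short.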