Solar Energy, Vol. 198, pp. 81-92, 2020
Random forest regression for improved mapping of solar irradiance at high latitudes
Datasets from meteorological reanalyses and satellite retrievals are the main available sources of large-scale information on solar radiation. However, both reanalyses and satellite-based estimates can be severely biased, especially in high-latitude regions. In this study, surface solar irradiance estimates from the ECMWF Reanalysis 5 (ERA5) and the Cloud, Albedo and Radiation dataset, Edition 2 (CLARA-A2) were used as inputs to a random forest regression (RFR) model to construct a new dataset with higher accuracy and precision than either input dataset. For daily averages of global horizontal irradiance (GHI) at Norwegian sites, CLARA-A2 and ERA5 produced, respectively, a root mean squared deviation (RMSD) of 17.9 W m⁻² and 27.1 W m⁻², a mean absolute deviation (MAD) of 11.9 W m⁻² and 17.5 W m⁻², and a bias of -1.5 W m⁻² and 4.3 W m⁻². In contrast, the proposed regression model yielded an RMSD of 16.2 W m⁻², an MAD of 10.8 W m⁻² and a bias of 0.0 W m⁻². The RFR model is therefore both accurate and precise: it reduces both the dispersion and the bias of the new dataset relative to its constituent sources. A sky-stratification analysis showed that the proposed model provides better estimates under all sky conditions, with particular improvements under intermediate cloudy conditions. The proposed model was also tested at five Swedish locations, where it improved surface solar irradiance estimates to a similar degree as at the Norwegian sites, indicating that it performs consistently under similar climatic conditions.
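The comparison above rests on three deviation statistics: bias (accuracy) and MAD and RMSD (dispersion, i.e. precision). The sketch below computes them from paired daily GHI values using their standard definitions. It assumes deviations are taken as estimate minus ground measurement, so a positive bias means overestimation; the abstract does not state the sign convention. The function name `deviation_metrics` and the toy values are hypothetical and are not data from the study.

```python
import math
from typing import Dict, Sequence


def deviation_metrics(estimated: Sequence[float], measured: Sequence[float]) -> Dict[str, float]:
    """Bias, MAD and RMSD of estimates against ground measurements (same units, e.g. W m^-2).

    Assumed convention: deviation = estimate - measurement.
    """
    if len(estimated) != len(measured) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [e - m for e, m in zip(estimated, measured)]
    n = len(diffs)
    bias = sum(diffs) / n                             # mean signed deviation
    mad = sum(abs(d) for d in diffs) / n              # mean absolute deviation
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)   # root mean squared deviation
    return {"bias": bias, "MAD": mad, "RMSD": rmsd}


if __name__ == "__main__":
    # Hypothetical daily-mean GHI values in W m^-2 (illustration only).
    estimated = [110.0, 95.0, 40.0, 12.0]
    measured = [100.0, 100.0, 38.0, 12.0]
    print(deviation_metrics(estimated, measured))
```

Because RMSD squares the deviations, it weights large errors more heavily than MAD does. A dataset can therefore have near-zero bias, as the RFR output does, while still showing dispersion in its MAD and RMSD.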