Energy Conversion and Management, Vol.150, 904-913, 2017
Diagnostic information system dynamics in the evaluation of machine learning algorithms for the supervision of energy efficiency of district heating-supplied buildings
Modern ways of exploring the diagnostic knowledge provided by data mining and machine learning raise some concern about how to evaluate the quality of the output knowledge, usually represented by information systems. Especially in district heating, the stationarity of efficiency models, and thus the relevance of the diagnostic classification system, cannot be ensured due to the impact of social, economic or technological changes, which are hard to identify or predict. Therefore, data mining and machine learning have become an attractive strategy for automatically and continuously absorbing such dynamics. This paper presents a new method for evaluating and comparing diagnostic information systems obtained algorithmically in district heating efficiency supervision, based on exploring the evolution of an information system and analyzing its dynamic features. The process of data mining and knowledge discovery was applied to the data acquired from district heating substations' energy meters to automate the discovery of the diagnostic knowledge base necessary for the efficiency supervision of district heating-supplied buildings. The implemented algorithm consists of several steps of processing the billing data, including preparation, segmentation, aggregation and a knowledge discovery stage, where classes of abstract models representing energy efficiency constitute an information system representing diagnostic knowledge about the energy efficiency of buildings favorably operating under similar climate conditions and supplied from the same district heating network. The authors analyzed the evolution of a series of information systems originating from the same knowledge discovery algorithm applied to a sequence of energy consumption-related data. Specifically, rough set theory was applied to describe the knowledge base and measure the uncertainty of machine learning predictions of the current classification based on a past knowledge base.
Fluctuations in diagnostic class membership were identified and used to differentiate between returning and novel fault detections, thus introducing the qualities of information system uncertainty and sustainability. The usability of the new method was demonstrated in a comparison of results for exemplary data mining algorithms implemented on real data from over one thousand buildings. (C) 2017 Elsevier Ltd. All rights reserved.
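The rough set idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rough_approximation`, the building identifiers and the class labels are all hypothetical. The sketch assumes that the classes of a past information system act as indiscernibility blocks, and that the uncertainty of predicting a current diagnostic class from them is summarized by the standard rough set accuracy, i.e. the size of the lower approximation divided by the size of the upper approximation.

```python
from collections import defaultdict


def rough_approximation(past_class, current_class, target):
    """Approximate the current `target` class using past classes as blocks.

    past_class / current_class map a building id to a class label
    (hypothetical data layout, not taken from the paper).
    Returns (lower, upper, accuracy).
    """
    # Group buildings present in both snapshots by their past class label.
    blocks = defaultdict(set)
    for building, label in past_class.items():
        if building in current_class:
            blocks[label].add(building)

    # Buildings that currently belong to the target class.
    target_set = {
        b for b, c in current_class.items()
        if c == target and b in past_class
    }

    lower, upper = set(), set()
    for members in blocks.values():
        if members <= target_set:   # block lies entirely inside the target
            lower |= members
        if members & target_set:    # block overlaps the target
            upper |= members

    # Accuracy of 1.0 means the past classes describe the target exactly.
    accuracy = len(lower) / len(upper) if upper else 1.0
    return lower, upper, accuracy


# Illustrative, made-up snapshots of five buildings.
past = {"b1": "A", "b2": "A", "b3": "B", "b4": "B", "b5": "C"}
current = {"b1": "ok", "b2": "ok", "b3": "ok", "b4": "fault", "b5": "fault"}

lower, upper, accuracy = rough_approximation(past, current, "fault")
print(sorted(lower), sorted(upper), round(accuracy, 3))
```

In this toy case the past class "C" predicts a fault with certainty, while past class "B" only possibly does, so the boundary region (upper minus lower) is where the past knowledge base is uncertain about the current classification.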