Steam turbine operation generates large volumes of data. Big-data analysis and simulation modeling both depend on high-quality data, so efficient data cleaning is essential. A semi-supervised data cleaning model for steam turbines is built on the strong nonlinear fitting ability of long short-term memory (LSTM) layers for time series data. The model takes three boundary conditions of the unit as input and predicts the value of the variable being cleaned. Outliers are removed according to the residual between the predicted and measured values, and the removed points are then filled with the model's predictions to keep the data complete. The model is applied to the data of a 650 MW unit in a power plant. Because sample imbalance distorts the usual indicators for evaluating cleaning models, the accuracy rate is modified, and the improved accuracy is used as the measure of cleaning performance. The results show that the deep LSTM cleaning model achieves a higher improved accuracy than three other common cleaning methods, identifies abnormal data effectively, and fills the data with predicted values so that the data remain consistent before and after cleaning.
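The residual-based cleaning step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `clean_with_predictions`, the threshold factor `k`, and the robust (median absolute deviation) threshold rule are all assumptions, since the abstract does not state how the residual threshold is chosen. The `predicted` array stands in for the output of the LSTM model, and the numbers are synthetic.

```python
import numpy as np

def clean_with_predictions(actual, predicted, k=3.0):
    """Flag points whose residual is unusually large and replace them
    with the model prediction. Hypothetical helper; the threshold rule
    (k times a MAD-based scale) is an assumption, not from the paper."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = actual - predicted

    # Robust scale estimate, so the outliers do not inflate the threshold.
    center = np.median(residual)
    mad = np.median(np.abs(residual - center))
    scale = 1.4826 * mad  # MAD rescaled to match a normal standard deviation

    is_outlier = np.abs(residual - center) > k * scale
    # Fill flagged points with the predicted value; keep the rest unchanged.
    cleaned = np.where(is_outlier, predicted, actual)
    return cleaned, is_outlier

# Synthetic stand-in data: a smooth "predicted" signal plus small deviations.
predicted = np.linspace(600.0, 650.0, 50)
actual = predicted + 0.1 * np.sin(np.arange(50))
actual[10] += 30.0   # injected abnormal points
actual[40] -= 25.0

cleaned, flags = clean_with_predictions(actual, predicted)
print(np.flatnonzero(flags))
```

A robust scale is used here only because a plain standard deviation would be pulled upward by the abnormal points themselves. Any other thresholding rule could be substituted without changing the overall flag-then-fill structure.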
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |