The low temperatures and high humidity of winter can easily cause ice to accumulate on wind turbine blades, seriously affecting both the power output and the safe operation of wind turbines. To avoid problems caused by icing, such as increased fatigue loads and vibration of unit components, wind farms must apply shutdown strategies promptly according to the icing condition of the blades. Accurate identification of blade icing status has therefore become a key issue in keeping wind turbines operating safely in winter. However, current icing diagnosis methods rely on large amounts of time-series data for modeling and prediction. In practice, equipment and operating conditions make it difficult to collect sufficient monitoring data from iced blades, which leads to widespread data imbalance and persistently limits improvements in diagnostic accuracy. To solve this problem, this paper proposes a fusion diagnostic model based on a conditional tabular generative adversarial network (CTGAN) and the light gradient boosting machine (LightGBM), aiming to achieve high-performance blade icing diagnosis from a small number of training samples.
First, a sliding window algorithm is used to construct new mixed features on top of the original features. Second, a CTGAN model learns the distribution of the real samples: a generator and a discriminator are trained adversarially toward a Nash equilibrium, after which the generator produces new samples that resemble the real ones. Third, the synthesized samples are fed into LightGBM, which extracts effective features and diagnoses icing; LightGBM is modified with a focal loss function to improve its ability to distinguish easily confused samples. Finally, attribution analysis based on SHapley Additive exPlanations (SHAP) is used to analyze the factors affecting icing.
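The focal loss modification can be sketched as a custom boosting objective. LightGBM's scikit-learn interface accepts a callable `objective(y_true, y_pred)` that returns per-sample gradients and Hessians with respect to the raw (pre-sigmoid) score; the sketch below computes those quantities for binary focal loss in plain Python. The weighting factor `alpha=0.25` and focusing parameter `gamma=2.0` are assumptions (values commonly used with focal loss), since the abstract does not state the paper's settings, and the finite-difference Hessian is a simplification, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)) = -softplus(-z)."""
    x = -z
    return -(max(x, 0.0) + math.log1p(math.exp(-abs(x))))


def focal_loss(z: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one sample: raw score z, label y in {0, 1}."""
    p = math.exp(log_sigmoid(z))  # predicted probability of icing
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * log_sigmoid(z)
    # log(1 - p) equals log_sigmoid(-z)
    return -(1.0 - alpha) * p ** gamma * log_sigmoid(-z)


def focal_grad(z: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Analytic derivative of focal_loss with respect to the raw score z."""
    p = math.exp(log_sigmoid(z))
    if y == 1:
        return alpha * (1.0 - p) ** gamma * (gamma * p * log_sigmoid(z) - (1.0 - p))
    return (1.0 - alpha) * p ** gamma * (p - gamma * (1.0 - p) * log_sigmoid(-z))


def focal_objective(y_true: Sequence[int], y_pred: Sequence[float],
                    alpha: float = 0.25, gamma: float = 2.0,
                    eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Per-sample (gradients, Hessians) in the shape a custom boosting objective returns."""
    grads, hessians = [], []
    for y, z in zip(y_true, y_pred):
        grads.append(focal_grad(z, y, alpha, gamma))
        # Hessian approximated by a central difference of the analytic gradient.
        h = (focal_grad(z + eps, y, alpha, gamma)
             - focal_grad(z - eps, y, alpha, gamma)) / (2.0 * eps)
        # Clip from below so Newton-style leaf updates stay well defined.
        hessians.append(max(h, 1e-6))
    return grads, hessians
```

With `gamma=0` and `alpha=0.5` the loss reduces to half the ordinary cross-entropy, which makes a convenient sanity check. Raising `gamma` shrinks the loss of confidently correct (easy) samples much faster than that of misclassified (hard) ones, which is how focal loss shifts attention toward confusing samples. In a real LightGBM objective the loops would normally be vectorized with NumPy.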
Experiments on data from an actual wind farm show that every algorithm diagnoses more accurately with the mixed features, and the average diagnostic accuracy of the models reaches 0.979. With sample expansion, the accuracy of each model improves to varying degrees compared with training on the scarce data alone. When the sample imbalance rate is 30%, the accuracy of the traditional logistic regression classifier improves by 11.02%, while LightGBM (Focal Loss) reaches an accuracy of 0.982, close to its accuracy when samples are sufficient. As the imbalance rate decreases and fewer real iced samples remain, the advantage of sample expansion becomes more apparent. At an imbalance rate of 10%, expansion improves the accuracy of logistic regression by 13.55% relative to the unexpanded samples. At an imbalance rate of 5%, with only 15 real iced samples, expansion improves the accuracy of logistic regression, KNN, XGBoost, and LightGBM by 35.85%, 4.52%, 9.32%, and 9.18%, respectively. These results indicate that CTGAN generates samples well and can effectively learn the distribution of real samples even when data are scarce.
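The mixed features credited above with the accuracy gains are built with a sliding window over the monitoring data. The sketch below shows one possible way to derive such features for a single monitored variable, keeping the original value alongside window statistics. The choice of statistics (rolling mean, sample standard deviation, range, and change across the window), the function name `sliding_window_features`, and the window length are all hypothetical, since the abstract does not specify them.

```python
import statistics
from typing import Dict, List, Sequence


def sliding_window_features(series: Sequence[float], window: int) -> List[Dict[str, float]]:
    """Hypothetical 'mixed' features: the original value plus trailing-window statistics.

    One row is produced for every position where a full window is available.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample standard deviation")
    rows = []
    for end in range(window, len(series) + 1):
        win = series[end - window:end]  # trailing window ending at index end - 1
        rows.append({
            "value": win[-1],                  # original feature at the current step
            "mean": statistics.fmean(win),     # local level
            "std": statistics.stdev(win),      # local variability
            "range": max(win) - min(win),      # spread within the window
            "delta": win[-1] - win[0],         # change across the window
        })
    return rows
```

For example, `sliding_window_features([1.0, 2.0, 3.0, 4.0, 5.0], 3)` yields three rows whose means are 2.0, 3.0, and 4.0; the first `window - 1` time steps produce no row because their window is incomplete.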
The simulation analysis supports the following conclusions: (1) The mixed features constructed with the sliding window algorithm significantly improve the classification ability of every model, and the LightGBM model combined with mixed features has clear advantages over the other models. (2) The CTGAN sample generation model effectively learns the distribution of real samples and, compared with other data augmentation methods, generates new samples that more closely resemble real samples. (3) Modifying LightGBM with the focal loss function improves the model's ability to distinguish easily confused samples. In addition, SHAP attribution analysis ranks the importance of each icing factor and quantifies the impact of key features on the diagnostic results, improving the credibility of the model's diagnoses.
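SHAP attributes a prediction to features through Shapley values: each feature's marginal contribution to the model output, averaged over all coalitions of the other features. For tree ensembles such as LightGBM, the `shap` library computes these efficiently with TreeSHAP (`shap.TreeExplainer`); the brute-force sketch below instead enumerates every coalition, which is feasible only for a handful of features. Replacing absent features with a single baseline vector is a simplifying assumption, and the toy model in the usage note is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[List[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n

    def value(coalition: set) -> float:
        # Coalition members keep their observed value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For the toy model `f(z) = 2*z[0] + z[1]*z[2]` at `x = [1, 2, 3]` with an all-zero baseline, the attributions are 2, 3, and 3: the interaction term is split equally between features 1 and 2, and the values sum to `f(x) - f(baseline) = 8`, as the efficiency property of Shapley values requires.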
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |