Objective To screen the risk factors for preeclampsia and to construct a predictive model of preeclampsia based on a machine learning algorithm. Methods A retrospective study was conducted using the clinical data of 1609 pregnant women hospitalized from January 2016 to December 2018, collected from the big data platform of the Academy of Medical Data Science of Chongqing Medical University. According to whether preeclampsia occurred during hospitalization, the 1609 cases were divided into a preeclampsia group (n=291) and a non-preeclampsia group (n=1318). The clinical data of 70% of the patients were randomly selected as the training set (n=1126) to construct the prediction model, the remaining 30% were used as the test set (n=483) for validation, and the consistency between the training and test sets was checked. Independent risk factors were screened by univariate analysis and logistic regression analysis, the optimal parameters of the LightGBM algorithm were searched by 5-fold cross-validation, and the prediction model was constructed with the LightGBM machine learning algorithm. Results A total of 58 indicators were collected; 13 indicators with a missing rate ≥30% were excluded, leaving 45 indicators for analysis. Thirty-five indicators differed significantly between the preeclampsia and non-preeclampsia groups (P<0.05), including gamma-glutamyl transferase (GGT), alanine aminotransferase (ALT), thrombin time, aspartate aminotransferase (AST) and urine specific gravity. Logistic regression analysis showed that urine specific gravity, uric acid, erythrocyte hemoglobin concentration, globulin, platelet distribution width, potassium ion, age at visit, family history of hypertension, systolic blood pressure, diastolic blood pressure, pulse and gestational age ≥34 weeks were independent risk factors for preeclampsia.
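The two preprocessing steps described above (excluding indicators with a missing rate of 30% or more, then a random 70/30 split into training and test sets) can be sketched in standard-library Python. This is an illustration, not the authors' code: the record layout (one dict per patient, with `None` marking a missing value), the helper names `drop_sparse_indicators` and `random_split`, and the fixed seed are all hypothetical. The abstract does not say whether the split was stratified, so this version is a plain random split.

```python
import random


def drop_sparse_indicators(records, indicators, max_missing=0.30):
    """Keep only indicators whose missing rate is below max_missing.

    Hypothetical layout: records is a list with one dict per patient; a value
    of None (or an absent key) counts as missing. Indicators whose missing
    rate is >= max_missing are excluded, mirroring the >=30% rule.
    """
    n = len(records)
    kept = []
    for name in indicators:
        missing = sum(1 for r in records if r.get(name) is None)
        if missing / n < max_missing:
            kept.append(name)
    return kept


def random_split(items, train_frac=0.7, seed=0):
    """Plain (unstratified) random train/test split; the seed is arbitrary."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    case_ids = range(1609)  # stand-ins for the 1609 hospitalized cases
    train, test = random_split(case_ids)
    print(len(train), len(test))  # 1126 483
```

With 1609 cases, a 70% share rounds to 1126 training cases and leaves 483 for testing, which matches the set sizes reported above.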
The results of 5-fold cross-validation showed that the LightGBM model performed best with num_leaves=5, max_depth=3, min_data_in_leaf=91, feature_fraction=0.8, bagging_fraction=0.6 and bagging_freq=5; with these parameters, the area under the curve (AUC), sensitivity and specificity of the LightGBM model were 0.964, 84.9% and 92.7%, respectively. Conclusion The LightGBM-based prediction model of preeclampsia shows high predictive performance; it can effectively predict the occurrence of preeclampsia in pregnant women in Chongqing and support clinical decision-making.
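The parameter search and the reported metrics can be illustrated with a standard-library Python sketch of a 5-fold cross-validated grid search scored by AUC, plus sensitivity and specificity at a decision threshold. It is not the authors' implementation. LightGBM training is replaced by a hypothetical `fit_predict` callable, the folds are stratified by outcome (an assumption; the abstract only says "5-fold"), and the 0.5 threshold is also an assumption because the abstract does not state the cutoff. `BEST_PARAMS` only records the reported best setting using LightGBM's native parameter names. With the `lightgbm` package installed, such a dict could be passed to `lightgbm.train` together with a `lightgbm.Dataset`.

```python
import random

# Best setting reported above, written with LightGBM's native parameter names.
# "objective" is an assumption; the abstract does not state it.
BEST_PARAMS = {
    "objective": "binary",
    "num_leaves": 5,
    "max_depth": 3,
    "min_data_in_leaf": 91,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.6,
    "bagging_freq": 5,
}


def stratified_k_fold(y, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold keeps roughly the class ratio.

    Stratification is an assumption made here so no fold lacks a class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin into folds
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) at a fixed threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def cv_auc(params, X, y, fit_predict, k=5):
    """Mean validation AUC over k folds.

    fit_predict(params, X_train, y_train, X_val) -> scores is a hypothetical
    callable standing in for training a LightGBM model with `params`.
    """
    aucs = []
    for tr, va in stratified_k_fold(y, k):
        scores = fit_predict(params, [X[i] for i in tr], [y[i] for i in tr],
                             [X[i] for i in va])
        aucs.append(auc([y[i] for i in va], scores))
    return sum(aucs) / len(aucs)


def grid_search(grid, X, y, fit_predict, k=5):
    """Return (best mean CV AUC, params) over a list of candidate parameter dicts."""
    best = None
    for params in grid:
        score = cv_auc(params, X, y, fit_predict, k)
        if best is None or score > best[0]:
            best = (score, params)
    return best
```

The search is selected on mean validation AUC only; sensitivity and specificity would then be computed once for the chosen model on held-out data, since they depend on the threshold rather than on the ranking alone.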
| 科 Family | 属数 Number of genera | 种数 Number of species (family) | 占总种数比例 Percentage of total species (%) | 属 Genus | 种数 Number of species (genus) | 占总种数比例 Percentage of total species (%) |
|---|---|---|---|---|---|---|
| 鹅膏菌科 Amanitaceae | 2 | 11 | 5.26 | 鹅膏菌属 Amanita | 10 | 4.78 |
| 小菇科 Mycenaceae | 2 | 12 | 5.74 | 丝盖伞属 Inocybe | 5 | 2.39 |
| 多孔菌科 Polyporaceae | 8 | 14 | 6.70 | 蜡蘑属 Laccaria | 5 | 2.39 |
| 红菇科 Russulaceae | 3 | 23 | 11.00 | 小皮伞属 Marasmius | 6 | 2.87 |
| | | | | 小菇属 Mycena | 11 | 5.26 |
| | | | | 光柄菇属 Pluteus | 5 | 2.39 |
| | | | | 红菇属 Russula | 17 | 8.13 |
| | | | | 栓菌属 Trametes | 5 | 2.39 |