Exposure to environmental cadmium increases health risks for residents. Early detection of urinary metabolic changes using high-resolution mass spectrometry and machine learning algorithms could help predict adverse health effects. Here, we applied machine learning approaches to screen for potential biomarkers of cadmium exposure in 403 urine samples. In positive and negative ionization modes, 4207 and 3558 features were extracted, respectively. We compared seven machine learning algorithms and found that the extreme gradient boosting (XGBoost) and random forest (RF) classifiers showed better accuracy and predictive performance than the others. Following 5-fold cross-validation, the area under the curve (AUC) of the XGBoost classifier was 0.93 for both positive and negative ionization modes. For the RF classifier, the AUCs were 0.80 and 0.84 for positive and negative ionization modes, respectively. We then identified a biomarker panel based on the XGBoost and RF classifiers. Incorporating machine learning models into urine analysis by high-resolution mass spectrometry could allow convenient assessment of cadmium exposure.
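The evaluation scheme described above, 5-fold cross-validation summarized by AUC, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' pipeline: the data are synthetic, AUC is taken to mean the area under the ROC curve, a simple nearest-centroid scorer stands in for the XGBoost and RF classifiers, and every function name is hypothetical.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def centroid_scores(X_train, y_train, X_test):
    """Stand-in classifier (assumption): a higher score means the sample is
    closer to the class-1 centroid than to the class-0 centroid."""
    def centroid(cls):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        return [mean(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    return [dist2(x, c0) - dist2(x, c1) for x in X_test]


def cross_validated_auc(X, y, k=5, seed=0):
    """Fit on k-1 folds, score the held-out fold, and average the fold AUCs."""
    fold_aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        scores = centroid_scores(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        fold_aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return mean(fold_aucs), fold_aucs


def make_synthetic(n=200, n_features=20, n_informative=3, shift=1.0, seed=1):
    """Synthetic stand-in for a feature table: Gaussian noise, with a mean
    shift on the first few features for the class-1 samples."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    mean_auc, fold_aucs = cross_validated_auc(X, y, k=5)
    print(f"mean AUC over 5 folds: {mean_auc:.2f}")
```

Scoring only held-out samples in each fold keeps the AUC estimate from being inflated by samples the model has already seen, which matters when the number of features (thousands here) far exceeds the number of samples.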
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |