This study systematically compared the performance of seven machine learning algorithms for predicting tuberculosis (TB) recurrence among newly treated patients in Kashgar, Xinjiang, to provide data support for optimizing recurrence intervention strategies in high-burden areas.
We analyzed 69 476 successfully treated new TB patients from 2016 to 2022 in Kashgar, with follow-up through 2023. Independent predictors were selected through multivariate logistic regression. Seven models (logistic regression, decision tree, random forest, multilayer perceptron, XGBoost, LightGBM, and elastic net) were developed and validated. The optimal model was interpreted using SHapley Additive exPlanations (SHAP).
Among the 69 476 cases, 9 444 (13.59%) experienced recurrence by 2023. Fourteen independent predictors were identified. The seven models had AUC values ranging from 0.705 to 0.762 in the training set; the decision tree model performed best (AUC = 0.762, 95% CI: 0.758–0.766) and was well calibrated. SHAP analysis identified sputum culture results at diagnosis, local TB burden, and treatment modality as the three most influential predictors.
The decision tree model, built on routine surveillance data, achieved the best discrimination for TB recurrence among the seven models (AUC = 0.762), and its interpretable features can support early identification of high-risk individuals in clinical practice.
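The AUC values and 95% confidence intervals reported above can be illustrated with a small sketch. The abstract does not say how the intervals were computed, so the percentile bootstrap used here is an assumption. The function names `auc` and `bootstrap_auc_ci` are hypothetical, and the labels and scores are synthetic, not study data.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (assumed method, not the authors')."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    return stats[lo_idx], stats[n_boot - 1 - lo_idx]


# Synthetic example: roughly 14% positives, higher scores for positives.
rng = random.Random(42)
labels = [1 if rng.random() < 0.14 else 0 for _ in range(500)]
scores = [rng.gauss(0.6 if y else 0.4, 0.2) for y in labels]
point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores, n_boot=200)
print(f"AUC = {point:.3f}, 95% CI: {low:.3f}-{high:.3f}")
```

With a cohort as large as the one described, a bootstrap interval would be narrow, which is consistent with the tight interval reported for the best model. An analytic interval such as DeLong's would be an equally plausible choice.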
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |