To construct a machine learning model based on insulin-like growth factor-1 (IGF-1) and growth differentiation factor-8 (GDF-8, myostatin) for predicting sarcopenia in patients with lung cancer, with the aim of improving early detection and diagnosis, supporting personalized nutrition and treatment recommendations, and improving patients’ health status and prognosis.
A total of 263 patients with primary lung cancer hospitalized at Karamay Central Hospital between October 2023 and July 2024 were enrolled. Data on gender, age, BMI, IGF-1, GDF-8, interleukin-6 (IL-6), tumor necrosis factor-alpha (TNF-α), and other variables were collected. Patients were divided into a sarcopenia group and a non-sarcopenia group according to the criteria of the Asian Working Group for Sarcopenia (AWGS). Univariate analysis and LASSO regression were used to identify risk factors for sarcopenia in patients with primary lung cancer. The selected risk factors were used to build K-nearest neighbors (KNN), Gaussian naive Bayes (GNB), and binary logistic regression models in R. Internal validation was performed using the bootstrap method.
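The modeling and internal-validation steps described above can be illustrated with a minimal, standard-library Python sketch. It is not the study's R code: the data are synthetic, the two markers are hypothetical stand-ins for a "high in sarcopenia" and a "low in sarcopenia" predictor, and the out-of-bag bootstrap variant shown here is an assumption, since the source does not state which bootstrap procedure was used.

```python
import math
import random
from statistics import mean, pvariance


def fit_gnb(X, y):
    """Estimate per-class prior, feature means, and feature variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in cols],
            # A small floor keeps every variance strictly positive.
            "var": [pvariance(c) + 1e-9 for c in cols],
        }
    return model


def predict_proba(model, x, positive=1):
    """Posterior probability of the positive class for one sample."""
    log_post = {}
    for label, p in model.items():
        ll = math.log(p["prior"])
        for v, m, s2 in zip(x, p["mean"], p["var"]):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[label] = ll
    top = max(log_post.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_post.items()}
    return weights.get(positive, 0.0) / sum(weights.values())


def bootstrap_oob_accuracy(X, y, n_rounds=200, seed=0):
    """Fit on each bootstrap resample, score on the samples left out of it."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        if not oob:
            continue
        model = fit_gnb([X[i] for i in idx], [y[i] for i in idx])
        hits = sum((predict_proba(model, X[i]) >= 0.5) == (y[i] == 1) for i in oob)
        scores.append(hits / len(oob))
    return mean(scores)


def make_toy_data(n=200, seed=1):
    """Synthetic two-marker data (hypothetical values, not study data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = sarcopenia, 0 = non-sarcopenia
        # Class 1: higher first marker, lower second marker.
        X.append([rng.gauss(6.0 if label else 4.0, 1.0),
                  rng.gauss(80.0 if label else 120.0, 15.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
model = fit_gnb(X, y)
print("P(sarcopenia) for first sample:", round(predict_proba(model, X[0]), 3))
print("Out-of-bag accuracy:", round(bootstrap_oob_accuracy(X, y, n_rounds=50), 3))
```

GNB treats each predictor as conditionally independent and normally distributed within each class, so in practice skewed biomarkers may need transformation before a model of this kind is fitted.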
Of the 263 patients, 137 were in the sarcopenia group and 126 in the non-sarcopenia group. The sarcopenia group had a significantly higher proportion of alcohol consumption and significantly higher levels of IL-4, IL-6, IL-17, TNF-α, GDF-8, blood urea nitrogen, and low-density lipoprotein, whereas BMI, smoking, PSQI score, IGF-1, and platelet count were significantly lower than in the non-sarcopenia group (all P<0.05). ROC analysis gave a C-index of 0.936 for the KNN model and 0.935 for the GNB model, both significantly better than the C-index of 0.926 for the binary logistic regression model. The Hosmer-Lemeshow goodness-of-fit test showed that the mean prediction errors between predicted and observed values were 0.0269, 0.0188, and 0.0287 for the KNN, GNB, and binary logistic regression models, respectively, with the GNB model significantly outperforming the other two. The predictions of the GNB model were highly consistent with the observed outcomes. Decision curve analysis (DCA) showed that the GNB model outperformed both the KNN model and the binary logistic regression model in predicting sarcopenia risk in patients with lung cancer.
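For readers unfamiliar with the metrics reported above, the following standard-library Python sketch shows how a C-index for a binary outcome and the net benefit used in decision curve analysis are commonly computed. The labels and predicted probabilities are made-up illustrative values, not study data, and the function names are hypothetical.

```python
def c_index(y_true, y_prob):
    """Share of (case, non-case) pairs in which the case gets the higher
    predicted probability; ties count as half. For a binary outcome this
    equals the area under the ROC curve."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    concordant = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                concordant += 1.0
            elif pp == pn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Illustrative labels (1 = sarcopenia) and predicted risks.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print("C-index:", round(c_index(y_true, y_prob), 3))
for pt in (0.2, 0.5):
    treat_all = net_benefit(y_true, [1.0] * len(y_true), pt)
    print(f"threshold {pt}: model {net_benefit(y_true, y_prob, pt):.3f}, "
          f"treat-all {treat_all:.3f}")
```

A decision curve plots the model's net benefit across a range of thresholds against the "treat all" and "treat none" (net benefit 0) strategies; a model is considered clinically useful at thresholds where its curve lies above both.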
High levels of IL-6, TNF-α, and GDF-8, low levels of IGF-1 and albumin, poor sleep quality, and low BMI are independent risk factors for sarcopenia in patients with lung cancer. The GNB prediction model constructed in this study significantly outperforms both the KNN model and the binary logistic regression model and provides precise, individualized predictions of sarcopenia risk in these patients. The model can support personalized nutrition and treatment recommendations in clinical practice, improving patients’ health status and prognosis.
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |