References
[1] ZHOU Yancong, HAO Yuanyuan. Analysis of telecom operator customer behavior based on machine learning[J]. Science Technology and Engineering, 2022, 22(2): 585-592. (in Chinese)
[2] ZHANG T, MORO S, RAMOS R F. A data-driven approach to improve customer churn prediction based on telecom customer segmentation[J]. Future Internet, 2022, 14(3): 1-17.
[3] SWETHA P, DAYANANDA R B. Improvised_XGBoost machine learning algorithm for customer churn prediction[J]. EAI Endorsed Transactions on Energy Web, 2020, 7(30): e14.
[4] SENTHAN P, RATHNAYAKA R, KUHANESWARAN B, et al. Development of churn prediction model using XGboost-telecommunication industry in Sri Lanka[C]// IEEE International IOT, Electronics and Mechatronics Conference. Toronto: IEEE, 2021: 1-7.
[5] YANG Hongyan. Research on data mining technology in early warning of telecom user churn[D]. Shenyang: Liaoning University, 2021. (in Chinese)
[6] WANG Bianxia. Bank credit card customer churn prediction based on Stacking model fusion[D]. Lanzhou: Lanzhou University, 2022. (in Chinese)
[7] LIU Mei, ZHENG Lijun, DUAN Yongliang, et al. Customer churn prediction with PCA+GWO ensemble feature selection and model stacking[J/OL]. Computer Engineering and Applications, 1-16[2024-11-05]. http://kns.cnki.net/kcms/detail/11.2127.TP.20240823.1756.009.html. (in Chinese)
[8] SHAHRIARI B, SWERSKY K, WANG Z, et al. Taking the human out of the loop: a review of Bayesian optimization[J]. Proceedings of the IEEE, 2015, 104(1): 148-175.
[9] KHOSHKROUD A, SANI H P, AAJAMI M. Stacking ensemble-based machine learning model for predicting deterioration components of steel w-section beams[J]. Buildings, 2024, 14(1): 1-21.
[10] ZHOU Zhihua. Machine learning[M]. Beijing: Tsinghua University Press, 2016. (in Chinese)
Table 1 Advantages and disadvantages of each algorithm
| Algorithm | Advantages | Disadvantages |
| GBDT | High robustness and adaptability; improves overall performance by cascading weak classifiers | Lower stability; difficult to parallelize |
| XGBoost | Supports automatic parallel computation; reduces overfitting risk through regularization terms; supports sparse data, custom loss functions, and other features | Accepts only input data in specific formats; sensitive to noise |
| CatBoost | Excellent handling of categorical features; reduces bias through an ordered boosting strategy | Poor interpretability; long training time |
| RF | Strong resistance to overfitting; simple and easy to implement | Poor interpretability; sensitive to noise; slow training |
Table 2 Confusion matrix
| Variable | Predicted churn | Predicted normal | Total |
| Actual churn | True Positive (TP) | False Negative (FN) | TP+FN |
| Actual normal | False Positive (FP) | True Negative (TN) | FP+TN |
| Total | TP+FP | FN+TN | TP+FN+FP+TN |
Table 3 Coefficients at the optimal λ
| Variable | Meaning | Coefficient |
| SeniorCitizen | Whether the customer is a senior citizen | 0.09987630 |
| Partner | Whether paperless billing is used | -0.06717364 |
| Dependents | Whether the customer has a partner | -0.02099543 |
| TotalCharges | Total charges | -1.84772347 |
| InternetService_Fiber.optic | Internet service: fiber optic line | 0.89977120 |
| OnlineSecurity_No | Online security service: none | 0.49581527 |
| OnlineBackup_No | Online backup service: none | 0.12645909 |
| DeviceProtection_No | Device protection: none | 0.03005151 |
| TechSupport_No | Technical support: none | 0.37816763 |
| StreamingTV_Yes | Streaming TV: yes | 0.12102882 |
| StreamingMovies_Yes | Streaming movies: yes | 0.13136603 |
| Contract_One.year | Contract term: 1 year | -0.72354421 |
| Contract_Two.year | Contract term: 2 years | -1.22802350 |
| PaymentMethod_Electronic.check | Payment method: electronic check | 0.33460756 |
Table 4 Main hyperparameter values of each model under different optimization algorithms
| Model | Best hyperparameters |
| XGBoost | n_estimators=100, max_depth=9, min_child_weight=3, subsample=0.8, learning_rate=0.54 |
| CatBoost | max_depth=7, learning_rate=0.16, l2_leaf_reg=3, iterations=500, subsample=0.6 |
| GBDT | n_estimators=50, max_depth=5, subsample=1, learning_rate=0.05 |
| RF | n_estimators=50, max_depth=9, min_samples_split=1, min_samples_leaf=2 |
Table 5 Performance metrics of each model
| Model | Accuracy/% | Precision/% | Recall/% | F1-score/% | AUC |
| XGBoost | 72.32 | 48.63 | 72.55 | 58.23 | 0.8141 |
| CatBoost | 72.46 | 48.73 | 68.63 | 56.99 | 0.7960 |
| GBDT | 73.46 | 50.06 | 75.04 | 60.06 | 0.8215 |
| RF | 72.46 | 48.77 | 70.94 | 57.81 | 0.8156 |
Table 6 Weights of the base learners
| Base learner | Weight |
| XGBoost | 0.3637 |
| GBDT | 0.5019 |
| RF | 0.1344 |
Table 7 Evaluation metrics of the Stacking models
| Model | Accuracy/% | Precision/% | Recall/% | F1-score/% | AUC |
| Traditional Stacking model | 73.79 | 50.48 | 74.87 | 60.30 | 0.8256 |
| BO-Stacking model | 73.36 | 49.94 | 78.43 | 61.03 | 0.8342 |
Customer Churn Prediction Based on BO-Stacking Ensemble Learning
Yu GENG
Science Technology and Industry | Enterprise Application, 2025, 25(13): 241-245
Author information
  • School of Mathematics and Physics, Anhui Jianzhu University, Hefei 230601, China
  • Yu GENG (born 2000), female, from Chuzhou, Anhui Province; master's student; research interests: statistical methods in data analysis and their applications.
Published: 2025-07-10

To improve the accuracy of customer churn prediction, an improved Stacking ensemble learning method based on Bayesian optimization (BO) is proposed. First, the base learners are selected according to the models' predictive performance and inter-model correlations. Second, because traditional Stacking methods ignore the differences among base learners, Bayesian optimization is introduced to fine-tune the weight of each base learner and reduce prediction error. Finally, the weighted predictions of the base learners are combined, and Logistic regression is used as the meta-learner for the final prediction. The results show that the proposed BO-Stacking model outperforms both the single models and the traditional Stacking method in recall, F1-score, and AUC (area under the curve), which validates the effectiveness of the proposed method and provides a reference for enterprises developing customer retention strategies.

Bayesian optimization (BO)  /  Stacking algorithm  /  ensemble learning  /  customer churn prediction
Yu GENG. Customer Churn Prediction Based on BO-Stacking Ensemble Learning[J]. Science Technology and Industry, 2025, 25(13): 241-245.
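In an era of rapidly developing network information, the world economy is moving quickly toward globalization and marketization, and competition in every industry is intensifying. The telecommunications industry is no exception: it faces the serious challenge posed by a high customer churn rate, and addressing churn has become a pressing task. Predicting churn from customers' historical transaction data is therefore crucial to the industry's sustained development.
In recent years, scholars in China and abroad have studied customer churn prediction extensively. Zhou and Hao [1] built a churn early-warning model with a BP neural network and found that prediction accuracy improved after parameter and structure tuning. Zhang et al. [2], using China's three major telecom operators as the sample, built churn prediction models with the Fisher discriminant equation and Logistic regression; the results showed that the Logistic regression model achieved high prediction accuracy. However, these studies rely mainly on a single machine learning model, whose generalization ability may suffer from the randomness and complexity of the data. Ensemble learning is a machine learning approach that combines multiple base algorithms to improve prediction accuracy and generalization. Swetha and Dayananda [3] proposed an improved XGBoost (extreme gradient boosting) model for predicting churned customers. Senthan et al. [4] also confirmed the strong performance of XGBoost on complex datasets. Yang [5] combined RF (random forest), XGBoost, and other algorithms through Voting fusion to improve prediction accuracy. Wang [6] used RF, CatBoost (categorical boosting), and other algorithms as base learners and Logistic regression as the meta-learner to build a Stacking model for predicting bank customer churn. Liu et al. [7] proposed an ensemble feature selection method based on principal component analysis (PCA) and an improved grey wolf optimization (GWO) algorithm, combined it with model stacking to build a churn prediction model, and reported a significant improvement in prediction performance.
Although existing ensemble learning models have achieved notable results in customer churn prediction, they often do not fully account for how differences among base learners affect the final prediction. This paper therefore proposes an improved Stacking ensemble prediction method that aims to raise churn prediction accuracy through an optimization strategy. Specifically, the method uses correlation analysis to select models that differ substantially from one another and predict well as base learners, and introduces Bayesian optimization to optimize the base-learner weights dynamically. This improves prediction performance and generalization, so as to better meet the telecommunications industry's need for accurate churn prediction.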
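Bayesian optimization (BO) [8] is a sequential model-based optimization algorithm. It builds a probabilistic model of the objective function and can quickly find the best hyperparameter configuration within a given parameter space. The framework has two core components: a probabilistic surrogate model, which fits the objective function, and an acquisition function, which uses the observations gathered so far to choose the next point to evaluate.
The most common probabilistic surrogate models include the Gaussian process (GP) and random forest. This paper uses a Gaussian process as the surrogate model:
$$ f(x) \sim \mathcal{GP}\left(m(x),\, k(x, x')\right) $$
where $m(x) = E[f(x)]$ is the mean function of $f(x)$ and $k(x, x')$ is the covariance function between $x$ and $x'$. A Gaussian process can be viewed as a prior distribution over functions; conditioning it on the observed data points yields an estimate of the posterior distribution over functions.
The most common acquisition functions are the probability of improvement (PI), expected improvement (EI), and upper confidence bound (UCB). This paper uses expected improvement: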
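$$ \mathrm{EI}(x) = \begin{cases} \left[\mu(x) - f(x^{+})\right]\Phi(z) + \sigma(x)\,\varphi(z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases} $$
where $f(x^{+})$ is the current maximum; $z = [\mu(x) - f(x^{+})]/\sigma(x)$; $\Phi(z)$ and $\varphi(z)$ are the standard normal cumulative distribution function and probability density function, respectively; and $\mu(x)$ and $\sigma(x)$ are the mean and standard deviation predicted by the surrogate model at point $x$.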
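As a minimal sketch, the expected improvement can be evaluated numerically as follows. It assumes the standard closed form for a maximization problem, with the normal density and distribution function built from the standard library; the function names are illustrative and do not come from the paper.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """EI at a point with surrogate mean `mu` and standard deviation `sigma`,
    given the best objective value `f_best` observed so far (maximization)."""
    if sigma <= 0.0:
        # No predictive uncertainty: no improvement is expected.
        return 0.0
    z = (mu - f_best) / sigma
    return (mu - f_best) * normal_cdf(z) + sigma * normal_pdf(z)
```

With equal uncertainty, a point whose predicted mean is higher receives a larger EI; with equal mean, a point with larger uncertainty does, which is how EI balances exploitation and exploration.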
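In summary, the basic procedure of Bayesian optimization for hyperparameter tuning is as follows.
Step 1: Within the given hyperparameter space, randomly select a set of initial hyperparameter vectors $X_{\mathrm{init}}$:
$$ X_{\mathrm{init}} = [x_0, x_1, \ldots, x_n] $$
Step 2: Substitute the hyperparameter vectors into the model, compute the corresponding objective values, and form the initial dataset $D_0$:
$$ D_0 = \{X_{\mathrm{init}}, f(X_{\mathrm{init}})\} $$
Step 3: Based on the surrogate model $g(x)$, maximize the acquisition function to determine the next hyperparameter point $x_t$ to evaluate:
$$ x_t = \arg\max_{x} \alpha(x \mid D_{t-1}) $$
where $\alpha(\cdot)$ is the acquisition function.
Step 4: Obtain the function value $f(x_t)$ at the point $x_t$ and add the pair to the existing set of evaluated points:
$$ D_t = D_{t-1} \cup \{x_t, f(x_t)\} $$
Step 5: Set a maximum number of iterations. If the current iteration count has reached this maximum, stop and select the best of all evaluated hyperparameter combinations, $\{x^{*}, f(x^{*})\}$. Otherwise, return to Step 3, where the surrogate model is refitted on $D_t$, and continue with the next iteration.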
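The five steps can be sketched as a small one-dimensional loop. This is an illustration under stated assumptions rather than the paper's implementation: the squared-exponential kernel, its length scale, the constant prior mean, the noise term, the candidate grid, and all function names are choices made here for the example. After each evaluation the loop goes back to the acquisition step.

```python
import math
import random


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for maximization; zero where sigma is zero."""
    if sigma <= 0.0:
        return 0.0
    z = (mu - f_best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - f_best) * cdf + sigma * pdf


def rbf(a, b, length=0.2):
    """Squared-exponential covariance k(x, x') (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of the GP surrogate at x_new."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_new) for x in xs]
    y_mean = sum(ys) / n  # constant prior mean m(x), an assumption
    alpha = solve(K, [y - y_mean for y in ys])
    v = solve(K, k_star)
    mu = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = 1.0 - sum(k * w for k, w in zip(k_star, v))
    return mu, math.sqrt(max(var, 0.0))


def bayes_opt(f, lo, hi, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]   # Step 1: initial points
    ys = [f(x) for x in xs]                             # Step 2: initial dataset D0
    grid = [lo + (hi - lo) * i / 50 for i in range(51)]
    for _ in range(n_iter):                             # Step 5: iteration budget
        f_best = max(ys)
        scores = []
        for x in grid:                                  # Step 3: maximize EI on a grid
            mu, sigma = gp_posterior(xs, ys, x)
            scores.append(expected_improvement(mu, sigma, f_best))
        x_t = grid[scores.index(max(scores))]
        xs.append(x_t)                                  # Step 4: evaluate and extend D_t
        ys.append(f(x_t))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs
```

For example, `bayes_opt(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)` searches the interval [0, 1] for the maximizer of a toy objective. In the paper's setting the objective would instead be a cross-validated model score over hyperparameters or base-learner weights.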
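The Stacking ensemble learning algorithm [9] is a heterogeneous ensemble method. Its core idea is to train several different types of base learners on the same dataset, combine their predictions column-wise into a new dataset, and feed that dataset to a meta-learner that produces the final prediction. In Stacking, the choice of base learners directly affects overall prediction performance, so the base learners should be both accurate and diverse [10]. On this basis, gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and random forest (RF) are chosen as candidate base learners. The strengths and weaknesses of each algorithm are listed in Table 1.
To select the best combination of base learners, the predictive ability of each model is analyzed first. The Pearson correlation coefficients between the models' predictions are then computed to assess their correlation and thereby quantify how much the models differ.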
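A minimal sketch of the pairwise Pearson correlation between model predictions follows. The prediction vectors in the usage note are made-up illustrative values, not results from the paper, and the function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def correlation_matrix(predictions):
    """Map each ordered pair of model names to the correlation of their predictions."""
    names = list(predictions)
    return {(a, b): pearson(predictions[a], predictions[b])
            for a in names for b in names}
```

For instance, `correlation_matrix({"XGBoost": [0.9, 0.2, 0.7], "RF": [0.8, 0.3, 0.6]})` returns a dictionary keyed by model pairs. Pairs whose coefficient is high make similar errors and contribute little diversity to the ensemble.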
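The traditional Stacking algorithm feeds the outputs of the first-layer base learners directly into the second-layer meta-learner, giving every base learner the same weight. This ignores the differences among base learners and may hurt overall prediction performance. To address this, an improved Stacking method, BO-Stacking, is proposed. It assigns each base learner a weight according to its performance, with better-performing base learners receiving larger weights. This increases the reliability of the data used to train the meta-learner and improves the model's classification performance.
To compare the traditional Stacking model and the BO-Stacking model fairly, both use the same structure. The main steps for building the BO-Stacking model are as follows.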
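Step 1: Split the dataset into training and test sets at a 7:3 ratio. In the first layer, train the three selected models with five-fold cross-validation to obtain the base-learner outputs.
Step 2: Combine the outputs with the base-learner weights to build the weighted dataset for the meta-learner:
$$ M_i = [w_i A_{i1}, w_i A_{i2}, \ldots, w_i A_{ij}]^{T} $$
$$ N_i = [w_i B_{i1}, w_i B_{i2}, \ldots, w_i B_{ij}]^{T} $$
where $M_i$ is the weighted prediction of the $i$-th base learner on the training set; $A_{ij}$ is the training-set output of the $i$-th base learner in the $j$-th cross-validation fold; $w_i$ is the weight of the $i$-th base learner; $N_i$ is the weighted prediction of the $i$-th base learner on the test set; and $B_{ij}$ is the test-set output of the $i$-th base learner in the $j$-th fold. After the five-fold cross-validation is complete, the weighted datasets $M$ (training) and $N$ (test) of the base learners are obtained.
Step 3: In the second layer, use a Logistic regression model as the meta-learner, train it on the weighted first-layer predictions, and obtain the final prediction.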
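The weighting in Step 2 can be sketched as follows, assuming the out-of-fold predictions of each base learner have already been collected into one list per learner. The function name, the data layout, and the numbers in the usage note are illustrative assumptions.

```python
def weighted_meta_features(base_outputs, weights):
    """Build the meta-learner input matrix.

    base_outputs[i][s] is the prediction of base learner i for sample s;
    weights[i] is w_i. Column i of the result is w_i times learner i's output,
    so each row holds the weighted predictions for one sample.
    """
    if len(base_outputs) != len(weights):
        raise ValueError("one weight per base learner is required")
    n_samples = len(base_outputs[0])
    return [[w * out[s] for w, out in zip(weights, base_outputs)]
            for s in range(n_samples)]
```

For example, with two learners predicting `[0.2, 0.8]` and `[0.4, 0.6]` and weights `[0.5, 2.0]`, the rows are approximately `[0.1, 0.8]` and `[0.4, 1.2]`. The same function applies to the training-set outputs (giving M) and to the test-set outputs (giving N); the resulting rows are what the Logistic regression meta-learner would be trained on.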
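Bayesian optimization is used to compute the optimal weights. The objective function is set to 1 − AUC, where AUC is the area under the ROC curve, with 500 iterations and ten-fold cross-validation. Through iterative optimization, the weight of each model is adjusted and updated dynamically, finally yielding the weight coefficients that minimize the objective function on the test set, that is, maximize the AUC.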
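This paper studies customer churn as a binary classification problem, for which the confusion matrix is the usual basis for evaluation; its basic form is given in Table 2. In Table 2, TP and TN are the numbers of correctly identified churned customers and normal customers, respectively; FP is the number of normal customers misidentified as churned, and FN is the number of churned customers misidentified as normal. Accuracy, precision, recall, and F1-score are used as the evaluation metrics:
$$ \mathrm{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} $$
$$ \mathrm{precision} = \frac{TP}{TP + FP} $$
$$ \mathrm{recall} = \frac{TP}{TP + FN} $$
$$ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} $$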
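Precision and recall tend to move in opposite directions, so balancing the two sensibly improves model performance. F1-score, the harmonic mean of precision and recall, accounts for both and offsets this trade-off. It ranges from 0 to 1, and larger values indicate better classification.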
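The four metrics follow directly from the confusion-matrix counts, as in this short sketch. The counts in the usage note are hypothetical and are not taken from the paper's tables; returning 0.0 when a denominator is zero is a convention chosen here.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With hypothetical counts TP = 40, FP = 10, FN = 20, TN = 130, accuracy is 0.85, precision is 0.8, recall is 40/60 ≈ 0.667, and F1-score is 80/110 ≈ 0.727. The F1-score sits between precision and recall and closer to the smaller of the two.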
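The ROC (receiver operating characteristic) curve, also called the sensitivity curve, plots the false positive rate (FPR) on the horizontal axis against the true positive rate (TPR) on the vertical axis to describe a classifier's performance at different thresholds:
$$ \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN} $$
AUC is the area under the ROC curve and usually ranges from 0.5 to 1. The larger the AUC, the better the model's predictive performance.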
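As a sketch, TPR and FPR at one threshold and the AUC can be computed as follows. The AUC here uses the pairwise-ranking form, which equals the area under the ROC curve: the share of (churned, retained) pairs in which the churned customer gets the higher score, with ties counted as one half. The labels and scores in the usage note are toy values.

```python
def tpr_fpr(y_true, scores, threshold):
    """True and false positive rates when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75; at a threshold of 0.5 the TPR is 0.5 and the FPR is 0.0. An objective of 1 − AUC, as used for the weight search, would be 0.25 for this toy case.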
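The experiments use the open Telecom customer dataset provided by IBM. It contains 7 043 samples with 3 continuous variables and 18 discrete variables, comprising 1 869 churned customers and 5 174 retained customers, and covers customers' personal information, telecom service usage, and other important factors.
Data preprocessing consists of the following three steps.
Step 1: Missing values. In this dataset the TotalCharges variable has 11 missing values. Because the corresponding customers have a tenure of 0, meaning the service has not yet started, these samples are deleted.
Step 2: Continuous variables. Continuous variables are standardized to remove the effect of differing units and scales.
Step 3: Discrete variables. Binary variables are encoded numerically, with "Yes" mapped to 1 and "No" to 0; unordered multi-category variables such as InternetService and OnlineSecurity are One-hot encoded.
Finally, the dataset, after missing-value handling, standardization, and One-hot encoding, is split into training and test sets at a 7:3 ratio, giving 4 922 training records and 2 110 test records.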
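The three preprocessing steps and the 7:3 split can be sketched with the standard library alone. Several details are assumptions made for this example: records are dictionaries of strings, a missing TotalCharges is represented by a blank string, standardization uses the population standard deviation, the split is a seeded random shuffle, and the `Partner` column in the usage note is used only for illustration.

```python
import random
import statistics


def preprocess(rows, continuous, binary, multiclass):
    """Drop rows with missing TotalCharges, standardize, and encode."""
    # Step 1: remove samples whose TotalCharges is missing (blank).
    rows = [r for r in rows if str(r["TotalCharges"]).strip() != ""]
    # Step 2: mean and standard deviation of each continuous variable.
    stats = {}
    for c in continuous:
        values = [float(r[c]) for r in rows]
        stats[c] = (statistics.mean(values), statistics.pstdev(values))
    # Step 3: category levels for One-hot encoding.
    levels = {c: sorted({r[c] for r in rows}) for c in multiclass}
    encoded = []
    for r in rows:
        rec = {}
        for c in continuous:
            mean, std = stats[c]
            rec[c] = (float(r[c]) - mean) / std if std > 0 else 0.0
        for c in binary:
            rec[c] = 1 if r[c] == "Yes" else 0
        for c in multiclass:
            for level in levels[c]:
                rec[f"{c}_{level}"] = 1 if r[c] == level else 0
        encoded.append(rec)
    return encoded


def split_7_3(records, seed=0):
    """Shuffle and split the records 7:3 into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

As a check on the counts in the text, removing 11 samples from 7 043 leaves 7 032, and `split_7_3` applied to 7 032 records yields 4 922 training records and 2 110 test records.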
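The Lasso method is used to select the variables that significantly affect customer churn status. Lasso adds a penalty term that shrinks the regression coefficients, so that coefficients contributing little to the model are gradually compressed to 0. This selects the covariates that matter for the dependent variable and yields a more parsimonious model. Given sample data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, Lasso is defined as
$$ \hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^{2} + \lambda \sum_{j=1}^{p} |\beta_j| \right\} $$
where $y_i$ is the dependent variable; $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^{T}$, with $x_{ij}$ the $j$-th covariate of the $i$-th sample; $\hat{\beta}$ is the parameter vector to be estimated; $\beta_0$ is the intercept; the parameter $\lambda$ controls the complexity of the Lasso model; and $\lambda \sum_{j=1}^{p} |\beta_j|$ is the regularization term, also called the penalty term.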
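To illustrate how the penalty drives coefficients to exactly zero, here is a minimal coordinate-descent sketch for the squared-loss objective as printed above. It is an assumption-laden illustration, not the paper's procedure: the paper's cross-validation plot reports binomial deviance, which suggests a logistic variant, and the function names and toy data are hypothetical. For this objective, each coordinate update is a soft-thresholding step with threshold λ/2.

```python
def soft_threshold(rho: float, t: float) -> float:
    """Shrink rho toward zero by t, returning 0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize sum_i (y_i - b0 - sum_j b_j x_ij)^2 + lam * sum_j |b_j|."""
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_sweeps):
        # The intercept is unpenalized: set it to the mean residual.
        beta0 = sum(y[i] - sum(beta[j] * X[i][j] for j in range(p))
                    for i in range(n)) / n
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Partial residual that leaves out feature j.
                r = y[i] - beta0 - sum(beta[k] * X[i][k]
                                       for k in range(p) if k != j)
                rho += X[i][j] * r
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return beta0, beta
```

On the toy data `X = [[-1.0], [0.0], [1.0]]`, `y = [-1.0, 1.0, 3.0]`, setting `lam = 0` recovers the least-squares fit (intercept 1, slope 2), `lam = 4` shrinks the slope to 1, and `lam = 10` sets it to exactly 0. This is the mechanism by which a larger λ removes weaker variables.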
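Figure 1 shows how the optimal $\lambda$ is determined by 10-fold cross-validation. The vertical axis is the binomial deviance and the horizontal axis is $\ln\lambda$. The left dashed line marks the $\ln\lambda$ at which the model error is smallest, and the right dashed line marks the $\ln\lambda$ at one standard error from that minimum. Following the principle of achieving high accuracy with fewer variables, the $\ln\lambda$ at the right dashed line is chosen, giving $\lambda = 0.011\,3$ and 14 selected variables; their coefficients are listed in Table 3.
Single prediction models based on ensemble learning algorithms are then fitted to the data, with Bayesian optimization (BO) used to tune their hyperparameters. The hyperparameter values of each model are listed in Table 4, and the models' performance on the test set is listed in Table 5.
As Table 5 shows, GBDT performs best, with an F1-score of 60.06% and an AUC of 0.821 5. The other models also predict well and can serve as candidate base learners.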
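The base learners are screened further using the Pearson correlation coefficients between the models' predictions, shown in Figure 2. CatBoost's correlation coefficients with both XGBoost and RF exceed 0.7. The effectiveness of Stacking depends heavily on the diversity of the base learners, and highly correlated base learners can limit the benefit of the ensemble, so removing one highly correlated model helps improve overall ensemble performance. The AUC values of XGBoost, RF, and CatBoost are 0.814 1, 0.815 6, and 0.796 0, respectively. Weighing correlation and predictive performance together, CatBoost is removed because it is highly correlated with the other models and has the lowest AUC of the three. XGBoost, GBDT, and RF are therefore selected as the base learners in the Stacking framework.
On the basis of XGBoost, GBDT, and RF, the Stacking model improved by Bayesian optimization, namely the BO-Stacking model, is constructed; the weights of the base learners are listed in Table 6.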
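To verify the proposed BO-Stacking model, it is compared with the single models and with the traditional Stacking model. Tables 5 and 7 show that both fusion models improve on the single models to varying degrees across the metrics. Compared with traditional Stacking, BO-Stacking is slightly lower in accuracy and precision but markedly higher in recall, F1-score, and AUC, the three key metrics. In churn prediction, high recall means that more of the customers at risk of churning are identified, which helps enterprises carry out effective retention measures. Overall, therefore, BO-Stacking predicts better: it improves the identification of potential churners while keeping overall prediction accurate and stable, offering a more effective approach to the churn problem.
Using the telecom customer churn dataset provided by IBM, this paper proposes an improved Stacking ensemble prediction model, BO-Stacking, to improve churn prediction performance. The model weighs the predictive performance and correlation of the individual models and selects XGBoost, GBDT, and RF as base learners. To address the weaker predictive ability of traditional Stacking caused by ignoring differences among base learners, it adopts a dynamic weight-allocation strategy based on Bayesian optimization, which adjusts the base-learner weights adaptively in search of a globally optimal configuration. The study shows that BO-Stacking has a clear advantage in prediction performance, can provide enterprises with more accurate churn predictions, and can help them devise more effective retention strategies.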
  • Funding: Key Scientific Research Project of Higher Education Institutions of Anhui Province (2022AH050247)
References
[1] 周艳聪, 郝园媛. 基于机器学习的运营商客户行为分析[J]. 科学技术与工程, 2022, 22(2): 585-592. (in Chinese)
[2] ZHANG T, MORO S, RAMOS R F. A data-driven approach to improve customer churn prediction based on telecom customer segmentation[J]. Future Internet, 2022, 14(3): 1-17.
[3] SWETHA P, DAYANANDA R B. Improvised_XGBoost machine learning algorithm for customer churn prediction[J]. EAI Endorsed Transactions on Energy Web, 2020, 7(30): e14.
[4] SENTHAN P, RATHNAYAKA R, KUHANESWARAN B, et al. Development of churn prediction model using XGBoost - telecommunication industry in Sri Lanka[C]// IEEE International IOT, Electronics and Mechatronics Conference. Toronto: IEEE, 2021: 1-7.
[5] 杨洪岩. 数据挖掘技术在通信用户流失预警中的研究[D]. 沈阳: 辽宁大学, 2021. (in Chinese)
[6] 王变霞. 基于Stacking模型融合的银行信用卡客户流失预测[D]. 兰州: 兰州大学, 2022. (in Chinese)
[7] 刘梅, 郑立君, 段永良, 等. PCA+GWO集成特征选择和模型堆叠的客户流失预测[J/OL]. 计算机工程与应用, 1-16[2024-11-05]. http://kns.cnki.net/kcms/detail/11.2127.TP.20240823.1756.009.html. (in Chinese)
[8] SHAHRIARI B, SWERSKY K, WANG Z, et al. Taking the human out of the loop: a review of Bayesian optimization[J]. Proceedings of the IEEE, 2015, 104(1): 148-175.
[9] KHOSHKROUD A, SANI H P, AAJAMI M. Stacking ensemble-based machine learning model for predicting deterioration components of steel w-section beams[J]. Buildings, 2024, 14(1): 1-21.
[10] 周志华. 机器学习[M]. 北京: 清华大学出版社, 2016. (in Chinese)
Article information
  • Received: 2024-12-28
  • First published online: 2025-12-17
  • Published: 2025-07-10