Multi-Classification Prediction of Web Pages with Missing Content Features Based on Graph Embedding and Ensemble Classification Algorithm
Tao ZHANG1, 2, Bin LIAO3, *, Jiong YU2
Science Technology and Engineering | 2025, 25(20) : 8604 - 8614
Papers·Automation and Computational Technology
Affiliations
  • 1 College of Information Engineering, Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
  • 2 School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • 3 College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China
Published: 2025-07-18 doi: 10.12404/j.issn.1671-1815.2405389

Explicit content features of webpages are often unavailable because of interference such as advertisements, insufficient access permissions, privacy protection, or deceptive disguises. To classify webpages whose content features are severely deficient, a method combining graph embedding with extreme gradient boosting (XGBoost) is proposed; it uses the implicit relational features of the webpage hyperlink network for multi-class classification. First, a hyperlink network is constructed from the link relationships between webpages. Next, node features are extracted with graph embedding models and concatenated with statistical structural features, such as clustering coefficients and PageRank values, to form dense feature vectors. Finally, ensemble learning models, including XGBoost, are trained to predict webpage classes. Experiments on a real Wikipedia dataset show that the Struct2Vec*+XGBoost approach achieves accuracy, precision, recall, and F1-score of 0.9875, 0.9659, 0.9713, and 0.9641, respectively, outperforming the comparison models. The findings demonstrate that implicit link-based features are effective for webpage classification when content features are missing.
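
The feature-construction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the two-dimensional placeholder embeddings, and the function names `pagerank`, `clustering`, and `build_features` are all assumptions made for illustration. The graph embedding model and the XGBoost training stage are omitted; the sketch only shows how per-node structural statistics might be appended to an embedding vector to form one dense feature vector per page.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # page -> set of pages it links to


def pagerank(out_links: Graph, damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank on a directed hyperlink graph."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += damping * rank[v] / len(out_links[v])
        rank = new
    return rank


def clustering(out_links: Graph) -> Dict[str, float]:
    """Local clustering coefficient, ignoring link direction."""
    nbrs: Dict[str, Set[str]] = {v: set() for v in out_links}
    for v, targets in out_links.items():
        for w in targets:
            if w != v:
                nbrs[v].add(w)
                nbrs[w].add(v)
    coeff = {}
    for v, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            coeff[v] = 0.0
            continue
        # Count links among the neighbours of v (each pair once).
        links = sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
        coeff[v] = 2.0 * links / (k * (k - 1))
    return coeff


def build_features(out_links: Graph, embeddings: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Concatenate embedding, clustering coefficient, and PageRank per node."""
    pr = pagerank(out_links)
    cc = clustering(out_links)
    return {v: list(embeddings[v]) + [cc[v], pr[v]] for v in out_links}


# Toy hyperlink network (hypothetical); every link target is also a key.
toy_graph: Graph = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}, "D": {"C"}}
# Placeholder vectors standing in for the output of a graph embedding model.
toy_embeddings = {"A": [0.1, 0.2], "B": [0.0, 0.3], "C": [0.5, 0.1], "D": [0.2, 0.2]}
features = build_features(toy_graph, toy_embeddings)
```

Each row of `features` could then be passed, together with its class label, to a gradient-boosted classifier such as XGBoost; that stage is left out here to keep the sketch dependency-free.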

missing content features  /  graph embedding  /  webpage hyperlink network  /  webpage multi-classification
Tao ZHANG, Bin LIAO, Jiong YU. Multi-Classification Prediction of Web Pages with Missing Content Features Based on Graph Embedding and Ensemble Classification Algorithm[J]. Science Technology and Engineering, 2025, 25(20): 8604-8614. DOI: 10.12404/j.issn.1671-1815.2405389
Article Info
  • Received: 2024-07-17
  • Revised: 2025-04-12
  • Online Date: 2026-05-13
  • Published: 2025-07-18
https://castjournals.cast.org.cn/joweb/kxjsygc/EN/10.12404/j.issn.1671-1815.2405389