The metaverse enables immersive virtual healthcare environments, presenting opportunities for enhanced care delivery. A key challenge lies in effectively combining multimodal healthcare data with generative artificial intelligence capabilities in metaverse-based healthcare applications. This paper proposes MMLMH, a novel multimodal learning framework for metaverse healthcare based on collaborative intra- and intersample representation and adaptive fusion. The framework introduces a collaborative representation learning approach that captures shared and modality-specific features across text, audio, and visual health data. By combining modality-specific and shared encoders with carefully formulated intrasample and intersample collaboration mechanisms, MMLMH achieves superior feature representation for complex health assessments. Its adaptive fusion approach, which uses attention mechanisms and gated neural networks, performs robustly across varying noise levels and data quality conditions. Experiments on metaverse healthcare datasets show that MMLMH outperforms baseline methods across multiple evaluation metrics. Longitudinal studies and visualizations further illustrate MMLMH's adaptability to evolving virtual environments and its balanced performance across diagnostic accuracy, patient–system interaction efficacy, and data integration complexity. A distinctive advantage of the framework is that it maintains a similar level of performance across patient populations and virtual avatars, which could enable greater personalization of healthcare experiences in the metaverse. MMLMH's successful operation under these complex conditions suggests that it can combine and process information streams from several sources and that it is suitable for next-generation healthcare delivery through virtual reality.
- Multimodal Corpus of Sentiment Intensity (CMU-MOSI) [25]: a multimodal language dataset for sentiment analysis, containing 2,199 video segments from 93 YouTube movie review videos. We applied the reviews' sentiment scoring method to this dataset and augmented the data with a synthetic virtual environment to recreate virtual-world healthcare consultations.
- Medical Information Mart for Intensive Care III (MIMIC-III) [26]: an extensive open-access database containing deidentified health-related information on over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

- Uni2Mul [27]: a conformer-based multimodal emotion classification model adapted for health status prediction in virtual environments
- CSID [28]: a multimodal image fusion algorithm modified to integrate virtual and real medical imaging data for enhanced clinical diagnosis in metaverse settings
- DAMUN [29]: a domain-adaptive human activity recognition network based on multimodal feature fusion, adapted for patient behavior analysis in virtual healthcare environments
- HAMF [30]: a hierarchical attention-based multimodal fusion framework utilizing imaging, genetic, and clinical data, extended to incorporate metaverse-specific features for early disease detection
- MFNet [31]: a multimodal fusion network for intensive care unit patient outcome prediction, modified to handle virtual patient data streams
- MD-RCNN [32]: a multimodal data-based recurrent convolutional neural network for disease risk prediction, adapted to process synthetic data from metaverse healthcare simulations

- text: patient-reported symptoms and medical history
- audio: voice patterns indicating stress or pain levels
- visual: avatar movements and expressions suggesting physical conditions
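
To make the idea of attention-based, gated fusion over the text, audio, and visual modalities more concrete, the following is a minimal sketch in plain Python. It is not the MMLMH implementation: the function `fuse`, the `query` vector, the per-modality `gate_weights`, and all numeric values are hypothetical stand-ins for parameters that a real model would learn. The sketch only shows the general pattern of scoring each modality with attention, damping it with a sigmoid gate, and taking a weighted sum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse(features, query, gate_weights, gate_bias=0.0):
    """Fuse per-modality feature vectors into a single vector.

    features:     dict of modality name -> feature vector (equal lengths)
    query:        hypothetical attention query vector (learned in practice)
    gate_weights: dict of modality name -> hypothetical gate weight vector
    Returns (fused_vector, per_modality_weights).
    """
    names = sorted(features)
    dim = len(query)
    # Attention: scaled dot-product score per modality, normalized by softmax.
    attn = softmax([dot(features[n], query) / math.sqrt(dim) for n in names])
    # Gate: a scalar in (0, 1) per modality that can suppress a noisy stream.
    gates = [sigmoid(dot(features[n], gate_weights[n]) + gate_bias) for n in names]
    # Combine attention and gate, then renormalize so the weights sum to 1.
    raw = [a * g for a, g in zip(attn, gates)]
    total = sum(raw)
    weights = [r / total for r in raw]
    fused = [sum(w * features[n][i] for w, n in zip(weights, names))
             for i in range(dim)]
    return fused, dict(zip(names, weights))


# Toy inputs (illustrative values only, not taken from any dataset).
features = {
    "text": [0.9, 0.1, 0.3],
    "audio": [0.2, 0.8, 0.5],
    "visual": [0.4, 0.4, 0.6],
}
query = [1.0, 0.5, 0.2]
gate_weights = {
    "text": [1.0, 1.0, 1.0],      # gate mostly open
    "audio": [-2.0, -2.0, -2.0],  # gate mostly closed, e.g. a noisy channel
    "visual": [0.0, 0.0, 0.0],    # gate at 0.5
}
fused, weights = fuse(features, query, gate_weights)
```

In this toy setup the audio gate is nearly closed, so the audio stream contributes least to the fused vector. This is one simple way a gate can reduce the influence of a low-quality modality.
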
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |
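
The percentage columns are each entry's species count divided by the total number of species. The short sketch below recomputes them as a consistency check. The total of 209 species is an assumption: it is not stated in the table and is inferred here from the table's own figures (for example, 11 species corresponding to 5.26%). The helper name `percent_of_total` is likewise hypothetical.

```python
def percent_of_total(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Assumption: total species count inferred from the table, not stated in it.
TOTAL_SPECIES = 209

# (species count, reported percentage) for each family and genus in the table.
reported = {
    "Amanitaceae": (11, 5.26),
    "Mycenaceae": (12, 5.74),
    "Polyporaceae": (14, 6.70),
    "Russulaceae": (23, 11.00),
    "Amanita": (10, 4.78),
    "Inocybe": (5, 2.39),
    "Laccaria": (5, 2.39),
    "Marasmius": (6, 2.87),
    "Mycena": (11, 5.26),
    "Pluteus": (5, 2.39),
    "Russula": (17, 8.13),
    "Trametes": (5, 2.39),
}

# Collect any entry whose recomputed percentage differs from the reported one.
mismatches = [
    name
    for name, (count, pct) in reported.items()
    if abs(percent_of_total(count, TOTAL_SPECIES) - pct) > 1e-9
]
```

Under the assumed total, an empty `mismatches` list indicates that the reported percentages are internally consistent.
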