Journal of Graphics

Research on dynamic voxelization-based collision detection in construction scenarios

Hao LIN, Zhiming WU, Jilan JIN

Journal of Graphics. 2026, 47(1): 204-215.

Among all safety accidents in construction scenarios, collision accidents are regarded as one of the most common types. To effectively prevent and monitor collision accidents, computer graphics analysis technology has been used to assist collision detection and analysis; however, limitations remain in balancing real-time performance with high detection precision. To address this, a collision-detection method based on dynamic voxelization was proposed. This method integrated the generation of a dynamic spatial voxel tree with the dynamic spherical voxelization calculation of resources to construct a collision detection and analysis mechanism. The core ideas are as follows: ① Based on the crowding-degree threshold, the space was recursively divided to generate a dynamic voxel tree, effectively filtering out areas with no collision risk. ② The side length of voxel units was dynamically calculated according to the relative distance between resources and resource volume, realizing the adaptive adjustment of voxel granularity. ③ Spherical voxels were used instead of traditional cubic voxels to avoid the computational burden of non-axis-aligned detection. ④ A hollowing-out procedure was introduced to eliminate internal invalid voxels, further optimizing detection efficiency. This method can accurately capture resource interactions in complex dynamic construction environments, significantly improving detection accuracy and optimizing computational efficiency. Experimental results showed that compared with traditional methods, the proposed method significantly improved the detection accuracy, with precision and accuracy reaching 94.64% and 96.67%, respectively. In terms of collision detection time, it was more efficient than most existing methods, with a calculation speed increase of at least about 11.36%.
In addition, the study analyzed the impact of key parameters such as voxel-tree depth, root-node size, and voxel side length on performance, and analyzed the method's CPU and memory consumption in scenarios of different scales. The consumption was within an acceptable range, verifying the applicability of the method in construction scenarios. The method offers an effective new information-processing approach for making construction safety management more intelligent.
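
As an illustration of ideas ① and ③ only, the following is a minimal sketch and not the authors' implementation. It assumes each resource is reduced to a single point, that a cube is split into octants while it holds more than `threshold` points, and that leaves holding fewer than two resources carry no collision risk. The names `Node`, `build`, `risk_leaves`, and `spheres_collide` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple   # (x, y, z) of the cube centre
    half: float     # half of the cube's side length
    points: list    # resource positions inside this cube
    children: list = field(default_factory=list)


def build(center, half, points, threshold, max_depth):
    """Recursively split a cube while it holds more than `threshold` points."""
    node = Node(center, half, points)
    if len(points) <= threshold or max_depth == 0:
        return node
    # Assign each point to one of eight octants by comparing with the centre.
    buckets = {}
    for p in points:
        key = tuple(p[i] >= center[i] for i in range(3))
        buckets.setdefault(key, []).append(p)
    # Only non-empty octants become children, so empty space costs nothing.
    for key, pts in buckets.items():
        child_center = tuple(
            center[i] + (half / 2 if key[i] else -half / 2) for i in range(3)
        )
        node.children.append(
            build(child_center, half / 2, pts, threshold, max_depth - 1)
        )
    return node


def risk_leaves(node, min_points=2):
    """Leaves holding at least `min_points` resources; the rest are filtered out."""
    if not node.children:
        return [node] if len(node.points) >= min_points else []
    found = []
    for child in node.children:
        found.extend(risk_leaves(child, min_points))
    return found


def spheres_collide(c1, r1, c2, r2):
    """Sphere overlap needs only a distance test, with no orientation handling."""
    d2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return d2 <= (r1 + r2) ** 2
```

Because a sphere is rotation-invariant, the overlap test stays the same however a resource is oriented, which is the property idea ③ relies on. A point-based split like this one can separate two nearby resources that straddle a cell boundary, so a practical version would also need to account for resource extent.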

An image matching method for large viewpoint variation scenarios

Mengli XIANG, Zhiyong HUANG, Yali SHE, Tuojun DING

Journal of Graphics. 2026, 47(1): 90-98.

To address the significant decline in matching accuracy and the number of correspondences exhibited by existing image-matching methods under large viewpoint variations, an improved image-matching approach based on E-LoFTR was proposed. Firstly, based on a strategy of viewpoint rectification followed by fine-grained matching, a novel two-stage SIFT-based viewpoint-rectification module was proposed, which leveraged the viewpoint invariance of the Scale-Invariant Feature Transform (SIFT) algorithm and the geometric alignment capability of homography to enhance matching accuracy under large viewpoint variations. Then, a directional-gated attention mechanism was designed that employed a cascaded structure of multi-directional convolutions and dynamic gating to extract queries (Q), keys (K), and values (V). The injected geometric priors significantly enhanced the model’s robustness. Lastly, to mitigate information loss during the upsampling of fused features, the Fusion-DySample module was incorporated to further improve performance. Experimental results on the public MegaDepth dataset showed that our method achieved relative pose estimation AUCs of 57.1%, 72.7%, and 83.9% under rotation error thresholds of 5°, 10°, and 20°, respectively, outperforming E-LoFTR by 0.7%, 0.5%, and 0.4%. On the newly constructed NewMega dataset based on MegaDepth and on a private industrial dataset, our method also demonstrated substantial improvements in both the number of matches and matching accuracy.

How do robots attract children? The role of appearance, motion, and voice as multisensory features in early-stage interactions

Yi LI, Chengcai CAO, Zhangtong SONG, Zuoqi LI, Xiao LI, Hesen LI

Journal of Graphics. 2026, 47(1): 223-233.

With the rapid development of artificial intelligence technology, multimodal robots are playing an increasingly important role in preschool children’s education, entertainment, and daily life. Existing studies have primarily focused on the effects of single sensory cues of robots on children’s perception, while systematic research on multisensory integration effects remains limited. To explore how robots’ multimodal features jointly influence children’s emotional preferences and visual attention, 318 children aged 4-6 years were recruited to participate in an eye-tracking experiment. The experiment adopted a 2 (appearance features: humanoid vs. animal-like) × 3 (voice guidance: male voice, female voice, none) × 2 (gesture guidance: present vs. absent) mixed factorial design, with robot appearance features (humanoid vs. animal-like) and behavioral features (voice and gesture guidance) as independent variables, and children’s emotional preferences and eye-tracking indicators as dependent variables, thereby systematically examining the effects of multimodal features on child users. The results showed that, in terms of appearance features, no significant difference was observed in subjective preference ratings between humanoid and animal-like robots. However, humanoid robots attracted longer total fixation duration, more fixation counts, and shorter first-fixation latency, indicating superior attention-related performance compared with animal-like robots. Children were more readily attracted to humanoid robots during the initial stage of visual contact, and anthropomorphic design showed greater advantages in sustaining children’s attention. In terms of behavioral features, robots with gesture guidance received significantly higher subjective preference ratings than those without gestures, and also elicited longer total fixation duration and more fixation counts. 
Robots with female voices received slightly higher subjective preference ratings than those with male voices, and both were significantly preferred over robots without voices. Robots with male voices had slightly longer total fixation duration than those with female voices, and both significantly outperformed robots without voices. The difference in fixation counts between male- and female-voice robots was not significant, but both attracted significantly more fixations than robots without voices. Robots with gesture guidance and voice (especially female voice) performed better in subjective ratings and visual attention allocation, suggesting that behavioral features substantially enhanced children’s emotional preferences and interactive experiences. Furthermore, the effects of appearance and behavioral features on children’s emotional preferences and visual attention were relatively independent, and no significant interaction effects were observed. This study revealed the mechanisms through which robot appearance and behavioral features influenced preschool children’s emotional preferences and visual attention, thereby providing scientific evidence for designing child-oriented robots that align with users’ emotional needs.

A dynamic pruning approach for cross-domain few-shot image generation

Shiliang LI, Qiang FANG, Yihua WANG, Yifei SHI, Zhuo WANG, Zeyu LI, Yunfei XIE, Jia WANG

Journal of Graphics. 2026, 47(1): 131-142.

Few-shot image generation has important application value in fields such as medical imaging and artistic creation. In recent years, significant research progress has been made in this task, with mainstream approaches typically relying on transferring generative models pretrained on large-scale source domain datasets to target domains to mitigate data-scarcity challenges. However, when substantial semantic gaps exist between source and target domains, direct transfer often introduces incompatible source-specific features, degrading image realism and style consistency. Although existing methods remove redundant features via static pruning strategies, such as fixed-threshold filter pruning, they struggle to adapt to the dynamic evolution of features across different layers of deep networks, often resulting in the mistaken removal of general low-level features while retaining redundant high-level ones, thereby affecting the adaptation performance and generation quality of the model. To address this, a dynamic pruning method based on filter-importance estimation was proposed. Specifically, the method continuously tracked the changes in Fisher information of each layer’s filters during training to evaluate their importance for image generation quality. Based on the Fisher information, a cumulative importance weight-based adaptive pruning mechanism was constructed to dynamically determine the pruning ratio for each layer, enabling more precise removal of redundant or incompatible filters while preserving general structural semantic information. Experiments were conducted on several representative few-shot target domains, and results showed that the proposed method significantly outperformed existing approaches in terms of image quality (Fréchet Inception Distance, FID) and image diversity (Intra-domain Learned Perceptual Image Patch Similarity, Intra-LPIPS).
In target domains exhibiting significant semantic differences from the source domain, the proposed method achieved superior FID scores compared with the current state-of-the-art methods, demonstrating its stability and superiority for cross-domain few-shot image generation tasks.
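
One way to read "cumulative importance weight-based adaptive pruning" is sketched below. This is an assumption about the general idea, not the published algorithm: each layer keeps its highest-scoring filters until they cover a chosen share of that layer's total importance, so the pruning ratio differs per layer. The function names, the `keep_mass` parameter, and the moving-average update are all hypothetical.

```python
def update_importance(running, grads, decay=0.9):
    """Moving average of squared gradients, a common empirical Fisher proxy."""
    return [decay * r + (1 - decay) * g * g for r, g in zip(running, grads)]


def filters_to_prune(importance, keep_mass=0.95):
    """Return the indices of the filters to prune in one layer.

    `importance` holds one non-negative score per filter. Filters are kept in
    order of decreasing score until they cover `keep_mass` of the layer total.
    """
    total = sum(importance)
    if total <= 0:
        return []  # no signal: prune nothing rather than guess
    order = sorted(range(len(importance)),
                   key=lambda i: importance[i], reverse=True)
    kept, running = set(), 0.0
    for i in order:
        if running >= keep_mass * total:
            break
        kept.add(i)
        running += importance[i]
    return sorted(i for i in range(len(importance)) if i not in kept)
```

With the same `keep_mass`, a layer whose importance is concentrated in a few filters loses more filters than a layer whose importance is spread evenly, which is the behaviour a fixed threshold cannot provide.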

Review of deep learning based methods for detecting focal liver lesions

Wenyi DONG, Weidong YANG, Binghui TANG, Qi WANG, Hongyu XIAO

Journal of Graphics. 2026, 47(1): 1-16.

The detection of Focal Liver Lesions (FLLs) is crucial for disease diagnosis and treatment. Traditional detection methods face many challenges, and the application of deep-learning technology brings new opportunities. In view of this, this paper systematically reviewed deep-learning-based FLL detection methods, and provided specific research directions for the development of FLL detection technology by analyzing the advantages and disadvantages of related technologies. First, the public datasets of liver radiological images were organized and summarized, and the key role of data preprocessing in improving model performance was expounded. Secondly, the 2D and 3D detection algorithms based on convolutional neural networks, Transformer, knowledge distillation, and other technologies were compared and analyzed, revealing the technical evolution path from local feature modeling to global spatio-temporal correlation. In addition, the temporal feature fusion methods for multi-phase images were examined in depth, providing new ideas for dynamic lesion characterization. The review showed that existing methods had achieved breakthroughs in detection accuracy and efficiency, but still faced challenges such as insufficient sensitivity to small lesions, weak cross-device generalization, and lack of clinical verification. It was recommended that future research accelerate the clinical translation and application of deep learning in auxiliary diagnosis of liver lesions through multi-center data collaboration, lightweight algorithm design, and enhanced interpretability.

Generative digital twin modeling based on large models

Shenglong LIANG, Qiuxia FAN

Journal of Graphics. 2026, 47(1): 173-178.

To address the challenges in integrating Digital-Twin (DT) technology with large-scale generative models in industrial design, a CAD-LDT digital-twin modeling framework based on generative foundation models was proposed. The framework adopted a triadic architecture consisting of a physical-entity module, an intelligent generation module, and a virtual-entity module, and innovatively incorporated multi-modal data fusion mechanisms and domain-knowledge constraints to enable autonomous generation of parameterized CAD models from physical-entity descriptions. Utilizing LLaVA-7B and LLaMA-7B as backbone models, the framework employed LoRA-based lightweight adapters to achieve cross-modal alignment between visual and textual features, and introduced a constraint encoder that transformed geometric tolerances and physical rules into structured JSON objects. To enhance the mathematical consistency of spatial transformations, Lie-group algorithms were adopted for the optimization of rigid-body transformations, while a geometric-weight binning strategy was proposed to discretize complex assembly relationships. Moreover, a spatiotemporal-decoupled generation strategy was designed to jointly optimize spatial layout and assembly sequencing. Experimental results on the DeepCAD dataset indicated that the proposed framework achieved an Intersection-over-Union (IoU) of 83.6%, a constraint satisfaction rate of 91.3%, and a 26.5% improvement in generation efficiency, significantly outperforming existing baseline models. Further ablation studies confirmed the critical contributions of multi-modal fusion, constraint encoding mechanisms, and Lie-group optimization to modeling performance, providing a novel DT modeling paradigm for intelligent manufacturing with demonstrated value in parametric design and assembly process optimization.

Intelligent analysis of design about roof equipment inspection paths based on graph theory and improved A* algorithm

Ruiqi HE, Ying CAO, Jinglin XU, Fangqiang YU

Journal of Graphics. 2026, 47(1): 216-222.

In roof engineering design, the rationality of equipment maintenance circulation routes directly impacts maintenance efficiency and safety. Traditional design methods often rely on empirical judgment, making it difficult to sufficiently evaluate the rationality of these routes during the design phase. To address this, a hybrid algorithm combining graph theory with an improved A* algorithm was developed. Integrated with Building Information Modeling (BIM) technology, an intelligent analysis and design tool for roof equipment maintenance circulation routes was created to address the shortcomings of traditional design via digital model-based route analysis. First, the roof was converted into a weighted equivalent grid map using collision detection and an octree algorithm. Next, an improved A* algorithm was employed to optimize the maintenance paths, comprehensively considering equipment collision volumes and spatial constraints to calculate the optimal maintenance circulation route and evaluate the rationality of detailed route-area design. Finally, the intelligent analysis and design tool based on this algorithm was tested on an actual project. Experimental results demonstrated that the algorithm accurately revealed potential spatial conflicts and irrational layouts, providing data to support design optimization, and enhanced design rationality and operability; it also improved efficiency by more than five times compared with traditional manual design. The intelligent analysis tool based on this algorithm is currently in use in several projects by the Shanghai Construction (No.4) Group Co., Ltd.
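
For readers unfamiliar with the underlying search, the sketch below shows the standard A* algorithm on a weighted grid. It is not the improved variant described in the abstract, whose modifications are not specified here. The grid encoding (a cost per cell, `None` for blocked cells), the 4-connected moves, and the function name `astar` are assumptions made for illustration.

```python
import heapq


def astar(grid, start, goal):
    """Cheapest 4-connected path on a weighted grid.

    grid[r][c] is the positive cost of entering a cell, or None if blocked.
    Returns (total_cost, path), or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    min_cost = min(c for row in grid for c in row if c is not None)

    def h(cell):
        # Manhattan distance scaled by the cheapest cell cost never
        # overestimates the remaining cost, so the result stays optimal.
        return min_cost * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

    best = {start: 0}
    parent = {}
    frontier = [(h(start), 0, start)]
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return g, path[::-1]
        if g > best[cell]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                ng = g + grid[nr][nc]
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

In a roof setting, a high cell cost could stand for a cramped passage next to equipment and `None` for a cell occupied by equipment; that mapping is an illustrative reading, not a detail taken from the paper.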

Research on assembly accuracy prediction of complex products considering rough surfaces

Gangfeng WANG, Huan ZHANG, Yingying YANG, Yitao LIU, Yanyun GUO, Ping YUE, Yanhui SUN

Journal of Graphics. 2026, 47(1): 162-172.

Given that the impact of rough surfaces on assembly accuracy had been insufficiently considered in the existing assembly accuracy prediction for complex products, leading to inaccurate precision prediction and limited practical assembly applicability, an assembly-accuracy prediction method considering rough surfaces was proposed. Firstly, an assembly-accuracy information model was constructed to express mating feature, geometric tolerance, and roughness information. Based on the model, an assembly-precision knowledge graph was constructed. Secondly, a geometric-tolerance representation model was established based on the Small-Displacement Torsor (SDT) theory; a simulation method for rough surfaces of plane and cylindrical parts as well as a determination method of SDT expressions were studied. Thirdly, the error-propagation path of the assembly was determined according to the assembly sequence, and a pose-relationship graph for the assembly was constructed. Then, the assembly-precision prediction was achieved using a Jacobian-torsor model. Finally, the feasibility of the method was verified using the crank-connecting-rod mechanism of a specific construction-machine model as an example. The simulation results demonstrated that the method could achieve accurate assembly-precision prediction and provided valuable guidance for practical assembly operations.

A point cloud classification and segmentation algorithm based on lightweight networks and weighted RF

Fuqun ZHAO, Hanzhu HAO, Jiale YU

Journal of Graphics. 2026, 47(1): 143-151.

To address the issues of high computational cost and complex network models in point cloud classification and segmentation methods, a point cloud classification and segmentation algorithm based on lightweight networks and weighted Random Forest (RF) was proposed. The algorithm achieved efficient classification and segmentation in a hierarchical manner. Firstly, to address the issues of multiple layers and complex computation in traditional neural networks, a lightweight neural network was constructed to extract point cloud features such as global shape, inter-regional relationships, curvature, normal vector, and color, thereby achieving rapid rough classification and segmentation of point clouds. Then, to address data imbalance, an adaptive classification and segmentation strategy was designed. By introducing a weighted RF and combining inconsistency-measurement screening with dynamic-weighting optimization mechanisms, fine classification and segmentation of point clouds were achieved. Classification experiments were conducted on the ModelNet40 dataset, and segmentation experiments on the Semantic3D dataset and outdoor-scene point-cloud data. The results showed that compared with Local Geo-Transformer, PointNeXt, and FastPointNet++, classification and segmentation accuracy increased by approximately 1.9%, 1.6%, and 1.7%, respectively, while classification and segmentation time was reduced by approximately 40%, 30%, and 20%, respectively. Thus, the proposed algorithm can effectively reduce the training time of the model and improve the efficiency of classification and segmentation while maintaining high accuracy.
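
The general idea of a weighted RF can be sketched as follows: each tree's vote counts in proportion to a per-tree weight instead of equally. The weighting rule used here (accuracy on held-out labelled points) is a simple stand-in chosen for illustration. It is not the inconsistency-measurement screening or dynamic-weighting mechanism of the paper, and both function names are hypothetical.

```python
from collections import defaultdict


def accuracy_weights(per_tree_val_predictions, val_labels):
    """Weight each tree by its accuracy on held-out labelled points."""
    weights = []
    for preds in per_tree_val_predictions:
        correct = sum(p == y for p, y in zip(preds, val_labels))
        weights.append(correct / len(val_labels))
    return weights


def weighted_vote(tree_predictions, tree_weights):
    """Combine per-tree class labels, counting each vote by its tree's weight."""
    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

A single reliable tree can then outvote two unreliable ones, which plain majority voting does not allow.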

Deep fusion of multimodal features for few-shot class-incremental 3D point cloud classification

Chenxi ZHU, Yinan LU, Tieru WU, Wenyong GONG, Rui MA

Journal of Graphics. 2026, 47(1): 78-89.

Traditional 3D point-cloud classification methods tend to suffer from insufficient generalization and catastrophic forgetting in Few-Shot Class-incremental Learning (FSCIL) scenarios. The pretrained vision-language model CLIP (Contrastive Language-Image Pre-training), which contains rich 2D shape priors, has been shown to effectively enhance 3D FSCIL performance. However, existing CLIP-based frameworks still lack flexibility and adaptability in multimodal feature extraction and fusion, which limits classification accuracy during incremental stages. To address these shortcomings, a 3D FSCIL approach with deeply fused multimodal features was proposed. An adaptive adapter based on gated units and residual blocks was introduced to achieve multi-scale feature alignment and redundancy suppression, and a multimodal global feature dynamic fusion module with self-attention was designed to adaptively adjust the weight allocation of different feature streams according to sample characteristics, thereby obtaining more consistent and complementary fused representations. Specifically, point clouds were rendered into multi-view depth maps, and features were extracted using both the original CLIP visual encoder and a CLIP encoder pretrained on depth maps, combined with point-cloud geometric features. After processing through the adaptive adapter, these features were fed into the attention-based fusion module and aligned with semantic features extracted by the CLIP text encoder for classification. In addition, contrastive learning loss, multi-view and geometric perturbation-based data augmentation strategies, and a memory-replay mechanism were incorporated to effectively mitigate overfitting and forgetting under few-shot conditions. 
Experiments on ShapeNet, ModelNet, and CO3D demonstrated that the proposed method consistently achieved higher accuracy across incremental stages compared with existing 3D FSCIL approaches, while significantly reducing both relative accuracy drop rates and maximum stage fluctuations.
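
The fusion-then-align step can be pictured with the toy sketch below, under strong simplifying assumptions: each feature stream is already a vector of the same length, the per-stream scores are given as inputs (in the paper they would come from the self-attention module), and classification picks the class whose text feature is most similar by cosine similarity, as in CLIP-style matching. The names `fuse` and `classify` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def fuse(streams, scores):
    """Weighted sum of equal-length feature vectors, weights = softmax(scores)."""
    weights = softmax(scores)
    dim = len(streams[0])
    fused = [sum(w * s[i] for w, s in zip(weights, streams)) for i in range(dim)]
    return fused, weights


def classify(feature, text_features):
    """Pick the class whose text feature has the highest cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    return max(text_features, key=lambda name: cos(feature, text_features[name]))
```

Since the weights depend on per-sample scores, different samples can lean on different streams, for example depth-map features for one shape and geometric features for another.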