As large models continue to grow in scale, the memory footprint and computational overhead of model inference have become critical challenges. Mixed-precision quantization is an effective approach to reduce resource consumption, but existing methods suffer from insufficient outlier handling, significant quantization accuracy loss, and inefficient memory access. To address these issues, a memory-aligned mixed-precision quantization method for large models was proposed. First, weights were divided into SIMD-aligned groups, and outlier groups were identified via group-wise significance analysis, with high-significance groups quantized to 8 bits and the others to 2 bits. A block-wise compensation strategy was introduced to mitigate the accuracy degradation caused by 2-bit quantization. Furthermore, an efficient packing and storage scheme was designed for mixed-precision weights, in which a bitmap recorded the bit width of each data block, enabling random access. Experimental results demonstrated that the proposed method significantly reduced memory usage and improved computational efficiency while maintaining model accuracy. Specifically, on Llama2-7B/13B/70B, the approach achieved perplexity reductions of 8.13/2.84/1.37 on WikiText-2 and 5.80 on C4 relative to state-of-the-art baselines. The quantized 70B model reduced weight storage by approximately 87% compared with BF16. Across seven QA benchmarks, an average accuracy gain of 6.24% was achieved. Overall, these results indicated that memory-aligned mixed-precision quantization for large language models could simultaneously improve compression ratio, memory-access efficiency, and overall model performance.
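The grouping, bitmap, and random-access ideas described above can be illustrated with a minimal sketch. It is not the authors' implementation: the group size, the use of the maximum absolute weight as the significance score, the `keep_ratio` parameter, and all function names are assumptions made for illustration, and the block-wise compensation step is omitted.

```python
from typing import List, Tuple

GROUP_SIZE = 4  # hypothetical; a real kernel would match the SIMD lane width


def split_groups(weights: List[float], size: int = GROUP_SIZE) -> List[List[float]]:
    """Split a flat weight list into fixed-size groups."""
    if len(weights) % size != 0:
        raise ValueError("weight count must be a multiple of the group size")
    return [weights[i:i + size] for i in range(0, len(weights), size)]


def build_bitmap(groups: List[List[float]], keep_ratio: float = 0.25) -> List[int]:
    """Flag the most significant groups: 1 = 8 bit, 0 = 2 bit.

    Significance is taken as max |w| per group, an assumed stand-in for
    the group-wise significance analysis named in the abstract.
    """
    n_high = max(1, round(len(groups) * keep_ratio))
    scores = [max(abs(w) for w in g) for g in groups]
    ranked = sorted(range(len(groups)), key=lambda i: scores[i], reverse=True)
    high = set(ranked[:n_high])
    return [1 if i in high else 0 for i in range(len(groups))]


def quantize_group(group: List[float], bits: int) -> Tuple[float, List[int]]:
    """Symmetric uniform quantization; returns (scale, integer codes)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bit, 1 for 2 bit
    peak = max(abs(w) for w in group)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in group]
    return scale, codes


def group_bit_offsets(bitmap: List[int], size: int = GROUP_SIZE) -> List[int]:
    """Prefix sums over the bitmap give each group's start offset in bits,
    so a reader can jump to any group without decoding the ones before it."""
    offsets, pos = [], 0
    for flag in bitmap:
        offsets.append(pos)
        pos += size * (8 if flag else 2)
    return offsets
```

With four weights per group, a 2-bit group occupies 8 bits and an 8-bit group occupies 32 bits, so every group starts on a byte boundary. This is one simple way to read the "memory-aligned" property, stated here as an interpretation and not as the paper's layout.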
Traditional 3D point-cloud classification methods tend to suffer from insufficient generalization and catastrophic forgetting in Few-Shot Class-incremental Learning (FSCIL) scenarios. The pretrained vision-language model CLIP (Contrastive Language-Image Pre-training), which contains rich 2D shape priors, has been shown to effectively enhance 3D FSCIL performance. However, existing CLIP-based frameworks still lack flexibility and adaptability in multimodal feature extraction and fusion, which limits classification accuracy during incremental stages. To address these shortcomings, a 3D FSCIL approach with deeply fused multimodal features was proposed. An adaptive adapter based on gated units and residual blocks was introduced to achieve multi-scale feature alignment and redundancy suppression, and a multimodal global feature dynamic fusion module with self-attention was designed to adaptively adjust the weight allocation of different feature streams according to sample characteristics, thereby obtaining more consistent and complementary fused representations. Specifically, point clouds were rendered into multi-view depth maps, and features were extracted using both the original CLIP visual encoder and a CLIP encoder pretrained on depth maps, combined with point-cloud geometric features. After processing through the adaptive adapter, these features were fed into the attention-based fusion module and aligned with semantic features extracted by the CLIP text encoder for classification. In addition, contrastive learning loss, multi-view and geometric perturbation-based data augmentation strategies, and a memory-replay mechanism were incorporated to effectively mitigate overfitting and forgetting under few-shot conditions. 
Experiments on ShapeNet, ModelNet, and CO3D demonstrated that the proposed method consistently achieved higher accuracy across incremental stages compared with existing 3D FSCIL approaches, while significantly reducing both relative accuracy drop rates and maximum stage fluctuations.
Given that the impact of rough surfaces on assembly accuracy had been insufficiently considered in the existing assembly accuracy prediction for complex products, leading to inaccurate precision prediction and limited practical assembly applicability, an assembly-accuracy prediction method considering rough surfaces was proposed. Firstly, an assembly-accuracy information model was constructed to express mating feature, geometric tolerance, and roughness information. Based on the model, an assembly-precision knowledge graph was constructed. Secondly, a geometric-tolerance representation model was established based on the Small-Displacement Torsor (SDT) theory; a simulation method for rough surfaces of plane and cylindrical parts as well as a determination method of SDT expressions were studied. Thirdly, the error-propagation path of the assembly was determined according to the assembly sequence, and a pose-relationship graph for the assembly was constructed. Then, the assembly-precision prediction was achieved using a Jacobian-torsor model. Finally, the feasibility of the method was verified using the crank-connecting-rod mechanism of a specific construction-machine model as an example. The simulation results demonstrated that the method could achieve accurate assembly-precision prediction and provided valuable guidance for practical assembly operations.
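For orientation, the two models named above are usually written in the following general form in the tolerance-analysis literature. The notation is an assumption here, since the abstract itself defines no symbols: a small-displacement torsor collects three small translations and three small rotations of a feature, and the Jacobian-torsor model maps the torsors of the functional elements along the error-propagation path to the torsor of the functional requirement.

```latex
% Small-displacement torsor of a feature (assumed notation)
\boldsymbol{T} = \begin{bmatrix} u & v & w & \alpha & \beta & \gamma \end{bmatrix}^{\mathsf{T}}

% Jacobian-torsor model: FR is the functional-requirement torsor,
% FE_i the torsor of the i-th functional element, J_i its 6x6 Jacobian
\mathrm{FR} =
\begin{bmatrix} J_1 & J_2 & \cdots & J_n \end{bmatrix}
\begin{bmatrix} \mathrm{FE}_1 \\ \mathrm{FE}_2 \\ \vdots \\ \mathrm{FE}_n \end{bmatrix}
```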
With the rapid development of deep learning, remarkable achievements have been made in image classification and related tasks. However, the success of these models heavily relies on large amounts of high-quality labeled data. In real-world applications, labeled data is often scarce, and manual annotation is time-consuming, labor-intensive, and costly, which limits the scalability and deployment of deep learning models. In recent years, active learning has gained significant attention due to its ability to improve model performance under limited annotation budgets. The core idea of active learning is to select the most valuable data for labeling based on certain criteria such as uncertainty, diversity, or representativeness. To address the limitations of traditional active learning methods, which often rely on manually designed heuristic sampling strategies that struggle to adapt to different task scenarios and are difficult to dynamically optimize, a Smart Reinforcement Active Learning (SRAL) approach for image classification is proposed. The sample selection process is modeled as a Markov Decision Process (MDP), leveraging reinforcement learning’s adaptive strategy optimization ability to guide the model in dynamically selecting the most valuable samples from the unlabeled data for labeling. In this framework, the state is represented by features extracted from the unlabeled samples, the action indicates whether a sample should be selected for labeling, and the reward function is defined as the change in model accuracy after incorporating the selected sample into the training set. The Actor-Critic algorithm is adopted to optimize the sampling policy, and uncertainty-based heuristic ranking is incorporated as auxiliary information to improve the learning efficiency.
Experimental results demonstrate that the proposed SRAL method significantly improves classification accuracy under the same labeling budget compared to other active learning approaches on datasets such as CIFAR-10, SVHN, and Fashion-MNIST. Furthermore, SRAL exhibits robust stability and strong generalization ability across these datasets. This confirms the effectiveness and advantages of SRAL in enhancing the performance of image classification models.
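Two ingredients of the framework above are simple enough to sketch: the reward, defined as the change in accuracy, and the uncertainty-based ranking used as auxiliary information. The choice of Shannon entropy as the uncertainty measure and all function names are assumptions for illustration. The Actor-Critic policy itself is not shown.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(pool_probs: List[Sequence[float]]) -> List[int]:
    """Indices of unlabeled samples, most uncertain first.

    Entropy is an assumed uncertainty measure; the abstract only says
    that an uncertainty-based heuristic ranking is used.
    """
    return sorted(range(len(pool_probs)),
                  key=lambda i: entropy(pool_probs[i]),
                  reverse=True)


def reward(acc_before: float, acc_after: float) -> float:
    """Reward for a labeling action: the change in model accuracy after
    the selected sample is added to the training set."""
    return acc_after - acc_before
```

For a pool with predicted distributions `[0.9, 0.1]`, `[0.5, 0.5]`, and `[0.7, 0.3]`, the ranking puts the second sample first because its prediction is closest to uniform.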
With the rapid development of artificial intelligence technology, multimodal robots are playing an increasingly important role in preschool children’s education, entertainment, and daily life. Existing studies have primarily focused on the effects of single sensory cues of robots on children’s perception, while systematic research on multisensory integration effects remains limited. To explore how robots’ multimodal features jointly influence children’s emotional preferences and visual attention, 318 children aged 4-6 years were recruited to participate in an eye-tracking experiment. The experiment adopted a 2 (appearance features: humanoid vs. animal-like) × 3 (voice guidance: male voice, female voice, none) × 2 (gesture guidance: present vs. absent) mixed factorial design, with robot appearance features (humanoid vs. animal-like) and behavioral features (voice and gesture guidance) as independent variables, and children’s emotional preferences and eye-tracking indicators as dependent variables, thereby systematically examining the effects of multimodal features on child users. The results showed that, in terms of appearance features, no significant difference was observed in subjective preference ratings between humanoid and animal-like robots. However, humanoid robots attracted longer total fixation duration, more fixation counts, and shorter first-fixation latency, indicating superior attention-related performance compared with animal-like robots. Children were more readily attracted to humanoid robots during the initial stage of visual contact, and anthropomorphic design showed greater advantages in sustaining children’s attention. In terms of behavioral features, robots with gesture guidance received significantly higher subjective preference ratings than those without gestures, and also elicited longer total fixation duration and more fixation counts. 
Robots with female voices received slightly higher subjective preference ratings than those with male voices, and both were significantly preferred over robots without voices. Robots with male voices had slightly longer total fixation duration than those with female voices, and both significantly outperformed robots without voices. The difference in fixation counts between male- and female-voice robots was not significant, but both attracted significantly more fixations than robots without voices. Robots with gesture guidance and voice (especially female voice) performed better in subjective ratings and visual attention allocation, suggesting that behavioral features substantially enhanced children’s emotional preferences and interactive experiences. Furthermore, the effects of appearance and behavioral features on children’s emotional preferences and visual attention were relatively independent, and no significant interaction effects were observed. This study revealed the mechanisms through which robot appearance and behavioral features influenced preschool children’s emotional preferences and visual attention, thereby providing scientific evidence for designing child-oriented robots that align with users’ emotional needs.
To address the issues of high computational cost and complex network models in point cloud classification and segmentation methods, a point cloud classification and segmentation algorithm based on lightweight networks and weighted Random Forest (RF) was proposed. The algorithm achieved efficient classification and segmentation in a hierarchical manner. Firstly, to address the issues of multiple layers and complex computation in traditional neural networks, a lightweight neural network was constructed to extract point cloud features such as global shape, inter-regional relationships, curvature, normal vector, and color, thereby achieving rapid rough classification and segmentation of point clouds. Then, to address data imbalance, an adaptive classification and segmentation strategy was designed. By introducing a weighted RF and combining inconsistency-measurement screening with dynamic-weighting optimization mechanisms, fine classification and segmentation of point clouds were achieved. Classification experiments were conducted on the ModelNet40 dataset, and segmentation experiments were conducted on the Semantic3D dataset and outdoor-scene point-cloud data. The results showed that compared with Local Geo-Transformer, PointNeXt, and FastPointNet++, classification and segmentation accuracy increased by approximately 1.9%, 1.6%, and 1.7%, respectively, while classification and segmentation time was reduced by approximately 40%, 30%, and 20%, respectively. Thus, the proposed algorithm can effectively reduce the training time of the model and improve the efficiency of classification and segmentation while maintaining high accuracy.
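The core of a weighted random forest is a vote in which each tree's prediction counts in proportion to a weight. The sketch below shows only that voting step. How the weights are obtained (for example from per-tree validation accuracy) is an assumption, and the paper's inconsistency-measurement screening and dynamic-weighting mechanisms are not reproduced.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def weighted_vote(predictions: Sequence[Hashable],
                  weights: Sequence[float]) -> Hashable:
    """Return the label whose supporting trees have the largest total weight.

    predictions[i] is the label predicted by tree i and weights[i] is that
    tree's (hypothetical) reliability weight.
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per prediction")
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

With predictions `["ground", "tree", "tree"]` and weights `[0.9, 0.3, 0.4]`, the weighted vote returns `"ground"`, whereas an unweighted majority vote would return `"tree"`.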
The traditional development approach for complex forming equipment typically relies on Document-Based Systems Engineering (DBSE), which often leads to issues such as protracted development cycles due to inadequate requirement analysis, incomplete requirement coverage caused by textual ambiguity, and equipment development delays lagging behind technological iterations. These shortcomings frequently result in final designs that fail to meet target performance metrics and require inefficient, repetitive modifications. Therefore, in the conceptual design stage of complex forming equipment, and drawing on the U.S. Department of Defense Architecture Framework (DoDAF) combined with Model-Based Systems Engineering (MBSE), an MBSE-based conceptual-design method for complex forming equipment was proposed. This method utilized five viewpoints, including panoramic viewpoint, capability viewpoint, operational viewpoint, systems viewpoint, and standards viewpoint, as entry points for the conceptual design of complex forming equipment. Through multi-perspective analysis, the method performed top-level requirements acquisition, requirements refinement analysis, functional analysis, and system modeling across four design levels. Eleven types of models were established using the Systems Modeling Language (SysML), enabling digital and procedural expression in the conceptual design stage of complex forming equipment. Finally, superplastic-forming equipment was used as a representative example to demonstrate the application of this design method. The application of the method addressed the shortcomings of traditional design approaches and demonstrated that the method provided effective guidance for the forward development of complex forming equipment.
Aeroengine blades are core components of engines; tiny surface defects can lead to serious safety accidents. Traditional vision detection technology is limited by motion blur, low dynamic range, background redundancy, and so forth. To address these challenges, a method of aeroengine blade defect detection based on a Dynamic Vision Sensor (DVS) was proposed. A dynamic vision sensor, also referred to as an event camera, produces data in an asynchronous event-stream format and offers a large dynamic range, a high frame rate, and a strong ability to capture small targets. Firstly, a defect detection platform based on DVS was built, and its imaging characteristics and advantages were explored. On this basis, the first DVS-based Event-based Defect Detection Dataset of Aeroengine Blade (EDD-AB) was constructed, covering nearly 6,000 images of scratches, point marks, and edge damage, with approximately 12,000 finely annotated target labels. The dataset was released as open source (link: https://github.com/NiBieZhouMei5520/EDD-AB.git). Furthermore, a multi-scale defect-detection algorithm based on asynchronous event-stream frame aggregation (AEAF-ABDD) was proposed: event streams were visualized through frame aggregation technology using a fixed time window; a Multi-Resolution Adaptive Feature Pyramid Network (MRAFPN) was developed to enhance multi-scale defect feature extraction capability; a lightweight SimAM attention mechanism was incorporated to strengthen focus on key regions; and a star-convolution module (StarNet) was fused to improve the efficiency of high-dimensional nonlinear feature mapping, enabling accurate detection of multi-scale defects on complex curved workpieces. Experiments demonstrated that AEAF-ABDD achieved a mean Average Precision (mAP) of 97.7% on the EDD-AB dataset and a detection speed of 105 frames per second, substantially outperforming mainstream algorithms.
An efficient solution for automated quality inspection of highly reflective curved workpieces was thereby provided, promoting the application of DVS in the field of industrial inspection.
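Frame aggregation with a fixed time window, as mentioned above, can be sketched in a few lines. The event layout `(timestamp_us, x, y, polarity)`, the signed-count accumulation, and the function name are assumptions for illustration. The actual AEAF-ABDD aggregation may differ.

```python
from typing import List, Sequence, Tuple

# Assumed event layout: (timestamp in microseconds, x, y, polarity in {+1, -1})
Event = Tuple[int, int, int, int]
Frame = List[List[int]]


def aggregate_events(events: Sequence[Event], width: int, height: int,
                     window_us: int) -> List[Frame]:
    """Bin events into consecutive fixed-length time windows.

    Each output frame is a height x width grid holding the signed event
    count per pixel for one window.
    """
    if window_us <= 0:
        raise ValueError("window_us must be positive")
    if not events:
        return []
    t0 = min(e[0] for e in events)
    t_last = max(e[0] for e in events)
    n_frames = (t_last - t0) // window_us + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y, polarity in events:
        frames[(t - t0) // window_us][y][x] += polarity
    return frames
```

A shorter window gives frames with less motion blur but sparser content, and a longer window gives the opposite, so the window length is a tuning choice.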
In night-vision environments, image quality is significantly degraded by glare from intense light sources, impairing the performance of night-vision assisted driving systems. Existing flare-removal algorithms suffer from limited robustness, high computational complexity, and loss of light-source information. To address these challenges, a lightweight image flare-removal method, Night Flare Removal Network+ (NFR-Net+), was proposed to enhance image clarity while meeting the real-time computational demands of mobile devices. The approach first incorporated a feature-filtering mechanism combined with residual connection strategies to strengthen feature extraction capabilities, effectively mitigating overfitting and ensuring robust flare removal across diverse lighting conditions and flare types. Additionally, a nonlinear, activation-free feature attention module was introduced. Via a lightweight design, an efficient attention mechanism was constructed that significantly improved image-detail reconstruction while reducing model parameters by approximately 8.28% and runtime memory by about 11.1%, thereby optimizing computational efficiency. To tackle the issue of diminished image naturalness due to excessive light-source removal in traditional methods, an enhanced light-source extraction module was developed within the segmentation network. This module employed an improved light-source separation strategy to accurately preserve brightness and texture details in light-source regions, ensuring the authenticity and naturalness of output images. Experimental results demonstrated that NFR-Net+ surpassed state-of-the-art methods on image quality metrics such as Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity (LPIPS), exhibiting superior flare-removal performance and detail preservation.
The method also demonstrated strong adaptability across various night-vision scenarios and hardware devices, fulfilling the efficiency requirements for real-time processing. Ablation studies further validated the effectiveness of individual components, highlighting the critical role of feature filtering and attention mechanisms in balancing performance and resource consumption. This approach provided an efficient, lightweight solution for applications such as nighttime autonomous driving and intelligent surveillance.
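Of the three image-quality metrics cited above, PSNR has a closed form that is easy to state: ten times the base-10 logarithm of the squared peak value divided by the mean squared error. The sketch below computes it for flattened pixel sequences. The function name and the 8-bit default peak value are assumptions, and SSIM and LPIPS are not covered.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two flattened images.

    PSNR = 10 * log10(max_value^2 / MSE). Identical inputs give infinity.
    """
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the restored image is closer to the reference in a per-pixel sense, which is why it is usually reported alongside perceptual metrics such as LPIPS.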
To address the challenges in integrating Digital-Twin (DT) technology with large-scale generative models in industrial design, a CAD-LDT digital-twin modeling framework based on generative foundation models was proposed. The framework adopted a triadic architecture consisting of a physical-entity module, an intelligent generation module, and a virtual-entity module, and innovatively incorporated multi-modal data fusion mechanisms and domain-knowledge constraints to enable autonomous generation of parameterized CAD models from physical-entity descriptions. Utilizing LLaVA-7B and LLaMA-7B as backbone models, the framework employed LoRA-based lightweight adapters to achieve cross-modal alignment between visual and textual features, and introduced a constraint encoder that transformed geometric tolerances and physical rules into structured JSON objects. To enhance the mathematical consistency of spatial transformations, Lie-group algorithms were adopted for the optimization of rigid-body transformations, while a geometric-weight binning strategy was proposed to discretize complex assembly relationships. Moreover, a spatiotemporal-decoupled generation strategy was designed to jointly optimize spatial layout and assembly sequencing. Experimental results on the DeepCAD dataset indicated that the proposed framework achieved an Intersection-over-Union (IoU) of 83.6%, a constraint satisfaction rate of 91.3%, and a 26.5% improvement in generation efficiency, significantly outperforming existing baseline models. Further ablation studies confirmed the critical contributions of multi-modal fusion, constraint encoding mechanisms, and Lie-group optimization to modeling performance, providing a novel DT modeling paradigm for intelligent manufacturing with demonstrated value in parametric design and assembly process optimization.
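The abstract does not say which Lie-group algorithm is used. As a generic illustration of why Lie-group parameterizations keep rigid-body transformations consistent, the sketch below implements the SO(3) exponential map (Rodrigues' formula), which turns an unconstrained axis-angle vector into a valid rotation matrix. All names are hypothetical and this is not the framework's optimizer.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(omega: Sequence[float]) -> Matrix:
    """Map an axis-angle vector to a rotation matrix:
    R = I + sin(theta) K + (1 - cos(theta)) K^2, with K the skew matrix
    of the unit rotation axis and theta the norm of omega."""
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if theta < 1e-12:
        return identity
    kx, ky, kz = wx / theta, wy / theta, wz / theta
    k = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    k2 = matmul(k, k)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]


def transform(rotation: Matrix, translation: Sequence[float],
              point: Sequence[float]) -> List[float]:
    """Apply the rigid-body transform p -> R p + t."""
    return [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)]
```

Because the optimizer updates the three-component vector and not the nine matrix entries, every iterate stays a proper rotation without re-orthogonalization.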