Aeroengine blades are core engine components, and tiny surface defects can lead to serious safety accidents. Traditional vision-based detection is limited by motion blur, low dynamic range, and background redundancy. To address these challenges, an aeroengine blade defect detection method based on the Dynamic Vision Sensor (DVS) was proposed. A DVS produces data as an asynchronous event stream and is therefore also called an event camera; it offers a large dynamic range, a high frame rate, and a strong ability to capture small targets. First, a DVS-based defect detection platform was built, and its imaging characteristics and advantages were explored. On this basis, the first Event-based Defect Detection Dataset of Aeroengine Blade (EDD-AB) was constructed, covering nearly 6,000 images of scratches, point marks, and edge damage, with approximately 12,000 finely annotated target labels. The dataset was released as open source (link: https://github.com/NiBieZhouMei5520/EDD-AB.git). Furthermore, a multi-scale defect-detection algorithm based on asynchronous event-stream frame aggregation (AEAF-ABDD) was proposed: event streams were visualized as frames through aggregation over a fixed time window; a Multi-Resolution Adaptive Feature Pyramid Network (MRAFPN) was developed to enhance multi-scale defect feature extraction; a lightweight SimAM attention mechanism was incorporated to strengthen focus on key regions; and a star-convolution module (StarNet) was integrated to improve the efficiency of high-dimensional nonlinear feature mapping, enabling accurate detection of multi-scale defects on complex curved workpieces. Experiments demonstrated that AEAF-ABDD achieved a mean Average Precision (mAP) of 97.7% on the EDD-AB dataset and a detection speed of 105 frames per second, substantially outperforming mainstream algorithms.
An efficient solution for automated quality inspection of highly reflective curved workpieces was thereby provided, promoting the application of DVS in the field of industrial inspection.
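The fixed-time-window frame aggregation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' AEAF-ABDD implementation: the event layout `(t, x, y, p)`, the function name `aggregate_events`, and the choice to accumulate signed polarity counts are all assumptions made for illustration.

```python
import numpy as np

def aggregate_events(events, height, width, window_us):
    """Accumulate events into frames over fixed time windows.

    events: array-like of shape (N, 4) with columns t (microseconds), x, y,
    and polarity p in {0, 1}. This layout is assumed, not taken from the paper.
    Returns an int64 array of shape (num_windows, height, width) whose entries
    are signed event counts per pixel.
    """
    events = np.asarray(events, dtype=np.int64)
    if events.size == 0:
        return np.zeros((0, height, width), dtype=np.int64)
    t, x, y, p = events.T
    # Index of the fixed time window that each event falls into.
    idx = (t - t.min()) // window_us
    frames = np.zeros((int(idx.max()) + 1, height, width), dtype=np.int64)
    # ON events (p == 1) add +1, OFF events add -1.
    # np.add.at accumulates correctly when a pixel fires several times.
    np.add.at(frames, (idx, y, x), np.where(p == 1, 1, -1))
    return frames

# Three events on a 2x3 sensor, aggregated with a 1000-microsecond window.
demo = aggregate_events([[0, 1, 1, 1], [500, 1, 1, 1], [1500, 2, 0, 0]], 2, 3, 1000)
```

A shorter window keeps more temporal detail but yields sparser frames, so the window length is a tuning choice for the downstream detector.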
Existing multi-view stereo schemes use depth-estimation algorithms to achieve stereo representation by establishing a mapping between the physical and digital worlds. Supervised learning-based neural networks have achieved accurate, high-fidelity 3D reconstruction through training. However, in-the-wild visual reconstruction remains challenging because rendered depth priors are unavailable and the images have wide baselines. A novel system was proposed to obtain optimized depth for naturally collected multi-view images without prior information by applying an unsupervised learning network and semantically optimized Neural Radiance Field (NeRF) rendering. First, preliminary depth information for in-the-wild multi-view images was produced without ground truth using unsupervised deep learning. Subsequently, in a separate NeRF module, a diffusion model was used to construct a surface semantic rendering loss, enabling a fine-grained volumetric representation. Experimental results on the benchmark dataset validated the proposed system, which improved the overall metrics by an average of 24.6% compared with other state-of-the-art schemes. A novel in-the-wild wide-baseline dataset was also used to verify generalization performance, and the proposed system reduced the reconstruction error by up to 40.8% compared with the other methods.
In night-vision environments, image quality is significantly degraded by glare from intense light sources, impairing the performance of night-vision assisted driving systems. Existing flare-removal algorithms suffer from limited robustness, high computational complexity, and loss of light-source information. To address these challenges, a lightweight image flare-removal method, Night Flare Removal Network+ (NFR-Net+), was proposed to enhance image clarity while meeting the real-time computational demands of mobile devices. The approach first incorporated a feature-filtering mechanism combined with residual connections to strengthen feature extraction, effectively mitigating overfitting and ensuring robust flare removal across diverse lighting conditions and flare types. Additionally, a nonlinear-activation-free feature attention module was introduced. Through its lightweight design, this efficient attention mechanism significantly improved image-detail reconstruction while reducing model parameters by approximately 8.28% and runtime memory by about 11.1%. To tackle the loss of image naturalness caused by excessive light-source removal in traditional methods, an enhanced light-source extraction module was developed within the segmentation network. This module employed an improved light-source separation strategy to accurately preserve brightness and texture details in light-source regions, ensuring the authenticity and naturalness of output images. Experimental results demonstrated that NFR-Net+ surpassed state-of-the-art methods on image quality metrics such as Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity (LPIPS), exhibiting superior flare-removal performance and detail preservation.
The method also demonstrated strong adaptability across various night-vision scenarios and hardware devices, fulfilling the efficiency requirements for real-time processing. Ablation studies further validated the effectiveness of individual components, highlighting the critical role of feature filtering and attention mechanisms in balancing performance and resource consumption. This approach provided an efficient, lightweight solution for applications such as nighttime autonomous driving and intelligent surveillance.
As large models continue to grow in scale, the memory footprint and computational overhead of model inference have become critical challenges. Mixed-precision quantization is an effective approach to reduce resource consumption, but existing methods suffer from insufficient outlier handling, significant quantization accuracy loss, and inefficient memory access. To address these issues, a memory-aligned mixed-precision quantization method for large models was proposed. First, weights were divided into SIMD-aligned groups, and outlier groups were identified via group-wise significance analysis, with high-significance groups quantized to 8 bits and the others to 2 bits. A block-wise compensation strategy was introduced to mitigate the accuracy degradation caused by 2-bit quantization. Furthermore, an efficient packing and storage scheme was designed for mixed-precision weights, in which a bitmap recorded the bit width of each data block, enabling random access. Experimental results demonstrated that the proposed method significantly reduced memory usage and improved computational efficiency while maintaining model accuracy. Specifically, on Llama2-7B/13B/70B, the approach achieved perplexity reductions of 8.13/2.84/1.37 on WikiText-2 and 5.80 on C4 relative to state-of-the-art baselines. The quantized 70B model reduced weight storage by approximately 87% compared with BF16. Across seven QA benchmarks, an average accuracy gain of 6.24% was achieved. These results indicated that memory-aligned mixed-precision quantization for large language models could simultaneously improve compression ratio, memory-access efficiency, and overall model performance.
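The group-wise scheme in the abstract above (aligned groups, a significance score per group, 8-bit or 2-bit codes, and a bitmap of bit widths) can be sketched roughly as follows. Several details are assumptions, since the abstract does not specify them: the significance score is taken to be the maximum absolute weight in a group, the group size and `high_fraction` are arbitrary, quantization is symmetric and uniform, and the codes are kept in `int8` arrays rather than bit-packed. The block-wise compensation step is omitted.

```python
import numpy as np

def quantize_group(w, bits):
    """Symmetric uniform quantization of one group; returns integer codes and a scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit; 1 for 2-bit (codes in {-1, 0, 1})
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

def mixed_precision_quantize(weights, group_size=32, high_fraction=0.1):
    """Quantize the most significant groups to 8 bits and the rest to 2 bits."""
    groups = np.asarray(weights, dtype=np.float32).reshape(-1, group_size)
    significance = np.abs(groups).max(axis=1)  # assumed significance score
    n_high = max(1, int(round(high_fraction * len(groups))))
    bitmap = np.zeros(len(groups), dtype=bool)  # True marks an 8-bit group
    bitmap[np.argsort(significance)[-n_high:]] = True
    codes, scales = [], []
    for group, is_high in zip(groups, bitmap):
        c, s = quantize_group(group, 8 if is_high else 2)
        codes.append(c)
        scales.append(s)
    return bitmap, codes, np.array(scales, dtype=np.float32)

def dequantize(codes, scales):
    """Reconstruct approximate float weights from per-group codes and scales."""
    return np.concatenate([c.astype(np.float32) * s for c, s in zip(codes, scales)])
```

Under these assumptions the average code width is `8 * f + 2 * (1 - f)` bits per weight for a high-precision fraction `f`, before counting scales and the bitmap. In a packed layout, the bitmap is what would let a reader compute the offset of any group without scanning the preceding data.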
To address the morphological similarity of vehicle damage images uploaded by clients for auto insurance claims, and the resulting difficulty in classifying them, a model named ResAWDNet was proposed for vehicle damage classification. First, to strengthen the model’s capacity for extracting damage features, the traditional downsampling operation was replaced with weighted dynamic convolution. This approach dynamically adjusted the weights of convolutional kernels based on the input features, thereby improving the model’s adaptability to features of varying scales and orientations and enabling more precise capture of subtle differences in vehicle damage. Second, to ensure that the model could concentrate on the salient discriminative regions and feature channels within the images, a dual attention mechanism was embedded after the convolutional layers of the backbone network. This mechanism concurrently learned importance weights in both the spatial and channel dimensions, significantly improving the model’s ability to capture crucial information and, in turn, its decision-making accuracy in vehicle damage classification. Finally, experimental validation was conducted on a dataset of vehicle damage images sourced from real accident cases. The experimental results demonstrated that ResAWDNet was feasible and offered significant advantages for vehicle damage classification, achieving an accuracy of 73.79%. Compared with baseline models, ResAWDNet achieved higher accuracy in classifying multiple types of damage, validating the effectiveness of the proposed model.
In intelligent traffic monitoring systems, small target vehicle detection in complex scenes faces challenges such as low feature resolution, severe occlusion interference, computational redundancy, and insufficient bounding-box regression accuracy. To balance detection accuracy with deployment efficiency on edge devices, an improved YOLOv8 framework based on dynamic sparse attention and a lightweight dual-branch structure was proposed. The method first introduced a bidirectional routing sparse attention mechanism (ReBiAttention) that enhanced the retention of shallow features for small targets by dynamically filtering key features through a two-level routing strategy. Subsequently, GSConv and VoV-GSCSP modules were integrated to reduce computational cost while dynamically adjusting multi-scale feature weights. An improved DynamicHead was applied for multi-task adaptive optimization, and a modified ShapeIoU loss function with shape- and scale-aware weighting was employed to improve localization accuracy. Experiments on the UA-DETRAC dataset showed that, relative to baseline YOLOv8n, Precision, Recall, and mAP@0.5 increased by 8.739%, 1.685%, and 7.225%, respectively, while the parameter count decreased by 4.3%. This method provided an efficient solution for accurate detection of small-target vehicles in complex traffic scenarios.
The organizational management of construction sites is a critical aspect of engineering management; however, traditional human supervision is constrained by environmental limitations and low efficiency. In recent years, multiple government departments have issued policies advocating deep integration of artificial intelligence with the real economy to promote high-quality and efficient economic development. The accuracy, efficiency, and automation advantages of Computer Vision (CV) technology have gradually led to its widespread application in construction supervision. Meanwhile, drones, which can efficiently capture complex and varied visual data of construction scenes, show clear potential for CV-based construction supervision tasks. However, current research on drone-based construction scene detection is limited, and the lack of overhead-perspective construction-scene image datasets restricts further development in the field. Therefore, a DJI Mavic 3T drone was used to capture construction-site images and establish UB-CSD, an open-source overhead image dataset for construction scenes. Several advanced object-detection algorithms were selected for comparative experiments on the UB-CSD dataset, and the reasons for their performance differences were analyzed from multiple dimensions, such as model workflow design, computation principle, and task characteristics. The mAP values of the algorithms were as follows: YOLOv8 and YOLOv10 (96.1%), YOLOv9 (96.0%), YOLO11 (95.7%), DETR (95.3%), Faster-RCNN (76.3%), and RetinaNet (72.1%). The analysis indicated that the YOLO series algorithms were the most suitable for drone-based object detection tasks in construction scenes.
By establishing a new open-source dedicated dataset and conducting comparative experiments, this work provided data and experimental cases to support future safety production management and object-detection algorithm research in the construction industry.
To address the challenge of achieving high-quality 3D reconstruction with Neural Radiance Fields (NeRF) in the presence of occluding objects, a method based on the collaborative optimization of Structure-from-Motion (SfM) and the Segment Anything Model (SAM) was proposed. Building upon the Scale-Invariant Feature Transform (SIFT) algorithm within the SfM reconstruction process, geometric inconsistencies in dynamic scenes were leveraged for feature point identification and matching. Unmatched feature points were treated as belonging to dynamic occluders and used as prompts for SAM, which supports point-guided segmentation, to segment the dynamic occluders and generate a static scene mask. Based on the segmentation results, mask-aware volumetric rendering was used to predict colors, and a four-term loss function was established, comprising reconstruction loss, structural consistency loss, adversarial loss, and self-supervised patching loss. These objectives were jointly optimized to constrain the color output in patched regions. After iterative training, consistent restoration of geometric structure and appearance in occluded areas across multiple viewpoints was achieved, and radiometric integrity was preserved while occlusions were removed. Validation on public dynamic scene datasets demonstrated that mask-based volumetric rendering combined with joint optimization produced an average Peak Signal-to-Noise Ratio (PSNR) improvement of 5.24 dB over baseline models and mainstream occlusion removal methods, alongside a 35% reduction in Learned Perceptual Image Patch Similarity (LPIPS). This approach established a new paradigm for 3D reconstruction in complex dynamic environments.
With the rapid development of deep learning, remarkable achievements have been made in image classification and related tasks. However, the success of these models relies heavily on large amounts of high-quality labeled data. In real-world applications, labeled data is often scarce, and manual annotation is time-consuming, labor-intensive, and costly, which limits the scalability and deployment of deep learning models. In recent years, active learning has gained significant attention due to its ability to improve model performance under limited annotation budgets. The core idea of active learning is to select the most valuable data for labeling based on criteria such as uncertainty, diversity, or representativeness. Traditional active learning methods often rely on manually designed heuristic sampling strategies that struggle to adapt to different task scenarios and are difficult to optimize dynamically. To address these limitations, a Smart Reinforcement Active Learning (SRAL) approach for image classification is proposed. The sample selection process is modeled as a Markov Decision Process (MDP), leveraging the adaptive policy optimization of reinforcement learning to guide the model in dynamically selecting the most valuable samples from the unlabeled data for labeling. In this framework, the state is represented by features extracted from the unlabeled samples, the action indicates whether a sample should be selected for labeling, and the reward function is defined as the change in model accuracy after incorporating the selected sample into the training set. The Actor-Critic algorithm is adopted to optimize the sampling policy, and uncertainty-based heuristic ranking is incorporated as auxiliary information to improve learning efficiency.
Experimental results demonstrate that the proposed SRAL method significantly improves classification accuracy under the same labeling budget compared to other active learning approaches on datasets such as CIFAR-10, SVHN, and Fashion-MNIST. Furthermore, SRAL exhibits robust stability and strong generalization ability across these datasets. This confirms the effectiveness and advantages of SRAL in enhancing the performance of image classification models.
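Two ingredients of the framework described above, the uncertainty-based ranking used as auxiliary information and the reward defined as the change in accuracy, can be sketched in a few lines. The Actor-Critic policy itself is not shown. Using predictive entropy as the uncertainty measure is an assumption, as the abstract does not name a specific score, and the function names are illustrative only.

```python
import numpy as np

def entropy_ranking(probs):
    """Return sample indices ordered from most to least uncertain.

    probs: array-like of shape (num_samples, num_classes) holding predicted
    class probabilities. Uncertainty is measured by predictive entropy, which
    is an assumed choice rather than the one stated in the abstract.
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), 1e-12, 1.0)
    entropy = -(probs * np.log(probs)).sum(axis=1)
    return np.argsort(-entropy, kind="stable")

def selection_reward(acc_before, acc_after):
    """Reward for a labeling action: change in model accuracy after retraining."""
    return acc_after - acc_before

# The near-uniform prediction (index 1) is ranked as the most uncertain.
order = entropy_ranking([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
```

In a full loop, the ranking would be one feature of the state passed to the policy, and `selection_reward` would be evaluated on a held-out set after each labeling step.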
With the widespread application of Building Information Modeling (BIM) technology in engineering design, Revit-based 3D forward design has developed into a relatively mature solution for superstructures. However, isolated foundations, a critical structural component, still face challenges such as low efficiency and poor information integrity in 3D modeling and drawing generation. Moreover, due to the closed data interfaces of mainstream structural analysis software, it is extremely challenging to extract reinforcement information directly from the underlying database files and synchronize it with the properties of Revit elements. To address this limitation, a new parametric modeling algorithm for isolated foundations based on AutoCAD layer and text recognition was proposed, utilizing the Revit Application Programming Interface (API) and the Model-View-ViewModel (MVVM) design pattern. After foundation layout and reinforcement annotation drawings generated by structural analysis software (e.g., YJK) were imported into Revit, the algorithm calculated the center coordinates and planar dimensions of foundations using layer recognition. Then, a matching logic was established between foundations and reinforcement annotations according to their relative positions. Subsequently, text recognition was employed to extract the numbering, height, and reinforcement information from the corresponding annotations, which were then automatically written into the family properties of the model. Additionally, the extraction of model attributes and geometric face references enabled automatic annotation of foundation dimensions and reinforcement. Finally, the proposed method was applied to the parametric modeling and drafting of isolated foundations for a self-built cold storage factory and was compared against conventional methods.
The results demonstrated that this algorithm significantly improved the efficiency of three-dimensional modeling and annotation drawing for isolated foundations, while exhibiting excellent compatibility with calculation files exported from PKPM software.
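The geometric steps described above, deriving a foundation's center and planar dimensions from its outline and pairing it with a reinforcement annotation by relative position, can be illustrated with a small standalone sketch. It does not use the Revit API. The nearest-distance matching rule, the data layout, and the function names are assumptions, because the abstract does not state the exact matching logic.

```python
from math import hypot

def footprint(points):
    """Center and planar dimensions of an axis-aligned rectangular outline.

    points: iterable of (x, y) corner coordinates read from a drawing layer.
    Returns ((cx, cy), (size_x, size_y)).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    center = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    return center, (max(xs) - min(xs), max(ys) - min(ys))

def match_annotations(centers, annotations):
    """Pair each foundation with the annotation whose insertion point is nearest.

    centers: dict mapping a foundation id to its (cx, cy) center.
    annotations: list of dicts with keys "x", "y", and "text".
    Nearest-distance matching is an assumed rule for illustration.
    """
    matched = {}
    for name, (cx, cy) in centers.items():
        nearest = min(annotations, key=lambda a: hypot(a["x"] - cx, a["y"] - cy))
        matched[name] = nearest["text"]
    return matched

# Hypothetical outline and annotation labels, in drawing units.
center, size = footprint([(0, 0), (2000, 0), (2000, 3000), (0, 3000)])
labels = match_annotations(
    {"F1": center},
    [{"x": 1100, "y": 1600, "text": "J-1"}, {"x": 9000, "y": 9000, "text": "J-2"}],
)
```

A production version would likely need a distance threshold and a one-to-one assignment so that two foundations cannot claim the same annotation.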