Latest Articles
The objective of this research is to enhance the quality and accuracy of information extracted from coal mine images, which are often degraded by high dust concentrations and uneven lighting conditions. These challenging environmental conditions introduce noise, reduce local contrast, and lead to the loss of fine details and edge textures, ultimately compromising the visual quality and the reliability of information extraction. Aiming to address these challenges, this study proposes a self-supervised coal mine image denoising algorithm based on adaptive masking. Designed to handle a wide range of noise levels and types, this algorithm aims to restore the original integrity of the image while preserving critical visual features. The proposed algorithm is divided into three main components: adaptive masking, mask integration, and an adaptive integrated loss function. Each component plays a vital role in enhancing the denoising process, ensuring that the final output is accurate and visually appealing.
The adaptive masking component is the cornerstone of the proposed algorithm, enabling segmented processing of coal mine images. This segmentation not only reduces computational overhead but also allows for more targeted and effective denoising. By dividing each image into smaller blocks, the algorithm can analyze and process each section independently, thereby improving the overall efficiency of the denoising process. The module operates by sequentially applying a mask to the edge and corner pixels of each block, while deliberately excluding the central pixels. This method prevents the network from performing a trivial identity mapping that fails to enhance image quality. Instead, this approach introduces data variability that boosts the generalization capability and robustness of the neural network model, making it adaptable to previously unknown images. The adaptive nature of the mask ensures that the module responds dynamically to varying noise levels and image features. By analyzing local variance and texture complexity, the mask can adaptively determine the optimal masking strategy for each block. This tailored approach ensures that the denoising process is responsive to the specific characteristics of each image, substantially improving its effectiveness. Subsequently, once the masking process is complete, the mask integration module is employed. This module is responsible for fusing the neural network’s output with the masked areas to reconstruct a coherent and denoised image. The integration involves calculating the Hadamard product (element-wise multiplication) between the network’s output and the masked image. This strategic operation enhances the network’s capability to distinguish between actual image content and noise, especially around edges and texture boundaries. In this stage, considering local and global features of the coal mine images is crucial. 
Effective integration of these features allows the algorithm to interpret image context effectively, resulting in denoised outputs that are coherent and structurally complete. The mask integration module also ensures that denoised areas seamlessly blend into the rest of the image, preserving the overall visual flow and structural integrity. Furthermore, this module incorporates a quality evaluation mechanism to assess the effectiveness of the integration. The feedback from these evaluations is used to iteratively refine the integration process. The final component of the algorithm is an adaptive integrated loss function, which guides the model during training. This loss function is specifically designed to address the unique challenges of coal mine image denoising, including complex noise patterns and the need to preserve subtle image details. The adaptive integrated loss uses the integrated image as a training label, allowing the model to learn effectively from the differences between the noisy input images and the denoised outputs. Additionally, by incorporating the original noisy image, the loss function increases the model’s sensitivity to signal changes, enhancing its adaptability across various denoising scenarios and noise conditions.
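The block-wise masking and Hadamard-product integration described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 3 × 3 block size, the neighbourhood-mean `placeholder_denoiser` standing in for the neural network, and the fallback to the noisy input at never-masked central pixels are all assumptions made for illustration, as are the function names.

```python
def block_masks(height, width, block=3):
    """Yield one binary mask per non-central position of a block x block tile."""
    center = (block // 2, block // 2)
    for di in range(block):
        for dj in range(block):
            if (di, dj) == center:
                continue  # central pixels are deliberately never masked
            yield [[1 if (i % block, j % block) == (di, dj) else 0
                    for j in range(width)] for i in range(height)]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized 2-D lists."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def placeholder_denoiser(image, mask):
    """Hypothetical stand-in for the network: mean of unmasked pixels in each 3x3 window."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [image[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))
                    if not mask[y][x]]
            out[i][j] = sum(vals) / len(vals) if vals else image[i][j]
    return out


def integrate(noisy, block=3):
    """Fuse the per-mask predictions into a single image."""
    h, w = len(noisy), len(noisy[0])
    fused = [[0.0] * w for _ in range(h)]
    covered = [[0] * w for _ in range(h)]
    for mask in block_masks(h, w, block):
        prediction = placeholder_denoiser(noisy, mask)
        kept = hadamard(prediction, mask)  # keep predictions at masked pixels only
        fused = [[f + k for f, k in zip(rf, rk)] for rf, rk in zip(fused, kept)]
        covered = [[c + m for c, m in zip(rc, rm)] for rc, rm in zip(covered, mask)]
    # Assumption: never-masked central pixels fall back to the noisy input.
    return [[f if c else n for f, c, n in zip(rf, rc, rn)]
            for rf, rc, rn in zip(fused, covered, noisy)]
```

Because each masked pixel is predicted only from its unmasked neighbours, the stand-in cannot copy its own input value, which is the property that prevents a trivial identity mapping.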
The proposed algorithm was rigorously tested using an underground coal mine image dataset alongside four additional public datasets, including Kodak24 (Kodak lossless true color image suite), BSD300 (Berkeley segmentation dataset 300), and BSDS500 (Berkeley segmentation dataset 500). The experiments were specifically designed to simulate real-world conditions, with a particular emphasis on dimly lit environments commonly encountered in coal mines. The results of these experiments demonstrated that the algorithm substantially outperformed other comparative denoising algorithms, in terms of subjective evaluations and objective metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). In tunnel scenes with a high level of Gaussian noise (level 50), the algorithm achieved substantial improvements in PSNR/SSIM values compared to existing methods such as B2U and NBR2NBR, with increases of 4.2 dB/0.055 and 2.99 dB/0.077, respectively. Furthermore, when tested on images corrupted with Gaussian noise levels ranging from 5 to 50 on the public datasets, the algorithm consistently demonstrated substantial PSNR improvements over the second-best method, with increases of 1.09%, 0.72%, and 0.68% for Kodak24, BSD300, and BSDS500, respectively.
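For readers unfamiliar with the objective metric quoted above, PSNR can be computed as in the following sketch. The 8-bit peak value of 255 and the nested-list image representation are assumptions made for illustration; the abstract does not state the exact evaluation protocol.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    squared = [(r - e) ** 2
               for row_r, row_e in zip(reference, estimate)
               for r, e in zip(row_r, row_e)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_from_gain(gain_db):
    """Factor by which the mean squared error shrinks for a given PSNR gain."""
    return 10.0 ** (gain_db / 10.0)
```

Since PSNR is logarithmic in the mean squared error, a gain of 4.2 dB corresponds to roughly a 2.6-fold reduction in mean squared error.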
The proposed self-supervised denoising algorithm has demonstrated a strong capability to remove noise while preserving overall image information from single coal mine images, across various noise levels and types. This finding highlights the algorithm’s robustness and generalization capabilities, making it a promising tool for real-world applications in coal mine monitoring and safety systems. The effectiveness of the algorithm in enhancing image quality and improving the accuracy of information extraction, even under challenging conditions, underscores its potential to make a substantial contribution to the field of coal mine image processing and analysis. The code for this paper is available at https://www.sciclb.cn/anonymous/skpswk56.
In recent years, single-object tracking in satellite videos has gained substantial attention and plays a pivotal role in military and civilian domains. This tracking has found applications in urban-scale disaster relief, public security surveillance, and the monitoring of emergency events, among others. However, due to a combination of factors, such as small target size, interference from similar targets, motion blur, and complex backgrounds, single-object tracking in satellite videos presents numerous challenges. Aiming to promote further exploration in this domain by scholars domestically and internationally, this paper comprehensively reviews and critically analyzes the current state of the art in satellite video-based single-object tracking. In terms of challenges and advantages, video satellites offer an expansive field of view. Targets such as vehicles typically occupy only a few to a dozen pixels in satellite videos, with limited distinguishing features or textures. Additionally, satellite videos contain many targets, and the targets of interest are difficult to distinguish from interfering objects because of their high degree of similarity. Moreover, target blurring may occur due to rapid target movement or satellite platform jitter. When the moving target is inconspicuous and background information overshadows target features, tracking failure is likely to occur. However, compared to ground-based or low-altitude videos, satellite video-based object tracking offers certain advantages. For example, external factors related to the target, such as the camera perspective, are relatively stable, aiding tracking algorithms in maintaining a consistent lock on the target. Most objects in satellite videos are rigid and rarely undergo substantial deformation during tracking. Additionally, the aspect ratios of targets remain approximately consistent across video frames, reducing the potential for algorithmic confusion.
The motion of targets is typically straightforward, with trajectories generally following straight lines or smooth curves, enabling the prediction of target positions based on historical motion data. Regarding the development of tracking methods, this paper reviews the evolution of single-object tracking methods for satellite videos and highlights typical tracking paradigms, including generative-based approaches, correlation filter-based methods, and deep learning-based techniques. Deep learning-based tracking methods can be further classified into convolutional neural network (CNN)-based and Transformer-based methods. In contrast to the hand-crafted features employed in correlation filter-based methods, CNNs can extract more comprehensive and robust features, thereby enhancing target tracking performance. In recent years, an increasing number of scholars have applied CNNs to satellite video object tracking tasks. However, when processing high-resolution images, long time-series data, and complex backgrounds, which are common in satellite videos, CNNs exhibit certain limitations. Aiming to address these limitations, Transformers have been gradually introduced into satellite video object tracking. Transformers can capture global spatial information and long-term temporal dependencies, offering a promising alternative for improving tracking accuracy in complex scenarios. Regarding datasets and evaluation metrics, this study compiles existing single-object tracking datasets for satellite videos, along with commonly adopted performance evaluation metrics. Prominent datasets in this field include XDU-BDSTU, video satellite objects (VISO), SatSOT, and the oriented object tracking benchmark (OOTB). Among them, the VISO dataset is the largest in scale, comprising training and test subsets. The XDU-BDSTU dataset features images with a large swath width, making it suitable for long-term tracking tasks.
The OOTB dataset provides annotations using rotated bounding boxes, which accurately represent the actual target geometry. The main performance evaluation metrics include precision, success rate, and frame rate, which collectively assess tracking methods in terms of tracking accuracy and speed. Aiming to evaluate the applicability of various tracking algorithms across different scenarios, this paper selects 18 algorithms for performance evaluation and analysis on a self-constructed test set. Experimental results highlight the critical roles of motion estimation, temporal information utilization, and background information exploitation in satellite video object tracking. Specifically, the correlation filter with motion estimation (CFME) algorithm leverages historical motion information of the target to enhance tracking performance, while the Trdimp algorithm incorporates temporal and background information, yielding favorable outcomes. When a vehicle makes a turn, the hand-crafted features employed by the correlation filter-based method CFME lack rotational invariance and are poorly equipped to handle changes in the target’s bounding box due to rotation, resulting in suboptimal tracking performance. Conversely, methods such as Trdimp and Trsiam directly estimate the target’s bounding box, while approaches such as siamese region proposal network (SiamRPN) and SiamRPN++ predefine anchor boxes with different aspect ratios, effectively addressing the challenge of in-plane rotation. Finally, in terms of future perspectives, this paper outlines the anticipated trajectory of single-object tracking algorithms for satellite videos across several key dimensions: standardizing evaluation metrics for tracking results, developing large-scale and high-quality satellite video object tracking datasets, devising models specifically tailored to satellite video tracking challenges, and enabling robust long-term tracking capabilities.
In the domain of general video target tracking, commonly used evaluation metrics include those from the OTB and VOT benchmarks. For satellite video target tracking, scholars predominantly adopt the precision and success rate metrics defined by the OTB evaluation framework. In the OTB metrics for general videos, the precision threshold is customarily set to 20 pixels, and the success rate is evaluated based on the area under the curve (AUC) of the overlap score. However, in satellite video target tracking, researchers often adopt varying threshold settings, which hinders the objective evaluation of algorithms under a unified standard. Thus, standardizing evaluation metrics for tracking results is essential for the advancement of satellite video single-object tracking. Before the emergence of large-scale test datasets, most studies in satellite video object tracking verified algorithms using only a few targets, which restricted comprehensive algorithm performance assessment. Moreover, the use of different test datasets across studies has further hindered direct comparisons between algorithms. Consequently, the development of large-scale, high-quality satellite video object tracking datasets is urgently needed, not only for effective model training, but also for model testing and performance benchmarking. Future research could benefit from rapidly assimilating the latest advancements in the general video object tracking domain and adapting them to the unique characteristics of satellite videos. Given the rich background information and the continuous, linear nature of target motion trajectories between adjacent frames in satellite videos, these priors can be fully leveraged to explore global spatial and temporal information, thereby enhancing tracking accuracy.
Furthermore, techniques such as knowledge distillation, network pruning, and neural architecture search hold considerable potential for autonomously constructing streamlined, low-complexity models specifically tailored to satellite video single-object tracking. These approaches can enable high-precision, real-time target tracking under constrained computational resources. In contrast to ground-based surveillance videos, satellite videos offer broad coverage, making it possible to track trajectories across entire urban areas. However, in such large-scale scenarios, multiple challenges, such as occlusion, interference from similar objects, motion blur, illumination variation, and target rotation, often occur simultaneously. Aiming to address the demands of real-world applications, the development of satellite video tracking algorithms capable of simultaneously addressing these challenges is imperative.
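The OTB-style precision and success metrics discussed in this review can be sketched as follows. The sketch assumes axis-aligned boxes given as (x, y, width, height) tuples, 21 evenly spaced overlap thresholds, and a strict "greater than" comparison for the success curve; these conventions and the function names are illustrative assumptions rather than details taken from the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def precision(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is within the pixel threshold."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(pred, gt))
    return hits / len(gt)


def success_auc(pred, gt, steps=21):
    """Area under the success curve, approximated by the mean success rate."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [k / (steps - 1) for k in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Exposing `threshold` as a parameter makes the standardization problem concrete: the same tracker output yields different precision values depending on the pixel threshold a study chooses, and a 20-pixel threshold is large relative to targets that span only a few to a dozen pixels.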
Most existing vision-based rail defect detection methods face challenges such as high parameter counts, computational complexity, slow detection speeds, and limited accuracy. Aiming to overcome these limitations, this paper introduces a lightweight pyramid cross-attention network (LPCANet) for rail image defect detection using RGB images and depth images.
LPCANet adopts MobileNetv2 as its backbone network to extract multiscale feature maps from RGB images. Simultaneously, a lightweight pyramid module (LPM) is employed to extract similarly-sized feature maps from depth images. Each stage of the LPM comprises a sequence of operations including max pooling, a 3 × 3 convolutional layer, batch normalization, and ReLU activation, enabling efficient extraction of features from depth images. By leveraging deep learning, RGB-D technology, and salient object detection, LPCANet efficiently extracts multiscale feature representations from RGB and depth data. The LPM handles depth image features, while the backbone captures detailed pyramid features from RGB images. Subsequently, a cross-attention mechanism (CAM) is applied to integrate the feature maps from both modalities, enhancing the network’s focus on relevant defect regions. Additionally, a spatial feature extractor (SFE) is introduced to further boost defect detection performance. Finally, a “pixel shuffle” operation is used to restore the output to the original image resolution.
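The “pixel shuffle” upsampling step mentioned above trades channels for spatial resolution. The following dependency-free sketch shows the rearrangement on nested lists; the channel ordering follows the convention used by PyTorch's `torch.nn.PixelShuffle`, and whether LPCANet uses exactly this layer and upscale factor is an assumption.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if channels % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    out_c = channels // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(out_c)]
    for c in range(out_c):
        for i in range(r):
            for j in range(r):
                # Each group of r*r input channels fills one r x r output cell.
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out
```

Unlike interpolation, this operation involves no arithmetic on pixel values, only a reordering, so any learned content must come from the convolution that produces the C·r² input channels.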
The proposed scheme was computationally evaluated using the PyTorch library in an environment equipped with an NVIDIA 3090 GPU, alongside several benchmark models for comparison. For the evaluation of LPCANet, three publicly available unsupervised RGB-D rail datasets were used: NEU-RSDDS-AUG, RSDD-TYPE1, and RSDD-TYPE2. Experimental results on the NEU-RSDDS-AUG dataset indicate that LPCANet achieves excellent efficiency, with 9.90 million parameters, a computational complexity of 2.50 G, a model size of 37.95 MB, and a running speed of 162.60 frames per second. Compared to 18 existing rail defect detection schemes, LPCANet is notably more lightweight. In particular, when compared against CSEPNet, the current best-performing model, LPCANet achieves improvements across several evaluation metrics: +1.48% in Sα, +0.86% in intersection over union (IOU), +0.14% in Fβmax, +0.03% in mean average precision (mAP), and +1.77% in mean absolute error (MAE). An ablation study was conducted on four upsampling methods (interpolation, transposed convolution, patch merging, and “pixel shuffle”) to evaluate their effectiveness within the LPCANet framework. Among these, the “pixel shuffle” method demonstrated clear advantages and was found to be the most suitable for the LPCANet model. Further ablation studies were conducted on four different components (backbone network, LPM, SFE, and CAM). The results indicate that CAM and SFE notably enhance the detection performance of LPCANet. An in-depth analysis of various backbone networks confirmed that the LPCANet model is not only compatible with existing backbone networks but also consistently achieves superior detection results. Aiming to evaluate the model’s generalization capability beyond rail datasets, experiments were also conducted on three non-rail defect datasets: DAGM2007, MT, and Kolektor-SDD2.
The results show that LPCANet delivers improved performance across three key metrics: mAP, MAE, and IOU, demonstrating its potential for general-purpose defect detection tasks.
The LPCANet model proposed in this study effectively combines the advantages of traditional and deep learning approaches, demonstrating strong practical value in the field of rail defect image processing. Future work will focus on further reducing the model size to achieve rapid detection speeds while ensuring further improvements in performance quality.
In recent years, rapid advancements in digital technology have positioned digital orthodontics as a critical research focus within the field of dentistry. Among the numerous challenges encountered during orthodontic treatment, designing an accurate dental arch line is fundamental for precisely calculating the target positions of teeth after treatment. The dental arch line should not only follow the natural growth patterns of the teeth but also satisfy aesthetic and functional requirements essential for optimal orthodontic outcomes. However, current automated tooth alignment methods typically model the dental arch line using Beta functions, which are inherently limited by their restricted degrees of freedom. This limitation often prevents these methods from generating curves that accurately capture the ideal dental arch form, especially when dealing with complex or irregular tooth arrangements. Moreover, orthodontists frequently require customized dental arch lines tailored to each patient’s unique oral condition. However, arch lines fitted solely from the patient’s initial intraoral scan may not always align with therapeutic or aesthetic expectations, necessitating labor-intensive manual adjustments. These challenges highlight the need for a flexible and precise approach to dental arch line design that effectively meets clinical standards and patient-specific requirements. Aiming to address these limitations, this paper proposes a novel dental arch line fitting method based on cumulative chord length parameterization combined with Hermite interpolation. This approach aims to enhance control over the dental arch shape, improve fitting accuracy, and provide orthodontists with a highly effective and efficient tool for designing and adjusting dental arch lines during orthodontic treatment planning.
The proposed method begins by inputting the patient’s intraoral scan data, which undergoes a series of preprocessing steps to ensure data quality and consistency. A tooth segmentation algorithm is then applied to accurately isolate each individual tooth, following internationally recognized dental segmentation standards. After segmentation, a landmark detection algorithm is employed to extract key landmarks from each tooth, capturing essential geometric and morphological features. These landmarks serve as the foundation for subsequent dental arch line fitting. Aiming to facilitate the interpolation process, the extracted landmarks are initially reparameterized using cumulative chord length parameterization. This process generates a naturally distributed set of interpolation points along the dental arch by accounting for the varying distances between adjacent landmarks, thereby preserving the true spatial relationships among teeth. Subsequently, Hermite interpolation is employed to construct the dental arch line through the parameterized points. By incorporating position and tangent information, Hermite interpolation enables the construction of smooth, continuous curves with enhanced local control. Aiming to ensure fitting accuracy and smoothness, a coefficient matrix is constructed to formulate a system of linear equations. Solving this system yields the final dental arch line, represented as a piecewise continuous function. This piecewise structure allows for precise local adjustments, making the method particularly effective for accommodating complicated or irregular tooth arrangements. Furthermore, this paper introduces two new mathematical evaluation metrics: the mean shortest distance and the maximum shortest distance between the extracted landmarks and the fitted curve. These metrics offer an objective and robust means of assessing how accurately the generated dental arch line conforms to the patient’s actual dental morphology.
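The fitting pipeline described above (chord length parameterization, Hermite interpolation, and the two shortest-distance metrics) can be sketched in a few functions. This is an illustration under stated assumptions, not the paper's method: tangents are estimated here by simple finite differences instead of solving the linear system built from the coefficient matrix, the shortest distance is approximated by dense sampling of the curve, and all function names and the sample landmarks are hypothetical.

```python
import math


def chord_length_params(points):
    """Cumulative chord-length parameters, normalised to [0, 1]."""
    cumulative = [0.0]
    for p, q in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.dist(p, q))
    return [c / cumulative[-1] for c in cumulative]


def finite_difference_tangents(points, t):
    """Assumed tangent estimate (the paper instead solves a linear system)."""
    n = len(points)
    result = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dt = t[hi] - t[lo]
        result.append(tuple((b - a) / dt for a, b in zip(points[lo], points[hi])))
    return result


def hermite_point(points, t, tangents, u):
    """Evaluate the piecewise cubic Hermite curve at parameter u in [0, 1]."""
    i = max(k for k in range(len(t) - 1) if t[k] <= u)  # segment containing u
    h = t[i + 1] - t[i]
    s = (u - t[i]) / h
    h00 = 2 * s ** 3 - 3 * s ** 2 + 1
    h10 = s ** 3 - 2 * s ** 2 + s
    h01 = -2 * s ** 3 + 3 * s ** 2
    h11 = s ** 3 - s ** 2
    return tuple(h00 * p0 + h10 * h * m0 + h01 * p1 + h11 * h * m1
                 for p0, p1, m0, m1 in zip(points[i], points[i + 1],
                                           tangents[i], tangents[i + 1]))


def shortest_distance_metrics(landmarks, curve_samples):
    """Mean and maximum of each landmark's distance to its nearest curve sample."""
    d = [min(math.dist(p, c) for c in curve_samples) for p in landmarks]
    return sum(d) / len(d), max(d)


# Hypothetical landmark positions along a simplified two-dimensional arch.
landmarks = [(-2.0, 0.0), (-1.0, 1.5), (0.0, 2.0), (1.0, 1.5), (2.0, 0.0)]
t = chord_length_params(landmarks)
m = finite_difference_tangents(landmarks, t)
samples = [hermite_point(landmarks, t, m, k / 200) for k in range(201)]
mean_d, max_d = shortest_distance_metrics(landmarks, samples)
```

Because each segment depends only on its two end points and their tangents, moving one landmark or tangent changes the curve only in the adjacent segments, which is the local control the abstract attributes to the piecewise structure.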
The proposed fitting method, which integrates cumulative chord length parameterization with Hermite interpolation, exhibits substantial improvements over traditional approaches in dental arch line fitting. First, compared to conventional Beta function-based methods, the proposed approach offers substantially greater flexibility by allowing the inclusion of additional control points. This increased degree of freedom directly addresses the limitations of Beta functions, particularly their inability to support localized shape modifications. The resulting dental arch line provides orthodontists with the flexibility to manually adjust specific, predefined control points, enabling localized adjustments tailored to individual patient needs. The proposed method excels in offering excellent controllability for global and local morphology adjustments of the dental arch line while maintaining high accuracy and smoothness across all regions, attributed to the use of its piecewise functional structure. Experimental evaluations further highlight the advantages of the proposed method. Qualitative analyses show that the generated curves more naturally align with actual dental arch shapes than those produced by conventional methods. Quantitative results, assessed using the proposed shortest distance-based evaluation metrics, confirm a notable improvement in fitting accuracy and alignment with natural tooth arrangements. Additionally, the proposed method enhances clinical flexibility, allowing orthodontists to efficiently adjust the dental arch line by manipulating a limited number of control points, minimizing the need for extensive manual corrections. In practical scenarios, the proposed fitting method is integrated into an existing automated tooth alignment system. This integration led to noticeably improved orthodontic outcomes, further validating the practical effectiveness and clinical applicability of the proposed method.
Compared to existing dental arch fitting methods, the proposed method based on cumulative chord length parameterization and Hermite interpolation demonstrates clear advantages in fitting accuracy and flexibility. This method effectively addresses key limitations of traditional approaches, such as difficulty in achieving an ideal dental arch line and limited adaptability to patient-specific variations. By notably increasing the degrees of freedom and enhancing the controllability of the fitting function, the method produces dental arch lines that are not only smooth and accurate but also highly customizable to meet the diverse clinical requirements of modern orthodontic practice. Furthermore, the introduction of quantitative evaluation metrics offers a systematic and objective framework for assessing fitting quality, ensuring that the resulting dental arch lines are aesthetically aligned and functionally sound. Beyond its technical advantages, the method also improves clinical efficiency by reducing the time and effort typically required for dental arch adjustments during treatment planning. Overall, the proposed method offers strong technical support for the advancement of digital orthodontics and holds substantial potential for broader clinical adoption. This paper establishes a solid foundation for further innovations in automated orthodontic treatment systems, opening new possibilities for personalized and precise dental care.
Cultural relics embody the brilliance of civilization, preserve historical heritage, and uphold the national spirit, serving as vivid manifestations of the confidence and depth of Chinese cultural identity. These artifacts are not merely historical remnants; they are living testaments to a nation’s cultural consciousness and aesthetic achievements. In the Chinese context, such artifacts encompass a wide array of forms (ceramics, bronzes, calligraphy, painting, architecture, and intangible heritage such as folk music and traditional theater), which jointly form a rich, continuous narrative of human development and collective memory. Through their material and symbolic importance, these cultural elements reflect and reinforce a shared sense of belonging and historical continuity. With the rapid development of artificial intelligence (AI), now empowering a broad spectrum of industries and becoming deeply integrated into everyday life, the field of cultural heritage is undergoing a fundamental transformation. This transformation is not only technical but also conceptual, redefining how we understand, protect, and interact with our cultural legacy. AI no longer functions solely as a tool for automation or computation; it now plays a central role in knowledge production, decision-making, and creative processes. These capacities are driving a paradigm shift in cultural heritage work, from reactive, static models to proactive, adaptive systems powered by data and learning. The cultural heritage sector, historically reliant on manual preservation, scholarly interpretation, and traditional dissemination methods, is currently being transformed by advanced algorithms, machine learning models, and intelligent data processing frameworks.
The profound capabilities of AI in areas such as image recognition, natural language processing (NLP), data mining, semantic segmentation, and 3D reconstruction are increasingly being leveraged to support the digitization, restoration, analysis, management, and public engagement of cultural heritage. These developments, driven by advancements in artificial intelligence, are poised to reshape the entire lifecycle of cultural relics, from their initial discovery and documentation to their long-term preservation and dynamic presentation to the public. The key challenge that currently arises, and the central concern of this paper, is the effective, responsible, and innovative application of AI within the cultural heritage field. While the potential of AI is undeniable, its implementation must be carefully aligned with the values, sensitivities, and interdisciplinary nature of cultural preservation. In particular, the complex materiality, symbolic importance, and contextual uniqueness of cultural relics demand AI approaches that are interpretable, ethical, and inclusive of human expertise. Ethical considerations, such as bias in training data, the risks of over-reliance on automated interpretations, and the protection of indigenous knowledge systems, must be at the forefront of AI deployment in cultural domains. This paper explores five critical dimensions of AI applications in the cultural heritage sector: prevention, preservation, research, management, and utilization. The five aspects collectively form a holistic framework for understanding how AI technologies can support the sustained vitality and accessibility of cultural resources. In terms of prevention, AI can play a crucial role in developing early warning systems to identify environmental changes and potential threats to cultural relics.
By integrating sensor networks with AI-driven monitoring tools, institutions can proactively detect fluctuations in humidity, temperature, light exposure, and other environmental factors that may contribute to material deterioration. Additionally, predictive models based on historical data can be employed to forecast risks and guide strategic conservation efforts. For instance, machine learning algorithms have been employed in several European museums to predict mold outbreaks in organic cultural relics based on microclimatic data, enabling timely interventions. In terms of preservation, AI contributes to digital restoration, 3D reconstruction, and non-invasive diagnostics. For example, deep learning models can help reconstruct missing parts of fragmented murals or manuscripts by learning visual patterns from intact sections. Additionally, hyperspectral imaging combined with AI analysis can uncover faded texts or underdrawings that are invisible to the human eye. These technologies not only extend the physical lifespan of cultural relics but also introduce innovative approaches to virtual preservation. Some institutions are already using AI in the reconstruction of historical architecture through photogrammetry and simulation of ancient environments for educational use. In the domain of research, AI notably enhances the capabilities of scholars to extract knowledge from vast, heterogeneous datasets. Natural language processing facilitates the digitization and analysis of ancient texts, while computer vision supports the classification of cultural relics based on style, origin, and function. Semantic knowledge graphs and AI-assisted databases promote cross-referencing across disciplines and collections, fostering highly integrated and interdisciplinary research outcomes. These tools are proving essential in digital humanities projects that aim to map large cultural corpora or trace stylistic influences across time and geography. 
The management of cultural heritage institutions and resources also benefits substantially from AI. Intelligent information systems can optimize inventory tracking, automate metadata tagging, and streamline exhibition logistics. Recommendation systems can be tailored to guide curatorial decisions and enhance user interaction. AI can also help balance conservation needs with public access by dynamically regulating visitor flow in sensitive exhibition areas. Moreover, the integration of blockchain with AI for provenance tracking is emerging as a promising area, enhancing the security and transparency of cultural relic records. In terms of utilization, AI is reshaping how cultural heritage is accessed and experienced, particularly in education, tourism, and public engagement. Virtual museums, intelligent chatbots, augmented reality (AR), and personalized content delivery are making cultural experiences highly interactive and accessible. AI-generated reconstructions and immersive simulations allow audiences to engage with history in new ways, expanding the reach of cultural heritage to new demographics and global audiences. Platforms such as Google Arts & Culture, along with various national museum initiatives, are increasingly leveraging AI for context-aware storytelling and multilingual access, making culture more inclusive and dynamic. Beyond practical applications, this paper also examines how traditional research methodologies are evolving in response to AI integration. The paper highlights the epistemological shifts that occur as cultural interpretation moves from purely human-centered approaches to hybrid models that combine human expertise with computational inference. While AI presents powerful tools, it also raises critical questions about authenticity, authorship, and cultural sovereignty, especially when applied across diverse cultural contexts and communities.
The co-construction of meaning between human curators and intelligent systems may enrich interpretations, but it also demands careful calibration of roles and responsibilities. Overall, the integration of AI into the field of cultural heritage presents an unprecedented opportunity and a profound responsibility. As we navigate this new landscape, balancing technological innovation and cultural sensitivity is essential, ensuring that AI serves as a tool for cultural empowerment, rather than erasure. This paper ultimately offers insights into the current landscape and future trajectory of AI in cultural heritage, advocating for collaborative, interdisciplinary efforts to harness the potential of AI while honoring the depth, diversity, and dignity of the world’s cultural legacies.