The exponential growth of cultural heritage documentation videos calls for new compression methods that preserve critical details while reducing storage. For static scenes, traditional frame-based compression methods struggle to balance the removal of semantic redundancy against the preservation of detail. To improve compression efficiency, a novel dual-mode semantic compression framework for static-object videos based on neural radiance fields (NeRF) was proposed in this paper. By integrating semantic segmentation with COLMAP, the proposed system decouples the video stream into two semantic layers: the central object, which contains the critical details, and the dynamic background, which is rich in semantic redundancy. In the dual-mode framework, the focus-priority (FP) mode is designed for scenarios with high-efficiency demands, and only the NeRF-based neural representation of the primary object is preserved and compressed. For scenarios that require additional environmental context, the panorama-compatible (PC) mode synchronously compresses the H.264-encoded background stream and the primary object stream to reconstruct the full scene. Experimental results on single-artifact video data demonstrate that the proposed framework reduces storage by 20% compared with conventional methods, providing a flexible and controllable solution for compressing cultural heritage documentation videos.
Different objects in Chinese paintings carry rich cultural connotations. Segmenting and extracting these objects by technical means is an effective way to enhance cultural added value and activate cultural resources. Although existing deep learning methods can extract multi-level features for instance segmentation, they do not fully exploit the positional relationships among instances, which leads to poor results in traditional Chinese painting (TCP) instance segmentation. In this paper, a novel TCP image instance segmentation algorithm that integrates spatial structure characteristics (SSC), called SSC-Net, was proposed. First, considering characteristics of TCP images such as gradual color blending and discontinuous contour lines, an instance information entropy is proposed that consists of a color entropy, formed from regional variance, and a contour entropy, formed from contour point regression. Then, because existing network structures have difficulty fully capturing the positional relationships among instances in TCP images, a Chinese painting instance segmentation framework built on the residual neural network (ResNet) is proposed, consisting of a mask branch and a position branch that integrate spatial structure features. Finally, the color entropy and contour entropy are fed into the mask branch and the position branch of SSC-Net, respectively, to perform TCP instance segmentation. Quantitative and qualitative experiments on the challenging TCP database show that, compared with state-of-the-art algorithms in the same category, SSC-Net achieves an average precision (AP) of 53.89% at 25.8 frames per second (FPS), and its segmentation results meet practical application requirements.
Multilevel threshold image segmentation divides an image into several regions with distinct characteristics. Although effective, the cost of an exhaustive threshold search grows exponentially with the number of thresholds, so more efficient and stable methods are needed. In this paper, an improved sparrow search algorithm (ISSA) was proposed that combines multiple strategies to address the basic sparrow search algorithm's (SSA) dependence on the initial population and its limited solution accuracy. ISSA uses circle chaotic mapping to enhance population diversity, a tangent flight operator to improve search diversity, and a triangular random walk to perturb the optimal solution, thereby strengthening global search and avoiding local optima. Evaluations on 16 benchmark functions demonstrate that ISSA outperforms the gray wolf optimizer (GWO), whale optimization algorithm (WOA), rat swarm optimizer (RSO), moth-flame optimization (MFO), and SSA in search speed, accuracy, and robustness. When applied to multilevel threshold image segmentation, ISSA excels in terms of Kapur's maximum entropy, peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and feature similarity (FSIM), demonstrating its research value and application potential in image segmentation.
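The segmentation objective named above can be made concrete. The following Python sketch, assuming NumPy and a 256-bin grey-level histogram, computes Kapur's entropy for a set of thresholds. This is the fitness that SSA, ISSA, or any of the compared optimizers would maximize. The sketch also generates a chaotic initial population with the circle map. The circle-map constants a = 0.5 and b = 0.2 are commonly used values assumed here rather than settings taken from the paper. The tangent flight and triangular random walk steps are not shown.

```python
import numpy as np

def kapur_entropy(hist, thresholds):
    """Kapur's criterion: sum of the entropies of the classes cut by the thresholds."""
    p = hist / hist.sum()                        # normalize histogram to probabilities
    edges = [0, *sorted(thresholds), len(p)]     # classes are [t_j, t_{j+1})
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        cls = p[lo:hi]
        w = cls.sum()                            # class probability omega_j
        if w <= 0:
            continue                             # empty class contributes nothing
        q = cls[cls > 0] / w                     # within-class distribution
        total += -np.sum(q * np.log(q))
    return total

def circle_map_init(n_pop, dim, lb, ub, a=0.5, b=0.2, seed=0):
    """Chaotic initial population from the circle map (a, b are assumed values)."""
    rng = np.random.default_rng(seed)
    x = rng.random(dim)                          # random starting point per dimension
    pop = np.empty((n_pop, dim))
    for i in range(n_pop):
        x = np.mod(x + b - (a / (2 * np.pi)) * np.sin(2 * np.pi * x), 1.0)
        pop[i] = lb + x * (ub - lb)              # map chaotic values in [0, 1) to [lb, ub)
    return pop

# Illustration only: a flat histogram and 30 candidate 3-threshold solutions.
hist = np.ones(256)
pop = circle_map_init(n_pop=30, dim=3, lb=1, ub=255)
fitness = [kapur_entropy(hist, np.round(ind).astype(int)) for ind in pop]
```

With k thresholds over 256 grey levels, an exhaustive search would have to evaluate this function for every sorted threshold combination. A population-based optimizer evaluates it only for the candidate solutions it visits.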
Exploring the interaction between red, green, blue (RGB) and thermal infrared modalities is critical to the success of RGB-thermal (RGB-T) salient object detection (RGB-T SOD). In this paper, a cross-modal attention and reinforcement network (CAR-Net) was proposed to explore the implicit relationship between the two modalities, making full use of the beneficial expression of each modality and their complementary fusion. Specifically, CAR-Net contains a cross-modal attention module (CAM) that enables efficient interaction and key-information extraction through joint attention, and a feature strengthener module (FSM) that improves feature representation using channel rank and loop methods. Extensive experiments show that CAR-Net achieves the best performance on three publicly available datasets.
Accurate prediction of epileptic seizures from electroencephalogram (EEG) signals can help doctors take timely treatment measures for patients. Previous prediction methods largely ignored the interactions and correlations among channels, as well as the spatial information carried by the positions of the multi-channel electrodes, which may contain salient information related to epileptic states. In this paper, a two-dimensional (2D) convolutional block attention module (CBAM) based squeeze-and-excitation (SE) network (TDCBAM-SENet) was proposed. First, time-frequency (TF) domain features of multi-channel EEG signals are arranged into a three-dimensional (3D) matrix according to the relative positions of the electrode channels. Then, the 2D CBAM (TDCBAM) is proposed to extract high-level abstract features of salient information within channels through its channel attention module (CAM) and spatial location features through its spatial attention module (SAM). In addition, the SE module dynamically adjusts the channel weights based on spatial position information to capture interactions among channels. Finally, a multi-layer perceptron is used for classification. TDCBAM-SENet is evaluated on the CHB-MIT dataset. Experimental results demonstrate that the 3D feature matrix effectively represents epilepsy-related salient information within and between EEG channels, and that TDCBAM-SENet effectively mines high-level abstract features from this information for epilepsy prediction.
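The squeeze-and-excitation step mentioned above follows a standard pattern. Each channel is global-average-pooled, passed through two fully connected layers, and gated by a sigmoid that rescales the channel. The NumPy sketch below shows only this generic SE recalibration, not the full TDCBAM-SENet. The following are illustrative assumptions:

* the (C, H, W) layout, with H × W a hypothetical electrode grid and C feature maps
* the reduction ratio r
* the random weights

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Generic SE recalibration of a feature map x with shape (C, H, W)."""
    z = x.mean(axis=(1, 2))                     # squeeze: global average pool -> (C,)
    s = np.maximum(0.0, w1 @ z + b1)            # excitation FC1 + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))    # FC2 + sigmoid -> weights in (0, 1)
    return x * s[:, None, None]                 # rescale every channel by its weight

# Hypothetical input: 8 TF feature maps on a 9 x 9 electrode grid, reduction ratio r = 2.
rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.standard_normal((C, 9, 9))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = squeeze_excitation(x, w1, b1, w2, b2)
```

Because every gate value lies strictly between 0 and 1, each channel keeps its sign and is attenuated by a learned, input-dependent amount.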
Considering the high cost and power consumption of digital precoding and the low spectral efficiency (SE) of analog precoding, a hybrid precoding algorithm with efficient finite-resolution phase shifters (PSs), named the FRPS algorithm, was proposed for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) vehicle-to-vehicle (V2V) systems. The digital and analog precoding matrix variables and the non-convex constraints of the radio frequency (RF) precoder are decomposed into two independent optimization problems, and discrete iterative optimization is used to solve for the analog and digital precoders alternately. In addition, the effect of finite-resolution PSs is taken into account in the precoding optimization. Simulation results show that the FRPS algorithm converges quickly and approaches the SE of fully digital precoding with PSs of only 3-bit resolution. The performance gap between the FRPS algorithm and existing hybrid precoding algorithms with infinite-resolution PSs is almost negligible. Moreover, the FRPS algorithm outperforms both the infinite-resolution PS hybrid precoding algorithm and the fully digital precoding algorithm.
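The finite-resolution constraint can be illustrated by projecting a continuous analog precoder onto the phase set of a b-bit phase shifter. This Python sketch is not the FRPS algorithm; it only shows the quantization that a b-bit PS imposes on each RF precoder entry. Nearest-level rounding and the 1/sqrt(Nt) amplitude normalization are assumptions, and the function name `quantize_phases` is hypothetical.

```python
import numpy as np

def quantize_phases(F, bits):
    """Map each entry of an analog precoder to the nearest b-bit phase, constant modulus."""
    step = 2 * np.pi / 2**bits                    # phase spacing of a b-bit phase shifter
    q = np.round(np.angle(F) / step) * step       # nearest allowed phase level
    return np.exp(1j * q) / np.sqrt(F.shape[0])   # modulus 1/sqrt(Nt) (assumed normalization)

# Hypothetical example: Nt = 16 antennas, 4 RF chains, 3-bit phase shifters.
rng = np.random.default_rng(0)
F = rng.standard_normal((16, 4)) + 1j * rng.standard_normal((16, 4))
F_rf = quantize_phases(F, bits=3)
```

With 3 bits there are 8 phase levels spaced π/4 apart, so each entry's phase error is at most π/8.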
A hybrid feature extraction model combining residual neural networks (ResNet) and bidirectional gated recurrent units (BGRU), called ResNet-BGRU, was proposed to address the low accuracy of single-feature extraction in deep neural networks. ResNet captures the spatial features of network traffic, while the BGRU extracts temporal features. These features are integrated and refined by a multi-head attention mechanism to handle spatiotemporal multi-dimensional data and adapt to diverse detection scenarios. By enhancing feature representation and capturing complex patterns, the model improves detection performance. Experiments on multiple datasets demonstrate that ResNet-BGRU outperforms traditional machine learning methods and comparable deep learning methods in both accuracy and computational efficiency, offering a more robust solution for network traffic anomaly detection.
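The fusion stage can be sketched with generic multi-head scaled dot-product self-attention over a sequence that concatenates spatial and temporal feature tokens. In this NumPy sketch, random placeholder arrays stand in for the ResNet and BGRU outputs. The token layout, model width, and head count are assumptions rather than details from the paper.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    """Multi-head scaled dot-product self-attention over tokens x of shape (T, d)."""
    T, d = x.shape
    dh = d // n_heads                                   # per-head width (d must divide evenly)

    def split(m):                                       # (T, d) -> (n_heads, T, dh)
        return m.reshape(T, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)     # (n_heads, T, T)
    heads = softmax(scores) @ v                         # (n_heads, T, dh)
    return heads.transpose(1, 0, 2).reshape(T, d) @ wo  # concatenate heads, project

# Placeholder features standing in for ResNet (spatial) and BGRU (temporal) outputs.
rng = np.random.default_rng(0)
d, n_heads = 16, 4
spatial = rng.standard_normal((6, d))     # hypothetical 6 spatial tokens
temporal = rng.standard_normal((10, d))   # hypothetical 10 time-step tokens
tokens = np.concatenate([spatial, temporal], axis=0)
wq, wk, wv, wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
fused = multi_head_attention(tokens, wq, wk, wv, wo, n_heads)
pooled = fused.mean(axis=0)               # one fused vector, e.g. for a classifier head
```

Self-attention over the joint sequence lets every spatial token attend to every temporal token and vice versa. This is one simple way to realize "integrating" the two feature streams.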
The performance of underwater wireless optical communication (UWOC) systems is degraded by seawater path loss, oceanic turbulence, and pointing errors during optical signal transmission. To address this problem, a composite channel model for a wavelength-diversity orthogonal frequency division multiplexing (OFDM) UWOC system was established, and analytical expressions for the outage probability and the average symbol error rate (SER) of the system were derived. The performance of the wavelength-diversity UWOC system is investigated under different pointing errors, numbers of subcarriers, and combining schemes, namely maximum ratio combining (MRC) and equal gain combining (EGC). The results show that wavelength diversity significantly improves UWOC system performance. Compared with the no-diversity system, a diversity order of 3 reduces the outage probability from 10⁻³ to the order of 10⁻⁸ and the average SER from 10⁻² to 10⁻⁷. A smaller pointing error reduces the outage probability of the system, as does a larger ratio of beam width to receiver aperture. Increasing the number of subcarriers increases the system's average SER.
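The two combining schemes compared above have standard closed-form output SNRs when the branches are co-phased and have equal noise power. MRC sums the branch SNRs, while EGC gives (Σ√γᵢ)²/L. The NumPy sketch below implements both and estimates outage probability by Monte Carlo. The following values are hypothetical placeholders, not the composite path-loss, turbulence, and pointing-error model derived in the paper:

* a unit-mean log-normal irradiance
* an electrical SNR proportional to the squared irradiance
* an average SNR of 10
* a threshold of 5

```python
import numpy as np

def mrc_snr(gammas):
    """MRC output SNR: the sum of branch SNRs (co-phased, equal-noise branches)."""
    return np.sum(gammas, axis=-1)

def egc_snr(gammas):
    """EGC output SNR with unit branch gains: (sum of sqrt(gamma_i))**2 / L."""
    L = gammas.shape[-1]
    return np.sum(np.sqrt(gammas), axis=-1) ** 2 / L

def outage_probability(gammas, gamma_th, combiner):
    """Monte Carlo estimate of P(combined SNR < gamma_th)."""
    return np.mean(combiner(gammas) < gamma_th)

# Hypothetical channel: unit-mean log-normal irradiance I = exp(2X), X ~ N(-s^2, s^2).
rng = np.random.default_rng(0)
s = 0.3
x = rng.normal(-s**2, s, size=(100_000, 3))   # 3 wavelength branches
irradiance = np.exp(2 * x)
gammas = 10.0 * irradiance**2                 # electrical SNR ~ I^2 (IM/DD assumption)

p_single = outage_probability(gammas[:, :1], 5.0, mrc_snr)   # no diversity
p_mrc = outage_probability(gammas, 5.0, mrc_snr)
p_egc = outage_probability(gammas, 5.0, egc_snr)
```

By the Cauchy-Schwarz inequality, the EGC output SNR never exceeds the MRC output SNR, and the two are equal when all branch SNRs are equal. In this model, MRC therefore never has a higher outage probability than EGC.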