Starting from the early features of emerging research topics and the characteristics of weak signals, this study aims to achieve early identification of emerging research topics through weak signal analysis based on multi-source data.
First, four types of data sources (patents, clinical data, news articles, and academic papers) were used to extract topics with the BERTopic model and to construct a composite indicator of “emergence” for identifying emerging research topics. Next, based on topic visibility and topic diffusion, topic emergence maps and topic allocation maps were constructed to identify emerging research topics characterized by weak signals. Under the framework of multi-source data cross-validation, the potential influence of these topics was assessed. Finally, an empirical analysis was conducted in the field of stem cells.
The empirical findings indicate that the weak signal-based emerging research topics identified in this study align more closely with the technological directions outlined in authoritative reports, high-impact journal publications, and specialized academic guidelines than other types of topics do, and that they exhibit strong cross-domain influence. The proposed early identification method, based on weak signal analysis of multi-source data, not only enables early detection but also improves the accuracy and coverage of topic identification by integrating multiple data sources.
The digital transformation of Chinese Materia Medica (CMM) classics is critical for bridging ancient pharmacological wisdom with modern drug discovery. However, existing Knowledge Graphs (KGs) for CMM are often constructed in isolation, resulting in fragmented information silos that hinder global data interoperability. While Entity Alignment (EA) has become a focal point in the international Semantic Web community, specific research targeting the alignment of ancient CMM literature remains a significant gap. Moreover, current state-of-the-art models, primarily designed for modern, high-resource languages, struggle to address the unique challenges of ancient Chinese texts. These challenges include severe structural heterogeneity caused by disparate historical writing styles, high terminological ambiguity where distinct medical concepts share similar characters, and a critical scarcity of high-quality annotated datasets. This study aims to fill this gap by proposing a domain-specific deep learning framework designed to automate the fusion of multi-source historical medical knowledge.
To overcome these barriers, this paper proposed the Generative Adversarial Fuzzy-boundary Learning (GAFL-Align) model. The study utilized two representative classics spanning different historical eras: Shennong Bencao Jing and Tangye Bencao. After data cleaning, the datasets comprised 3,771 and 3,910 normalized entities, respectively, focusing on core categories such as herbs, symptoms, and diseases. The technical architecture integrated BERT for deep semantic encoding with Graph Attention Networks (GAT) to capture topological structures. To handle distribution shifts across heterogeneous texts, the model employed a Generative Adversarial Network (GAN) for domain adaptation, mapping entities into a unified feature space. Furthermore, a novel fuzzy boundary negative sampling strategy was developed to distinguish “hard negatives”, that is, terms with high lexical similarity but distinct medical meanings. To address data scarcity, an iterative self-training mechanism with confidence-aware filtering was implemented to augment the training set from a limited number of expert-annotated seed pairs.
Experimental results indicated that GAFL-Align achieved a Hits@1 score of 83.59%, significantly outperforming nine baselines, including translation-based models, GNN variants, and Large Language Model (LLM)-augmented approaches such as ChatEA. The model successfully constructed a fused KG containing 6,826 entities, effectively merging heterogeneous data while preserving unique source-specific attributes. These findings demonstrate that combining adversarial domain adaptation with fine-grained semantic differentiation offers a superior solution for low-resource historical knowledge fusion compared to generic LLMs. Ultimately, this research provides a robust technical foundation for the intelligent organization of CMM heritage, offering significant implications for digital humanities and the global standardization of traditional medicine data.
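For readers unfamiliar with the Hits@1 metric reported above, the following is a minimal sketch of how Hits@k is commonly computed for entity alignment: the fraction of source entities whose correct counterpart appears among the top-k ranked candidates. The entity names and similarity scores are hypothetical toy values, not data from the study, and the function name `hits_at_k` is an illustrative choice.

```python
def hits_at_k(similarity, gold, k=1):
    """Fraction of source entities whose gold target is among the top-k candidates.

    similarity: dict mapping source entity -> {target entity: score}
    gold: dict mapping source entity -> correct target entity
    """
    hits = 0
    for source, target in gold.items():
        candidates = similarity[source]
        # Rank candidate targets by descending similarity score.
        ranked = sorted(candidates, key=candidates.get, reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Hypothetical toy scores; entity names are illustrative only.
toy_similarity = {
    "herb_a": {"herb_a2": 0.91, "herb_b2": 0.40, "herb_c2": 0.12},
    "herb_b": {"herb_a2": 0.55, "herb_b2": 0.50, "herb_c2": 0.10},
    "herb_c": {"herb_a2": 0.05, "herb_b2": 0.20, "herb_c2": 0.88},
}
toy_gold = {"herb_a": "herb_a2", "herb_b": "herb_b2", "herb_c": "herb_c2"}

hits1 = hits_at_k(toy_similarity, toy_gold, k=1)  # 2 of 3 correct at rank 1
hits2 = hits_at_k(toy_similarity, toy_gold, k=2)  # all 3 correct within rank 2
```

In this toy case "herb_b" is a hard negative: its true counterpart scores slightly lower than a lexically similar distractor, so it counts against Hits@1 but not Hits@2.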
This research aims to uncover latent relationships between technical topics through multiple types of topic association, to integrate these associations so that they supplement latent information, and to explore the potential evolution paths of frontier interdisciplinary technical topics.
Based on patent technology citations and cooperation relationships, this study defined strengthened relationships and new relationships between technical topics and used the entropy weight method to integrate the multiple relations. Combining topic clustering and similarity measurement, this study measured the strength of technology paths and identified the evolution paths of technology topics.
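The entropy weight method mentioned above can be sketched as follows. This is the standard textbook formulation, not necessarily the exact variant used in the study: each relation type receives a weight proportional to how much its values vary across topic pairs. The toy matrix and the column meanings (citation strength, cooperation strength) are assumptions for illustration.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method for a matrix of non-negative indicator values.

    matrix: list of rows (one per topic pair), each row holding one
            strength value per relation type. Every column must have a
            positive sum and the columns must not all be uniform.
    Returns one weight per column; the weights sum to 1.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [value / total for value in column]
        # Normalized Shannon entropy; 0 * ln(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n_rows)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    return [d / total_divergence for d in divergences]


# Hypothetical strengths: column 0 = citation relation, column 1 = cooperation relation.
toy_relations = [[3, 1], [2, 1], [3, 1], [2, 9]]
weights = entropy_weights(toy_relations)

# Fused relation strength per topic pair: weighted sum over relation types.
fused = [sum(w * x for w, x in zip(weights, row)) for row in toy_relations]
```

Here the second column varies far more across topic pairs than the first, so it carries most of the weight in the fused score.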
Using “graphene sensing” as an example, this study identifies three potential main paths of technological topics, related to biomolecule separation, purification and sensing; glucose detection and control; and photoelectrochemical sensing, which demonstrates the effectiveness and feasibility of the method. The study also shows that multi-relation fusion has a positive supplementary effect on the study of topic evolution; that keywords reflect the relationship between topics at the semantic level; and that the overlap of weak relationships between technical topics helps in exploring the potential main paths of technological topics.
Studying prompt literacy and its cultivation methods for intelligence tasks can help improve the efficiency of human-AI collaboration when using GenAI.
The paper defined the concept of prompt literacy on the basis of a five-dimensional framework consisting of task decomposition, prompt adaptation, prompt design, prompt standardization, verification and evaluation of AIGC, interaction feedback, and prompt iteration. Following this framework, the study proposed cultivation methods for prompt literacy, including comprehension, application ability, strengthening skills, and optimization of feedback.
The five-dimensional framework, grounded in intelligence thinking, forms a closed-loop system, and the four-step cultivation method enables progression from cognition to internalization, supporting the cultivation of versatile intelligence professionals in the era of artificial intelligence.
Academic discourse power is an important manifestation of a nation’s cultural soft power. Studying scholars’ academic discourse power and enriching and improving the theoretical framework and indicator system for evaluating scholars’ academic discourse power can help enhance a country’s academic discourse power.
This paper integrated traditional bibliometric indicators with Altmetrics indicators to construct a comprehensive evaluation system for scholars’ academic discourse power. Taking WoS data on gene editing as an example, the study conducted an empirical analysis along two dimensions, scholars’ academic influence and social discourse guidance, using methods such as integrated factor analysis, entropy weighting, TOPSIS, and two-dimensional evaluation.
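As an illustration of the TOPSIS step named above, the following minimal sketch ranks alternatives by their relative closeness to an ideal solution. It assumes that every indicator is a benefit criterion (higher is better) and that weights are already available, for example from entropy weighting. The scholars, indicator values, and weights are hypothetical.

```python
import math


def topsis(matrix, weights):
    """Relative closeness of each row to the ideal solution (benefit criteria only).

    matrix: list of rows (one per scholar), each with one value per indicator.
    weights: one weight per indicator.
    Assumes no column is all zeros and the rows are not all identical.
    """
    n_cols = len(weights)
    # Vector-normalize each column, then apply the indicator weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    best = [max(row[j] for row in weighted) for j in range(n_cols)]
    worst = [min(row[j] for row in weighted) for j in range(n_cols)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical indicators: [average citations per paper, patent mentions].
toy_scholars = [[10, 4], [5, 2], [1, 0]]
toy_weights = [0.6, 0.4]
scores = topsis(toy_scholars, toy_weights)
```

A score of 1 means the scholar matches the ideal on every indicator, and 0 means the scholar matches the worst case on every indicator.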
The study indicates that the comprehensive evaluation system incorporating Altmetrics indicators has a certain degree of credibility. Among these, “average citations per paper” and “patent mentions” are key indicators influencing scholars’ academic influence, and communication platforms such as social media and Baidu Baike can effectively enhance scholars’ academic influence.
Scientific questions serve as the starting point of scientific inquiry, determining the depth, breadth, and impact of research endeavors. However, amidst the exponential growth of global scientific publications, identifying high-value research gaps from the vast volume of literature has become an overwhelming cognitive burden for researchers. Consequently, developing automated methodologies to generate research questions from large-scale literature is of critical importance.
To address this need, this paper proposed the Automatic Generation Method of Scientific Questions (AGMSQ), a novel framework leveraging Large Language Models (LLMs). By tailoring the generation process to specific question types, AGMSQ guided LLMs to produce high-quality research questions that were structurally rigorous and deeply grounded in the literature context. The method comprised three core modules: the Scientific Question Classification Module, the Generation Template Design Module, and the LLM Generation Module. First, the Classification Module categorized questions into five types: descriptive, explanatory, methodological, evaluative, and normative. This fine-grained taxonomy enabled the model to capture the distinct logical patterns and semantic requirements inherent to different modes of scientific inquiry, thereby enhancing the precision of generation. Second, the Template Design Module constructed element-generation templates based on the structural principles of each question type. It integrated key element triplets extracted from “Future Work Sentences” (FWS) with domain extension search topics, which were matched to the triplets via semantic distance. Finally, the LLM Generation Module utilized parameter-fine-tuned models, including ChatGPT-4, ChatGPT-3.5, Claude 3 Sonnet, and Gemini Pro, to synthesize research questions based on the combined input elements. Additionally, the study introduced two quantitative indicators, the Utilization Rate of Prompts (URP) and the Occupancy Rate of New Words (ORN), to evaluate and optimize the generation performance of the LLMs.
The experiments utilize an FWS dataset sourced from the natural language processing domain, specifically targeting the generation of methodological questions. Expert evaluations indicate that the research questions generated by AGMSQ demonstrate favorable performance in terms of clarity, originality, feasibility, and academic value. Notably, among the evaluated models, Claude 3 Sonnet exhibits the best generation performance. Furthermore, quantitative analysis based on the URP and ORN metrics corroborates the expert findings, confirming that the optimized prompts effectively reduce semantic redundancy and increase the efficient utilization of input information. These findings validate the capability of LLMs to generate methodological questions within the natural language processing domain, offering empirical evidence and valuable insights for future exploration across diverse disciplines and question types. Overall, this study offers new insights and tools for automating research topic selection, representing a concrete practice of the “AI for Science” paradigm.
This study aims to reveal the potential peer-review function of review articles, thereby deepening the understanding of their role in scholarly communication and research evaluation.
The study first reviewed the literature on citation behaviors of review articles, with particular attention to their tendency toward “selective citation”. It then drew on the Web of Science database, selecting 40,436 papers published between 2000 and 2002 in three fields: Astronomy & Astrophysics, Computer Science Theory & Methods, and Critical Care Medicine. Key variables, including the frequency of review citations and the timing of the first and last review citations, were constructed, alongside multiple control factors. Negative binomial regression models were employed to examine the relationship between review citation characteristics and academic impact.
The results show that papers cited by reviews received significantly more citations overall. Moreover, higher review citation frequency, earlier first review citation, and longer citation duration were all positively associated with stronger academic performance. These findings suggest that review citations not only embody the integrative role of reviews but also function as a form of implicit quality filtering, offering a novel perspective for developing diversified research evaluation indicators.
Addressing the challenges in named entity recognition (NER) for Beiyang Government Document Resources due to linguistic complexity, diversity, and lack of annotation data, this paper proposes a large language model-based NER framework adapted for low-resource scenarios. This framework provides methodological support for structured mining and knowledge reorganization of modern historical documents.
This framework integrated retrieval-augmented generation (RAG) with parameter-efficient fine-tuning. It used Faiss vector retrieval to build a dynamic in-context example selection method and used the LoRA strategy to add domain knowledge to large language models. On a custom corpus, the study designed seven special entity types, including persons, places, organizations, time, positions, events, and document types. The study then compared two deep learning entity recognition methods, BERT-BiLSTM-CRF and RoBERTa-BiLSTM-CRF, with Baichuan-4B, DeepSeek-R1, Xunzi-Qwen3-8B, Qwen3-4B, Llama, and GPT-4. The study also evaluated the performance of the large language models under different sampling methods.
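The dynamic example selection idea can be sketched as a nearest-neighbor lookup: for each input sentence, the annotated examples whose embeddings are closest to it are retrieved and placed in the prompt. The study uses Faiss for this step; the sketch below swaps in a brute-force cosine similarity search from the Python standard library so that it stays self-contained. The embeddings, example texts, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_examples(query_vec, pool, k=2):
    """Return the texts of the k pool examples most similar to the query.

    pool: list of (example_text, embedding) pairs.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical 3-dimensional embeddings standing in for a real sentence encoder.
toy_pool = [
    ("example A", [0.9, 0.1, 0.0]),
    ("example B", [0.0, 1.0, 0.0]),
    ("example C", [0.7, 0.7, 0.0]),
]
toy_query = [1.0, 0.0, 0.0]
selected = select_examples(toy_query, toy_pool, k=2)
```

A vector index such as Faiss serves the same purpose at scale, where comparing the query against every stored example would be too slow.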
Experiments demonstrate that, compared to traditional deep learning models and general-purpose large language models, the synergistic paradigm integrating LoRA fine-tuning with RAG significantly enhances entity recognition performance, achieving an overall F1 score of 0.857. Combining RAG with fine-tuned large language models is therefore an effective framework for named entity recognition in Beiyang Government Document Resources, and it achieves accurate entity identification in these historical records. This shows that large language models are practical and scalable for processing historical documents with limited resources.
Multi-subject collaborative participation is the key to building a secure and trustworthy Generative Artificial Intelligence (GAI) service ecosystem. This paper explores the strategic behaviors of the relevant subjects, and the factors that influence them, in the context of privacy protection for GAI service users. It broadens the research scenarios of stochastic evolutionary game theory, deepens the understanding of the dynamic mechanism of privacy leakage for GAI service users, enriches the theories and methods in the field of GAI privacy protection, and offers guidance for the practice of privacy protection for GAI service users.
Based on evolutionary game theory and stochastic processes, this paper constructed a time-varying user privacy leakage risk function and built a “user-service provider-government” three-party stochastic evolutionary game model that reflects the particularities of the GAI service scenario. By applying Itô stochastic differential equation theory and numerical simulation, this paper analyzed the stability and evolution of the behavior strategies of the three parties.
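To illustrate the kind of numerical simulation described above, the following minimal sketch applies the Euler-Maruyama scheme to a single stochastic replicator equation of the form dx = x(1 - x)·g dt + σ·x(1 - x) dW, where x is the probability that one party adopts its active strategy. This one-dimensional form, the constant payoff gap `g`, and all parameter values are assumptions for illustration; the study's three-party model couples three such equations with its own payoff terms.

```python
import math
import random


def simulate_strategy(x0, payoff_gap, sigma, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama path of dx = x(1-x)*payoff_gap dt + sigma*x(1-x) dW.

    x0: initial probability of choosing the active strategy.
    payoff_gap: assumed constant payoff advantage of the active strategy.
    sigma: intensity of the random disturbance.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        # Brownian increment over one time step: Normal(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += x * (1 - x) * payoff_gap * dt + sigma * x * (1 - x) * dw
        # Clip so that x remains a valid probability after discretization.
        x = min(1.0, max(0.0, x))
        path.append(x)
    return path


# Without noise the path approaches 1; with noise it fluctuates along the way.
calm_path = simulate_strategy(x0=0.5, payoff_gap=1.0, sigma=0.0)
noisy_path = simulate_strategy(x0=0.5, payoff_gap=1.0, sigma=0.8)
```

Comparing paths for several values of `sigma` and `x0` is one simple way to explore how disturbance intensity and initial willingness affect convergence in a model of this type.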
The findings indicate that: ① The initial willingness of the government, service providers, and users significantly influences the direction and speed of subsequent system evolution; only when both users and the government exhibit high initial willingness do GAI service providers adopt proactive protection strategies, leading the system toward an optimal state. ② The greater the intensity of random disturbances, the slower the three parties converge to a stable strategy, and GAI service providers are more sensitive to uncertain factors. ③ Under conditions of high initial willingness from all three parties, market winners and participants converge to the ideal state, whereas market survivors find it challenging to reach this state; vertical GAI service providers demonstrate faster forward convergence than general GAI service providers. ④ When the government increases the fines for GAI service providers and provides moderate rewards, the probability of their active privacy protection increases and the probability of privacy leakage falls. Reducing the punishment for users’ false reports increases the probability of users disclosing privacy, but it does not prompt service providers to adopt an active protection strategy if the fines imposed on them are too low. Increasing user rights protection compensation or having the government reduce the cost of rights protection can both encourage users to actively disclose privacy.
Cultivating older adults’ perception of aging is crucial for promoting their mental health and subjective well-being. This study explores the moderating effect of skeuomorphic affordances on aging perception, considering the dual dimensions of representation and manipulation in human-AI interaction. The goal is to provide a theoretical basis for enhancing older adults’ technical adaptability and designing cognition-friendly agents.
This study employed a 2 (representational skeuomorphism: machine-like appearance vs. animal-like appearance) × 2 (operational skeuomorphism: light feedback interaction vs. blink feedback interaction) between-subjects experimental design. It examined the impact of skeuomorphic affordances in the dual dimensions of representation and manipulation on older adults’ perception of aging, as well as the chain mediating mechanism of sense of control and self-efficacy in this relationship.
Animal-shaped monitoring devices significantly enhanced the perceived level of aging among older adults, achieving optimal effects when combined with blinking responses; sense of control and self-efficacy demonstrated chain mediation effects between skeuomorphic design combinations and aging perception; and consistency between manipulative and representational skeuomorphism reduced cognitive load and improved interaction experiences. This study reveals the cognitive compatibility advantages of skeuomorphic design for senior users and proposes a dual-dimensional design framework (representation and manipulation) for aging-adapted AI design. The identified chain-mediating mechanisms enrich the theoretical framework of how human-AI interaction shapes aging perceptions.