Automatic Generation of Research Questions Based on Large Language Models
Ning Zhou1, Chao Min1, *, Tao Fan2, Yuxuan Liu3, Wen Zhang1, 4, Qinjian Yuan1
Journal of Modern Information | 2026, 46(3): 3-17
INFORMATION METHODOLOGY and TECHNOLOGICAL INNOVATION
Affiliations
  • 1 School of Information Management, Nanjing University, Nanjing 210023, China
  • 2 School of Public Administration, Nanjing University of Finance and Economics, Nanjing 210023, China
  • 3 School of Social and Behavioral, Nanjing University, Nanjing 210023, China
  • 4 Jiangsu Academy of Social Sciences, Nanjing 210004, China
Published: 2026-03-01 doi: 10.3969/j.issn.1008-0821.2026.03.001
Purpose/Significance

Scientific questions serve as the starting point of scientific inquiry, determining the depth, breadth, and impact of research endeavors. However, amidst the exponential growth of global scientific publications, identifying high-value research gaps from the vast volume of literature has become an overwhelming cognitive burden for researchers. Consequently, developing automated methodologies to generate research questions from large-scale literature is of critical importance.

Method/Process

To address this need, this paper proposes the Automatic Generation Method of Scientific Questions (AGMSQ), a framework built on Large Language Models (LLMs). By tailoring the generation process to specific question types, AGMSQ guides LLMs to produce research questions that are structurally rigorous and grounded in the literature context. The method comprises three core modules: the Scientific Question Classification Module, the Generation Template Design Module, and the LLM Generation Module. First, the Classification Module categorizes questions into five types: descriptive, explanatory, methodological, evaluative, and normative. This fine-grained taxonomy lets the model capture the distinct logical patterns and semantic requirements of different modes of scientific inquiry, which improves the precision of generation. Second, the Template Design Module constructs element-generation templates based on the structural principles of each question type. It integrates key element triplets extracted from “Future Work Sentences” (FWS) with domain extension search topics, which are matched to the triplets via semantic distance. Finally, the LLM Generation Module uses parameter-fine-tuned models (ChatGPT-4, ChatGPT-3.5, Claude 3 Sonnet, and Gemini Pro) to synthesize research questions from the combined input elements. In addition, the study introduces two quantitative indicators, the Utilization Rate of Prompts (URP) and the Occupancy Rate of New Words (ORN), to evaluate and optimize the generation performance of the LLMs.
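The template step described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, since the abstract does not give the details: the triplet is taken to be (method, task, problem), the template wording is hypothetical, and a bag-of-words cosine distance stands in for whatever semantic distance the authors actually use.

```python
import math
import re
from collections import Counter

# Hypothetical template for a methodological question; the paper's real
# templates are not given in the abstract.
TEMPLATE = (
    "How can {method} be applied to {task} to address {problem}, "
    "in the context of {topic}?"
)


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_distance(a, b):
    """Bag-of-words cosine distance, a stand-in for an embedding distance."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return 1.0 - (dot / norm if norm else 0.0)


def closest_topic(triplet, topics):
    """Pick the search topic with the smallest distance to the triplet."""
    query = " ".join(triplet)
    return min(topics, key=lambda topic: cosine_distance(query, topic))


def build_prompt(triplet, topics):
    """Fill the template with the triplet (assumed to be method, task,
    problem) and its closest domain extension topic."""
    method, task, problem = triplet
    topic = closest_topic(triplet, topics)
    return TEMPLATE.format(method=method, task=task, problem=problem, topic=topic)
```

In a fuller pipeline, the string returned by `build_prompt` would be passed to an LLM as the generation prompt; that call is omitted here because the abstract does not describe it.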

Result/Conclusion

The experiments use an FWS dataset from the natural language processing domain and target the generation of methodological questions. Expert evaluations indicate that the research questions generated by AGMSQ perform well in terms of clarity, originality, feasibility, and academic value. Among the evaluated models, Claude 3 Sonnet exhibits the best generation performance. Quantitative analysis based on the URP and ORN metrics corroborates the expert findings, confirming that the optimized prompts reduce semantic redundancy and increase the efficient utilization of input information. These findings validate the capability of LLMs to generate methodological questions within the natural language processing domain and provide empirical evidence for future exploration across other disciplines and question types. Overall, this study offers new insights and tools for automating research topic selection, representing a concrete practice of the “AI for Science” paradigm.
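The abstract names URP and ORN but does not state their formulas. The sketch below shows one plausible reading, offered only as an assumption: URP as the share of prompt content words that reappear in the generated question, and ORN as the share of the question's content words that do not occur in the prompt. The stopword list and the function names are likewise hypothetical.

```python
import re

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "for", "and", "or",
             "how", "can", "be", "is", "are", "what", "which", "with"}


def content_words(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


def urp(prompt_elements, question):
    """Assumed URP: fraction of distinct prompt content words that
    appear in the generated question."""
    prompt_vocab = {w for e in prompt_elements for w in content_words(e)}
    if not prompt_vocab:
        return 0.0
    question_vocab = set(content_words(question))
    return len(prompt_vocab & question_vocab) / len(prompt_vocab)


def orn(prompt_elements, question):
    """Assumed ORN: fraction of the question's content words that do
    not occur anywhere in the prompt elements."""
    question_words = content_words(question)
    if not question_words:
        return 0.0
    prompt_vocab = {w for e in prompt_elements for w in content_words(e)}
    new_words = [w for w in question_words if w not in prompt_vocab]
    return len(new_words) / len(question_words)
```

Under this reading, a high URP means the model made use of the supplied elements, while ORN indicates how much of the question's wording the model contributed beyond the prompt.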

research question  /  automatic generation  /  large language models  /  AI for science  /  natural language processing
Ning Zhou, Chao Min, Tao Fan, Yuxuan Liu, Wen Zhang, Qinjian Yuan. Automatic Generation of Research Questions Based on Large Language Models[J]. Journal of Modern Information, 2026, 46(3): 3-17. DOI: 10.3969/j.issn.1008-0821.2026.03.001
Article Info
  • Received: 2025-09-02
  • Online Date: 2026-06-05
  • Published: 2026-03-01
https://castjournals.cast.org.cn/joweb/xdqb/EN/10.3969/j.issn.1008-0821.2026.03.001