Automatic Generation of Research Questions Based on Large Language Models

Ning Zhou¹, Chao Min¹,*, Tao Fan², Yuxuan Liu³, Wen Zhang¹,⁴, Qinjian Yuan¹

Journal of Modern Information | 2026, 46(3): 3-17

• INFORMATION METHODOLOGY and TECHNOLOGICAL INNOVATION •

Affiliations

¹School of Information Management, Nanjing University, Nanjing 210023, China

²School of Public Administration, Nanjing University of Finance and Economics, Nanjing 210023, China

³School of Social and Behavioral Sciences, Nanjing University, Nanjing 210023, China

⁴Jiangsu Academy of Social Sciences, Nanjing 210004, China

Published: 2026-03-01 | DOI: 10.3969/j.issn.1008-0821.2026.03.001

Abstract

Purpose/Significance

Scientific questions are the starting point of scientific inquiry and determine the depth, breadth, and impact of research. However, as the volume of global scientific publications grows exponentially, identifying high-value research gaps in the literature has become an overwhelming cognitive burden for researchers. Automated methods for generating research questions from large-scale literature are therefore critically important.

Method/Process

To address this need, this paper proposed the Automatic Generation Method of Scientific Questions (AGMSQ), a framework built on Large Language Models (LLMs). By tailoring the generation process to specific question types, AGMSQ guided LLMs to produce research questions that were structurally rigorous and grounded in the literature context. The method comprised three core modules: the Scientific Question Classification Module, the Generation Template Design Module, and the LLM Generation Module. First, the Classification Module categorized questions into five types: descriptive, explanatory, methodological, evaluative, and normative. This fine-grained taxonomy enabled the model to capture the distinct logical patterns and semantic requirements of each mode of scientific inquiry, thereby improving the precision of generation. Second, the Template Design Module constructed element-generation templates based on the structural principles of each question type. It combined key element triplets extracted from "Future Work Sentences" (FWS) with domain extension search topics, which were matched to the triplets by semantic distance. Finally, the LLM Generation Module used models with fine-tuned parameters (ChatGPT-4, ChatGPT-3.5, Claude 3 Sonnet, and Gemini Pro) to synthesize research questions from the combined input elements. In addition, the study introduced two quantitative indicators, the Utilization Rate of Prompts (URP) and the Occupancy Rate of New Words (ORN), to evaluate and optimize the generation performance of the LLMs.
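The template step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the template wording, the function names (`nearest_topic`, `build_prompt`), and the example triplet and topics are all hypothetical, and a bag-of-words cosine similarity stands in for whatever semantic distance measure the paper actually uses.

```python
import math
from collections import Counter

# The five question types named in the abstract.
QUESTION_TYPES = ("descriptive", "explanatory", "methodological",
                  "evaluative", "normative")

# Hypothetical template for one type; the paper's real templates are not
# given in the abstract.
TEMPLATES = {
    "methodological":
        "How can {subject} {relation} {object} in the context of {topic}?",
}


def bow(text):
    """Bag-of-words vector (assumed stand-in for a semantic embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def nearest_topic(triplet, topics):
    """Pick the extension topic closest to the FWS triplet."""
    query = bow(" ".join(triplet))
    return max(topics, key=lambda t: cosine(query, bow(t)))


def build_prompt(triplet, topics, qtype="methodological"):
    """Fill the template for `qtype` with the triplet and its matched topic."""
    subject, relation, obj = triplet
    topic = nearest_topic(triplet, topics)
    return TEMPLATES[qtype].format(
        subject=subject, relation=relation, object=obj, topic=topic)


# Hypothetical inputs, invented for illustration only.
triplet = ("contrastive pretraining", "improve",
           "low-resource machine translation")
topics = [
    "speech recognition for noisy audio",
    "machine translation for low-resource languages",
    "image captioning",
]
prompt = build_prompt(triplet, topics)
```

The resulting `prompt` string is what would be handed to an LLM in the generation step; in practice a sentence-embedding model would likely replace the word-overlap similarity used here.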

Result/Conclusion

The experiments used an FWS dataset from the natural language processing domain and targeted the generation of methodological questions. Expert evaluations indicated that the research questions generated by AGMSQ performed favorably in terms of clarity, originality, feasibility, and academic value. Among the evaluated models, Claude 3 Sonnet showed the best generation performance. Quantitative analysis based on the URP and ORN metrics corroborated the expert findings, confirming that the optimized prompts reduced semantic redundancy and made more efficient use of the input information. These findings validate the capability of LLMs to generate methodological questions within the natural language processing domain and provide empirical evidence for future work on other disciplines and question types. Overall, this study offers new insights and tools for automating research topic selection and represents a concrete practice of the "AI for Science" paradigm.
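The abstract names URP and ORN but does not give their formulas. The sketch below shows one plausible reading, purely as an assumption: URP as the share of prompt elements that reappear in the generated question, and ORN as the share of question tokens that do not occur in any prompt element. The function names, tokenizer, and example data are hypothetical and may differ from the definitions in the full paper.

```python
import re


def tokens(text):
    """Lowercase word tokens; hyphenated words stay together."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def urp(prompt_elements, question):
    """Assumed URP: fraction of prompt elements fully present in the question."""
    if not prompt_elements:
        return 0.0
    q = set(tokens(question))
    used = sum(1 for e in prompt_elements if set(tokens(e)) <= q)
    return used / len(prompt_elements)


def orn(prompt_elements, question):
    """Assumed ORN: fraction of question tokens absent from the prompt."""
    q = tokens(question)
    if not q:
        return 0.0
    prompt_vocab = {t for e in prompt_elements for t in tokens(e)}
    new = [t for t in q if t not in prompt_vocab]
    return len(new) / len(q)


# Hypothetical example data, invented for illustration only.
elements = ["contrastive pretraining",
            "low-resource machine translation",
            "data augmentation"]
question = ("How can contrastive pretraining reduce errors in "
            "low-resource machine translation?")

urp_score = urp(elements, question)   # 2 of 3 elements are reused
orn_score = orn(elements, question)   # 5 of 10 tokens are new
```

Under this reading, a higher URP means the model made fuller use of the supplied elements, while ORN indicates how much wording the model contributed beyond the prompt.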

Keywords

research question / automatic generation / large language models / AI for science / natural language processing

Cite this Article

Ning Zhou, Chao Min, Tao Fan, Yuxuan Liu, Wen Zhang, Qinjian Yuan. Automatic Generation of Research Questions Based on Large Language Models[J]. Journal of Modern Information, 2026, 46(3): 3-17. DOI: 10.3969/j.issn.1008-0821.2026.03.001

Article Info

Received: 2025-09-02
Online: 2026-06-05
Published: 2026-03-01

https://castjournals.cast.org.cn/joweb/xdqb/EN/10.3969/j.issn.1008-0821.2026.03.001
