FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration
Abstract: Scientific idea generation (SIG) is critical to AI-driven autonomous research, yet existing approaches are often constrained by a static retrieval-then-generation paradigm, leading to homogeneous and insufficiently divergent ideas. In this work, we propose FlowPIE, a tightly coupled retrieval-generation framework that treats literature exploration and idea generation as a co-evolving process. FlowPIE expands literature trajectories via a flow-guided Monte Carlo Tree Search (MCTS) inspired by GFlowNets, using the quality of current ideas assessed by an LLM-based generative reward model (GRM) as a supervised signal to guide adaptive retrieval and construct a diverse, high-quality initial population. Based on this population, FlowPIE models idea generation as a test-time idea evolution process, applying selection, crossover, and mutation with the isolation island paradigm and GRM-based fitness computation to incorporate cross-domain knowledge. It effectively mitigates the information cocoons arising from over-reliance on parametric knowledge and static literature. Extensive evaluations demonstrate that FlowPIE consistently produces ideas with higher novelty, feasibility, and diversity compared to strong LLM-based and agent-based frameworks, while enabling reward scaling during test time.
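The evolutionary loop the abstract describes (selection, crossover, and mutation over isolated island populations, with fitness scored by a GRM) can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: `grm_fitness` is a toy stand-in for the LLM-based generative reward model, and all function names and the ring-migration scheme are illustrative assumptions.

```python
import random


def grm_fitness(idea: str) -> float:
    # Toy stand-in for the LLM-based generative reward model (GRM):
    # scores an idea by its number of distinct terms. A real GRM would
    # prompt an LLM to rate novelty and feasibility.
    return float(len(set(idea.split())))


def crossover(a: str, b: str) -> str:
    # Splice the first half of one parent idea with the second half of another.
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2:])


def mutate(idea: str, vocab: list, rng: random.Random) -> str:
    # Inject one cross-domain term at a random position to perturb the idea.
    words = idea.split()
    words.insert(rng.randrange(len(words) + 1), rng.choice(vocab))
    return " ".join(words)


def evolve_island(pop, vocab, generations, rng):
    # Evolve one island in isolation: keep the fitter half as parents,
    # refill the population with mutated crossovers of random parents.
    for _ in range(generations):
        pop.sort(key=grm_fitness, reverse=True)
        parents = pop[: max(2, len(pop) // 2)]  # selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)), vocab, rng)
            for _ in range(len(pop) - len(parents))
        ]
        pop = parents + children
    return pop


def island_evolution(populations, vocab, generations=3, seed=0):
    # Each island evolves in isolation; afterwards the best idea from each
    # island migrates to its neighbour (a simplified isolation-island scheme).
    rng = random.Random(seed)
    evolved = [evolve_island(list(p), vocab, generations, rng) for p in populations]
    best = [max(p, key=grm_fitness) for p in evolved]
    for i, p in enumerate(evolved):
        p.append(best[(i + 1) % len(evolved)])  # ring migration
    return evolved
```

Running `island_evolution` on two small seed populations with a cross-domain vocabulary yields each island's evolved ideas plus one migrant from its neighbour; the isolation between islands is what preserves diversity across lineages.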
Comments: 30 pages, 11 figures, 15 tables
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2603.29557 [cs.AI]
(or arXiv:2603.29557v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.29557
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Qiyao Wang [v1] Tue, 31 Mar 2026 10:37:47 UTC (1,987 KB)
