OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
The development of high-performance "deep search" AI agents—models capable of navigating the web to answer complex, multi-step queries—has historically been restricted to large corporations with the resources to run massive, multi-stage training pipelines. These pipelines typically involve continual pre-training, supervised fine-tuning, and complex reinforcement learning. This paper introduces OpenSeeker-v2, an academic project that challenges this industry standard by demonstrating that a simple, focused approach to data quality can produce state-of-the-art results without the need for massive computational infrastructure.
A New Approach to Data Synthesis
The researchers argue that the secret to a powerful search agent lies in the quality and difficulty of the training data rather than in the complexity of the training process itself. To test this, they built a dataset of 10.6k high-difficulty trajectories by making three targeted modifications to their data-synthesis pipeline:
Scaling Knowledge Graph Size: By expanding the knowledge graph used during data generation, the model is exposed to a much richer set of information. This forces the agent to learn to aggregate evidence across multiple sources rather than rely on simple, shallow lookups.
Expanding the Tool Set: By increasing the number of tools available to the agent, the model learns more versatile and flexible strategies for solving diverse problems.
Strict Low-Step Filtering: The team intentionally discarded any training examples that could be solved in just a few steps. By setting a "difficulty floor," they ensured the model was trained exclusively on tasks that require sustained, long-horizon reasoning.
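The third step, the "difficulty floor," can be sketched as a simple filter over synthesized trajectories. The trajectory format, field names, and the threshold of 5 tool calls below are illustrative assumptions, not details taken from the paper:

```python
# Hypothetical sketch of the "difficulty floor" filter described above.
# The trajectory schema and MIN_STEPS value are assumptions for illustration.
MIN_STEPS = 5  # assumed floor: discard trajectories solvable in a few steps

def count_tool_steps(trajectory):
    """Count the turns in a trajectory that invoke a tool (search, browse, etc.)."""
    return sum(1 for turn in trajectory if turn.get("role") == "tool_call")

def apply_difficulty_floor(trajectories, min_steps=MIN_STEPS):
    """Keep only trajectories that require sustained, long-horizon reasoning."""
    return [t for t in trajectories if count_tool_steps(t) >= min_steps]

# Example: a 2-step trajectory is discarded; a 6-step trajectory survives.
short_traj = [{"role": "tool_call"}] * 2 + [{"role": "answer"}]
long_traj = [{"role": "tool_call"}] * 6 + [{"role": "answer"}]
kept = apply_difficulty_floor([short_traj, long_traj])  # → only long_traj
```

In practice the difficulty signal could also combine tool count with other proxies (number of distinct sources visited, query depth), but a hard minimum on steps is the simplest way to enforce the floor the authors describe.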
Surpassing Industrial Standards
Despite being trained using only supervised fine-tuning (SFT) and a relatively small dataset, OpenSeeker-v2 achieved state-of-the-art performance on four major benchmarks: BrowseComp, BrowseComp-ZH, Humanity’s Last Exam, and xbench. Most notably, it outperformed Tongyi DeepResearch, a model trained using a much heavier, resource-intensive pipeline that includes continual pre-training and reinforcement learning. This result suggests that high-quality, carefully curated data can effectively bridge the performance gap between academic research and industrial-scale models.
Implications for Future Research
OpenSeeker-v2 is the first state-of-the-art search agent of its scale and paradigm to be developed by a purely academic team using only SFT. By open-sourcing the model weights and sharing their findings, the authors aim to democratize the field of agentic AI. The project demonstrates that the current performance of search agents has not yet reached a saturation point; the researchers believe that further scaling the quantity, quality, and diversity of synthetic data will continue to push the boundaries of what these agents can achieve.