OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
The development of high-performance "deep search" AI agents—models capable of navigating the web to answer complex, multi-step queries—has historically been restricted to large corporations with the resources to run massive, multi-stage training pipelines. These pipelines typically involve continual pre-training, supervised fine-tuning, and complex reinforcement learning. This paper introduces OpenSeeker-v2, an academic project that challenges this industry standard by demonstrating that a simple, focused approach to data quality can produce state-of-the-art results without the need for massive computational infrastructure.
A New Approach to Data Synthesis
The researchers argue that the secret to a powerful search agent lies in the quality and difficulty of the training data rather than in the complexity of the training process itself. To test this, they built a dataset of 10.6k high-difficulty trajectories by making three targeted modifications to their data-synthesis pipeline:
Scaling Knowledge Graph Size: By expanding the knowledge graph used during data generation, the model is exposed to a much richer set of information. This forces the agent to learn to aggregate evidence across multiple sources rather than rely on simple, shallow lookups.
Expanding the Tool Set: By increasing the number of tools available to the agent, the model learns more versatile and flexible strategies for solving diverse problems.
Strict Low-Step Filtering: The team intentionally discarded any training examples that could be solved in just a few steps. By setting a "difficulty floor," they ensured the model was trained exclusively on tasks that require sustained, long-horizon reasoning.
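The third step, the "difficulty floor," can be sketched as a simple filter over synthesized trajectories. The trajectory format, field names, and the threshold of 5 tool calls below are illustrative assumptions, not details taken from the paper:

```python
# Hypothetical sketch of the "difficulty floor" filter described above.
# The trajectory schema and MIN_STEPS value are assumptions for illustration.
MIN_STEPS = 5  # assumed floor: discard trajectories solvable in a few steps

def count_tool_steps(trajectory):
    """Count the turns in a trajectory that invoke a tool (search, browse, etc.)."""
    return sum(1 for turn in trajectory if turn.get("role") == "tool_call")

def apply_difficulty_floor(trajectories, min_steps=MIN_STEPS):
    """Keep only trajectories that require sustained, long-horizon reasoning."""
    return [t for t in trajectories if count_tool_steps(t) >= min_steps]

# Example: a 2-step trajectory is discarded; a 6-step trajectory survives.
short_traj = [{"role": "tool_call"}] * 2 + [{"role": "answer"}]
long_traj = [{"role": "tool_call"}] * 6 + [{"role": "answer"}]
kept = apply_difficulty_floor([short_traj, long_traj])  # → only long_traj
```

In practice the difficulty signal could also combine tool count with other proxies (number of distinct sources visited, query depth), but a hard minimum on steps is the simplest way to enforce the floor the authors describe.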
Surpassing Industrial Standards
Despite being trained using only supervised fine-tuning (SFT) and a relatively small dataset, OpenSeeker-v2 achieved state-of-the-art performance on four major benchmarks: BrowseComp, BrowseComp-ZH, Humanity’s Last Exam, and xbench. Most notably, it outperformed Tongyi DeepResearch, a model trained using a much heavier, resource-intensive pipeline that includes continual pre-training and reinforcement learning. This result suggests that high-quality, carefully curated data can effectively bridge the performance gap between academic research and industrial-scale models.
Implications for Future Research
OpenSeeker-v2 is the first state-of-the-art search agent of its scale and paradigm to be developed by a purely academic team using only SFT. By open-sourcing the model weights and sharing their findings, the authors aim to democratize the field of agentic AI. The project demonstrates that the current performance of search agents has not yet reached a saturation point; the researchers believe that further scaling the quantity, quality, and diversity of synthetic data will continue to push the boundaries of what these agents can achieve.