ASTRA is an end-to-end system for synthesizing agentic trajectories and rule-verifiable environments for SFT and RL training, developed by Beike Language and Intelligence (BLI).

LianjiaTech/astra

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

🆕 Updates

| Date | Updates |
| --- | --- |
| 2026/01/30 | 📄 Paper Release |
| 2026/01/22 | 🎉 Release Code, Models, and Datasets |

📖 Overview

This repository provides an end-to-end pipeline for fully automated, verifiable synthesis of high-quality data and environments, with native support for process-level rewards. It is designed for training models with multi-step reasoning and tool-use capabilities, and it scales easily to new tasks and tools. The two main modules are:

  • Trajectory Synthesis: Automatically generates high-quality, multi-step interactive trajectories and verifies them with a reward system.

  • Environment Synthesis: Fully automatically synthesizes interactive environments, with no human labels required, that provide step-wise process rewards to enable reinforcement learning with verifiable rewards (RLVR).

| Module | Function | Directory |
| --- | --- | --- |
| Trajectory Synthesis | Tool graph construction → Task generation → Trajectory collection → Reward assessment | trajectory_synthesis/ |
| Environment Synthesis | Question decomposition → Automatic tool environment generation → RLVR training data | env_synthesis/ |

🏆 Model Performance

We release two models, ASTRA-32B-Thinking-v1 and ASTRA-14B-Thinking-v1, both trained with SFT and RL on our synthesized data. The table below reports evaluation results on BFCL-V3-MT, sorted by average score in descending order:

| Model | Base | Long Context | Miss Func | Miss Param | Average ↓ |
| --- | --- | --- | --- | --- | --- |
| Claude-Opus-4-5-20251101 | 81.5 | 70.5 | 64.0 | 58.0 | 68.5 |
| GLM-4.6 | 74.5 | 66.5 | 68.0 | 63.0 | 68.0 |
| ASTRA-32B-Thinking-v1 | 76.5 | 66.5 | 65.5 | 48.5 | 64.3 |
| Gemini-3-Pro-Preview | 69.0 | 64.0 | 63.0 | 56.5 | 63.1 |
| o3-2025-04-16 | 68.0 | 63.0 | 63.5 | 54.5 | 62.3 |
| Claude-Sonnet-4-5-20250929 | 69.0 | 59.0 | 65.0 | 52.5 | 61.4 |
| Grok-4-1-fast-reasoning | 70.5 | 62.5 | 59.5 | 43.0 | 58.9 |
| ASTRA-14B-Thinking-v1 | 67.0 | 61.0 | 56.0 | 48.5 | 58.1 |
| LoopTool-32B (reported in paper) | - | - | - | - | 57.8 |
| Claude-Haiku-4-5-20251001 | 63.5 | 56.0 | 42.5 | 52.5 | 53.6 |
| Kimi-K2-Instruct | 62.0 | 55.0 | 41.0 | 44.5 | 50.6 |
| Qwen3-32B | 59.0 | 51.5 | 47.5 | 40.5 | 49.6 |
| Qwen3-30B-A3B-Thinking-2507 | 66.0 | 58.0 | 31.5 | 35.5 | 47.8 |
| TouCan-32B (reported in paper) | - | - | - | - | 46.5 |
| Qwen3-14B | 50.5 | 48.0 | 39.5 | 40.0 | 44.5 |
| Qwen3-30B-A3B-Instruct-2507 | 43.5 | 41.0 | 10.5 | 25.0 | 30.0 |
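
The Average column in the table above is consistent with an unweighted mean of the four sub-scores, rounded half-up to one decimal. The sketch below reproduces it for the two ASTRA rows. The helper name `bfcl_average` is hypothetical and not part of this repository, and the rounding rule is inferred from the table rather than stated by the authors.

```python
from decimal import Decimal, ROUND_HALF_UP


def bfcl_average(scores):
    """Unweighted mean of the sub-scores, rounded half-up to one decimal.

    Hypothetical helper: the averaging rule is inferred from the table.
    """
    total = sum(Decimal(s) for s in scores)
    mean = total / Decimal(len(scores))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Sub-scores copied from the table: Base, Long Context, Miss Func, Miss Param.
rows = {
    "ASTRA-32B-Thinking-v1": ["76.5", "66.5", "65.5", "48.5"],
    "ASTRA-14B-Thinking-v1": ["67.0", "61.0", "56.0", "48.5"],
}

averages = {name: bfcl_average(scores) for name, scores in rows.items()}
for name, avg in averages.items():
    print(name, avg)
```

`Decimal` with `ROUND_HALF_UP` is used because Python's built-in `round(64.25, 1)` rounds half to even and would give 64.2 rather than the 64.3 shown in the table.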

🔄 Pipelines

Part 1: Trajectory Synthesis

SFT Pipeline

Starting from MCP Server tool documentation, build tool dependency graphs and generate high-quality SFT training data.

mcp_servers.jsonl → Graph construction → Task generation → LLM interaction → Reward assessment → SFT data
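
To make the graph construction and task generation stages more concrete, here is a minimal sketch, not the repository's implementation. It assumes a hypothetical record layout (`name`, `inputs`, `outputs`) that may differ from the real `mcp_servers.jsonl` schema, and it assumes a simple edge rule: tool A points to tool B when an output of A is an input of B. The actual pipeline is documented in trajectory_synthesis/README.md.

```python
import json
from collections import defaultdict

# Hypothetical records; the real mcp_servers.jsonl schema may differ.
SAMPLE_JSONL = """\
{"name": "search_city", "inputs": ["query"], "outputs": ["city_id"]}
{"name": "get_weather", "inputs": ["city_id"], "outputs": ["forecast"]}
{"name": "summarize", "inputs": ["forecast"], "outputs": ["summary"]}
"""


def load_tools(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def build_dependency_graph(tools):
    """Add an edge A -> B when some output of A is an input of B (assumed rule)."""
    graph = defaultdict(list)
    for a in tools:
        for b in tools:
            if a["name"] != b["name"] and set(a["outputs"]) & set(b["inputs"]):
                graph[a["name"]].append(b["name"])
    return dict(graph)


def longest_chain(graph, start):
    """Depth-first search for the longest tool chain from start (acyclic graphs only)."""
    best = [start]
    for nxt in graph.get(start, []):
        candidate = [start] + longest_chain(graph, nxt)
        if len(candidate) > len(best):
            best = candidate
    return best


tools = load_tools(SAMPLE_JSONL)
graph = build_dependency_graph(tools)
chain = longest_chain(graph, "search_city")
print(graph)
print(chain)
```

A chain such as the one printed here could serve as the skeleton of a multi-step task: each tool in the chain needs the previous tool's output, so a task built on it requires the model to call the tools in order.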

👉 For detailed usage instructions, please refer to trajectory_synthesis/README.md


Part 2: Environment Synthesis

Environment Synthesis Pipeline

Automatically generate executable tool environments from Q&A pairs, supporting RLVR training.

QA data → Question decomposition → Tool necessity check → Verification → Environment synthesis → Tool merging
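
The idea of rule-verifiable, step-wise process rewards can be illustrated with a small sketch. It assumes that a question has already been decomposed into sub-questions with reference answers, and that each step is scored by a normalized exact match. The `Step` class, `normalize`, and `step_rewards` are hypothetical names for illustration; the repository's actual verification logic is described in env_synthesis/README.md.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One decomposed sub-question with its reference answer (hypothetical layout)."""
    sub_question: str
    expected: str


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().lower().split())


def step_rewards(steps, observed):
    """Return 1.0 for each step whose observed answer matches the reference, else 0.0."""
    rewards = []
    for i, step in enumerate(steps):
        answer = observed[i] if i < len(observed) else ""  # missing steps score 0.0
        rewards.append(1.0 if normalize(answer) == normalize(step.expected) else 0.0)
    return rewards


steps = [
    Step("What is 12 * 3?", "36"),
    Step("What is that result plus 4?", "40"),
]
observed = ["36", " 41 "]  # the second step is wrong

rewards = step_rewards(steps, observed)
trajectory_reward = sum(rewards) / len(rewards)
print(rewards, trajectory_reward)
```

Because every step is checked by a rule instead of a learned judge, the per-step scores can be used directly as process rewards, and their mean gives one possible trajectory-level signal.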

👉 For detailed usage instructions, please refer to env_synthesis/README.md


📜 License

This project is licensed under the Apache 2.0 License.


📎 Citation

@misc{tian2026astraautomatedsynthesisagentic,
      title={ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas}, 
      author={Xiaoyu Tian and Haotian Wang and Shuaiting Chen and Hao Zhou and Kaichi Yu and Yudian Zhang and Jade Ouyang and Junxi Yin and Jiong Chen and Baoyan Guo and Lei Zhang and Junjie Tao and Yuansheng Song and Ming Cui and Chengwei Liu},
      year={2026},
      eprint={2601.21558},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.21558}, 
}
