Democratizing Agentic Reinforcement Learning as a Service
Project Page · DeepWiki · Slack · WeChat
Choose an example below to get started. Each example includes step-by-step instructions for setup, training, and inference.
| Task | Description | Performance |
|---|---|---|
| LLM Single-Turn Math | Mathematical problem solving | wandb |
| LLM Multi-Turn Math | Multi-turn mathematical problem solving with tool calling | wandb |
| LLM Single-LoRA Single-Turn Math | Single-turn mathematical problem solving trained with LoRA | wandb |
| VLM Single-Turn Math | Geometry3K math problem solving | wandb |
| VLM Multi-Turn Math | Geometry3K math problem solving with tool calling | wandb |
| LLM Gomoku Agent | A multi-turn Gomoku agent | wandb |
| LLM AlfWorld Agent | A multi-turn AlfWorld agent | TBA |
```bash
git clone --recurse-submodules https://github.com/open-tinker/OpenTinker.git
cd OpenTinker
pip install -e .
cd verl
pip install -e .
cd ..
```

For the client, no additional steps are needed after completing the Common Setup above.
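To quickly verify the client-side installation, importing the package is enough (the top-level module name `opentinker` matches the paths used elsewhere in this guide):

```bash
python -c "import opentinker; print('OpenTinker client OK')"
```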
> **Note:** The client currently relies on a small subset of functions from `verl`. This dependency is transitional: in future releases, the client will be fully decoupled from `verl`, allowing it to remain completely lightweight and independent of training-related code.
In addition to the Common Setup, the server must install the `verl` dependencies.
You can choose one of the following two approaches.
```bash
# Pull the verl Docker image
docker pull verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

# Create and run the container
docker run -dit \
  --gpus all \
  --restart=no \
  --entrypoint /bin/bash \
  --net=host \
  --shm-size=10g \
  --cap-add=SYS_ADMIN \
  -v .:/workspace/dev \
  --name tinker \
  verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d
```
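Once the container is running, you can open a shell inside it; the repository is mounted at `/workspace/dev` via the `-v` flag above:

```bash
docker exec -it tinker bash
cd /workspace/dev
```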
Alternatively, you can install the `verl` dependencies manually. After completing the Common Setup, run:

```bash
cd verl
pip install -r requirements.txt
cd ..
```

This installs all GPU and training-related dependencies required by the server.
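To sanity-check the manual installation, a quick import of the `verl` package should succeed (assuming it installs under its usual module name):

```bash
python -c "import verl; print('verl OK')"
```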
OpenTinker includes a built-in authentication system to secure access to the scheduler API.
Edit `opentinker/scheduler/config/scheduler.yaml`:

```yaml
enable_auth: true                   # true to enable authentication, false to disable
user_db_path: "scheduler_users.db"  # path to the user database
```

Run the interactive script to register a user and get an API key:

```bash
python opentinker/scheduler/register_user_example.py
```

For advanced usage (REST API registration, using the key) and detailed configuration, see the Scheduler & Dashboard Guide.
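As a purely illustrative sketch of using the key, a client would typically attach it to each scheduler API call; the host, port, endpoint path, and header below are assumptions, not the documented interface (consult the Scheduler & Dashboard Guide for the real one):

```bash
# Hypothetical request: host, port, path, and header name are assumptions
curl -H "Authorization: Bearer $OPENTINKER_API_KEY" \
     http://localhost:8000/api/jobs
```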
OpenTinker provides a flexible environment design framework that supports diverse training scenarios. Our architecture accommodates two orthogonal dimensions:
- Data Source: Data-Dependent environments load structured datasets (e.g., parquet files) to provide prompts, while Data-Free environments generate prompts dynamically from simulators or game engines.
- Interaction Mode: Single-Turn environments involve one-shot model responses, while Multi-Turn environments enable iterative interactions with tool calls and feedback loops.
This 2×2 design space enables four distinct paradigms, each suited to different learning objectives (sketched in code after the table below):
| Paradigm | Data Source | Interaction | Example Use Case |
|---|---|---|---|
| Data-Dependent × Single-Turn | Dataset | One-shot | Math reasoning, QA tasks |
| Data-Dependent × Multi-Turn | Dataset | Iterative | Tool-assisted problem solving |
| Data-Free × Single-Turn | Simulator | One-shot | Bandit problems |
| Data-Free × Multi-Turn | Simulator | Iterative | Complex game playing, dialogue agents |
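The sketch below shows how the two axes compose in code. It is a minimal, hypothetical sketch: the class and method names are assumptions for illustration, not the actual OpenTinker environment interfaces.

```python
# Hypothetical sketch of the 2x2 environment design space; names are
# illustrative assumptions, not OpenTinker's real interfaces.
from abc import ABC, abstractmethod


class Environment(ABC):
    """One episode: yield prompts, consume model responses, emit rewards."""

    @abstractmethod
    def reset(self) -> str:
        """Return the initial prompt for an episode."""

    @abstractmethod
    def step(self, response: str) -> tuple[str | None, float, bool]:
        """Consume a model response; return (next_prompt, reward, done)."""


class MathSingleTurn(Environment):
    """Data-Dependent x Single-Turn: prompts come from a dataset,
    and the episode ends after one response."""

    def __init__(self, samples):
        self.samples = iter(samples)  # e.g., (question, answer) rows from a parquet file
        self.answer = None

    def reset(self) -> str:
        question, self.answer = next(self.samples)
        return question

    def step(self, response):
        reward = 1.0 if response.strip() == self.answer else 0.0
        return None, reward, True  # single turn: always done


class GomokuMultiTurn(Environment):
    """Data-Free x Multi-Turn: prompts come from a game engine,
    and the episode iterates until the game ends."""

    def __init__(self, engine):
        self.engine = engine  # hypothetical game-engine object

    def reset(self) -> str:
        self.engine.new_game()
        return self.engine.render_board()

    def step(self, response):
        self.engine.play(response)  # apply the model's move
        if self.engine.game_over():
            return None, self.engine.score(), True
        return self.engine.render_board(), 0.0, False
```

Because the data-source axis (dataset iterator vs. simulator) and the interaction axis (done after one step vs. looping until the episode ends) vary independently, the remaining two paradigms follow by swapping either axis.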
- Scheduler & Dashboard Guide - Configuration, Usage, and Web Dashboard
```bibtex
@misc{opentinker2025,
  title        = {OpenTinker: Democratizing Agentic Reinforcement Learning as a Service},
  author       = {Siqi Zhu and Jiaxuan You},
  year         = {2025},
  howpublished = {\url{https://github.com/open-tinker/OpenTinker}},
  note         = {GitHub repository}
}
```