
TinyZero: A Low-Cost Replication of DeepSeek-R1 Zero's Epiphany Effect

2025-01-26

General Introduction

TinyZero is a veRL-based reinforcement learning project designed to reproduce DeepSeek-R1 Zero's performance on countdown and multiplication tasks. Remarkably, it achieves the same epiphany ("aha moment") behavior as DeepSeek-R1 Zero at a running cost of only about $30 (under 5 hours on 2×H200 at $6.4 per hour). Through reinforcement learning (RL), a 3B base language model (LM) autonomously develops self-verification and search capabilities. Users can experience TinyZero for themselves through a simple setup and training process.


Function List

  • Countdown task: supports data preparation and training so the model can learn the countdown (number-combination) task.
  • Multiplication task: supports data preparation and training for multiplication problems.
  • Single-GPU support: for models with 1.5B parameters or fewer.
  • Multi-GPU support: for larger models, which are able to develop sophisticated reasoning.
  • Instruct ablation: experiments with the Qwen2.5-3B Instruct model.
  • Quality-of-life tools: flash-attn, wandb, IPython, and matplotlib to improve the training and usage experience.
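For context, the countdown task asks the model to combine a given set of numbers with basic arithmetic to reach a target value. A toy shell check of one candidate solution (illustrative only, not the repository's scoring code):

```shell
# Countdown example: reach 24 from the numbers {6, 2, 5, 1}.
target=24
result=$(( (6 - 2) * (5 + 1) ))  # one candidate expression
if [ "$result" -eq "$target" ]; then
  echo "solved: (6 - 2) * (5 + 1) = $result"
else
  echo "missed: got $result, wanted $target"
fi
```

During RL training, the model must discover such expressions itself, which is what drives the self-verification and search behavior described above.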

 

Using Help

Installation process

  1. Create a virtual environment:
    conda create -n zero python=3.9
    
  2. Install PyTorch (optional):
    pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
    
  3. Install vllm:
    pip3 install vllm==0.6.3
    
  4. Install ray:
    pip3 install ray
    
  5. Install verl:
    pip install -e .
    
  6. Install flash-attn:
    pip3 install flash-attn --no-build-isolation
    
  7. Install quality-of-life tools:
    pip install wandb IPython matplotlib
    
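Taken together, steps 1–7 can be run as a single setup script (a sketch; it assumes conda is installed and that you run it from the TinyZero repository root):

```shell
#!/usr/bin/env bash
set -e  # abort on the first failed step

# 1. Create and enter the virtual environment
conda create -y -n zero python=3.9
source activate zero

# 2-4. Core dependencies (pinned versions from the steps above)
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3
pip install ray

# 5. Install veRL from the repository checkout
pip install -e .

# 6. flash-attn needs --no-build-isolation to see the already-installed torch
pip install flash-attn --no-build-isolation

# 7. Quality-of-life tools
pip install wandb IPython matplotlib
```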

Functional operation flow

Countdown task

  1. Data preparation:
    conda activate zero
    python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
    
  2. Training process:
    conda activate zero
    export N_GPUS=1
    export BASE_MODEL={path_to_your_model}
    export DATA_DIR={path_to_your_dataset}
    export ROLLOUT_TP_SIZE=1
    export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
    export VLLM_ATTENTION_BACKEND=XFORMERS
    bash ./scripts/train_tiny_zero.sh
    
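The exported variables above are the knobs read by `train_tiny_zero.sh`. An annotated version of the same configuration (the placeholder paths in braces must be replaced with your own; the comments are interpretive, not from the repository):

```shell
export N_GPUS=1                                # number of GPUs to train on
export BASE_MODEL={path_to_your_model}         # base LM checkpoint to fine-tune
export DATA_DIR={path_to_your_dataset}         # output directory of countdown.py above
export ROLLOUT_TP_SIZE=1                       # vLLM tensor-parallel size for rollouts
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b  # run name used for logging (e.g. wandb)
export VLLM_ATTENTION_BACKEND=XFORMERS         # force the xformers attention backend
bash ./scripts/train_tiny_zero.sh
```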

3B+ Model Training

  1. Data preparation:
    conda activate zero
    python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
    
  2. Training process:
    conda activate zero
    export N_GPUS=2
    export BASE_MODEL={path_to_your_model}
    export DATA_DIR={path_to_your_dataset}
    export ROLLOUT_TP_SIZE=2
    export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
    export VLLM_ATTENTION_BACKEND=XFORMERS
    bash ./scripts/train_tiny_zero.sh
    
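One constraint worth noting when scaling up: ROLLOUT_TP_SIZE (the rollout engine's tensor-parallel degree) should not exceed N_GPUS, and N_GPUS should divide evenly by it. A quick sanity check, mirroring the 3B settings above (a sketch, not part of the repository):

```shell
N_GPUS=2
ROLLOUT_TP_SIZE=2
# The rollout engine cannot shard across more GPUs than exist, and the
# GPU count must split evenly across tensor-parallel groups.
if [ "$ROLLOUT_TP_SIZE" -gt "$N_GPUS" ] || [ $(( N_GPUS % ROLLOUT_TP_SIZE )) -ne 0 ]; then
  echo "invalid combination: ROLLOUT_TP_SIZE=$ROLLOUT_TP_SIZE, N_GPUS=$N_GPUS"
else
  echo "ok: $N_GPUS GPU(s), TP size $ROLLOUT_TP_SIZE"
fi
```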
