yuhuixu committed
Commit 2a0a8b1 · verified · 1 Parent(s): 131248e

Update app.py

Files changed (1)
  1. app.py +0 -69
app.py CHANGED
@@ -47,75 +47,6 @@ training.
  3. 📈 Flexible scalability: Robust performance across diverse inference budgets on reasoning benchmarks like AIME and LiveCodeBench.
  4. ⚙️ Better performance with fewer tokens: Our trained model generates outputs that are 30% shorter while maintaining (or even improving) accuracy.
 
- <p align="center">
- <img src="figs/aime.png" width="46%" />
- <img src="figs/livecode.png" width="48%" />
- </p>
-
- <p align="center">
- <img src="figs/codetable.png" width="90%" />
- </p>
-
- ## Environment Setup
-
- ### Installation
- ```bash
- # Create and activate a Python 3.10 environment.
- conda create -n e1 python=3.10 -y
- conda activate e1
-
- # Install dependencies.
- cd Elastic-Reasoning
- pip install -e ./verl
- pip install -e .
- ```
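- A quick, optional check that the editable installs above succeeded. This is a sketch that assumes the two packages are importable as `rllm` and `verl`; the module names are an assumption, not confirmed by the repo docs:
- ```bash
- # Sanity check: import both packages installed above (module names assumed).
- python -c "import rllm, verl; print('install OK')"
- ```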
- ### Data
- Our raw training data is in `rllm/data/[train|test]/[code|math]/`, along with preprocessing scripts in `rllm/data/preprocess`. To convert the raw data into Parquet files for training, run:
-
- ```bash
- # Download datasets from GDrive; populates rllm/data/[train|test]/[math|code]/*.json
- python scripts/data/download_datasets.py
-
- # Generate Parquet files for DeepCoder/DeepScaleR in data/*.parquet
- python scripts/data/[deepcoder|deepscaler]_dataset.py
- ```
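- To confirm the conversion, you can peek at one of the generated Parquet files. The filename below is illustrative (the scripts write to `data/*.parquet`), and the check assumes `pandas` with Parquet support is available in the `e1` environment:
- ```bash
- # Inspect a generated Parquet file (illustrative filename; use any file under data/).
- python -c "import pandas as pd; df = pd.read_parquet('data/deepscaler_train.parquet'); print(df.shape, list(df.columns))"
- ```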
- ## Training
- ```bash
- export MODEL_PATH="agentica-org/DeepScaleR-1.5B-Preview"
- ./scripts/e1-math/e1_math_1.5b_1k_1k.sh --model $MODEL_PATH
- ```
-
- ## Evaluation
-
- To evaluate a model with our evaluation scripts, run:
- ```bash
- ./scripts/eval/eval_model.sh --model [CHECKPOINT_PATH] --datasets [DATASET1] [DATASET2] --output-dir [OUTPUT_DIR] --n [N_PASSES] --tp [TENSOR_PARALLEL_SIZE] --e1-mode [SEPARATE_BUDGETING] --e1-thinking-length [THINKING_LENGTH] --e1-solution-length [SOLUTION_LENGTH]
- ```
-
- ### Example on MATH
- ```bash
- ./scripts/eval/eval_model.sh --model Salesforce/E1-Math-1.5B --datasets aime math amc minerva olympiad_bench --output-dir $HOME/E1-Math-1.5B --tp 1 --n 16 --e1-mode True --e1-thinking-length 1024 --e1-solution-length 1024
- ```
- ### Example on LiveCodeBench
- ```bash
- ./scripts/eval/eval_model.sh --model Salesforce/E1-Code-14B --datasets test_livecodebench --output-dir $HOME/E1-Code-14B --tp 4 --e1-mode True --e1-thinking-length 1024 --e1-solution-length 1024
- ```
-
- ### Example on Codeforces
- ```bash
- ./scripts/eval/eval_model.sh --model Salesforce/E1-Code-14B --datasets test_codeforces --output-dir $HOME/DeepCoder-14B-Preview --tp 4 --n 8 --e1-mode True --e1-thinking-length 1024 --e1-solution-length 1024
- ```
- Then compute the Codeforces Elo rating from the resulting JSON:
- ```bash
- python scripts/deepcoder/benchmark/cf_elo_calc.py --results_path [RESULTS_JSON_PATH] --pass_n 8
- ```
-
- ### Unconstrained evaluation
- Set `--e1-mode False` and `--max-length [Maximum token length, e.g. 32768]`.
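- For instance, an unconstrained run on MATH could look like the following sketch (flags reuse the MATH example above; the output directory name is illustrative):
- ```bash
- # Unconstrained evaluation: disable separate thinking/solution budgets.
- ./scripts/eval/eval_model.sh --model Salesforce/E1-Math-1.5B --datasets aime math --output-dir $HOME/E1-Math-1.5B-unconstrained --tp 1 --n 16 --e1-mode False --max-length 32768
- ```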
-
- ## Acknowledgement
- We are grateful to [rllm](https://github.com/agentica-project/rllm) and [verl](https://github.com/volcengine/verl) for providing the awesome codebases!
 
  ## Citation
 
 