Mghao committed
Commit 44ea3be · 1 Parent(s): 08eb173

update README.md

Files changed (1): README.md +8 −8
README.md CHANGED
@@ -27,7 +27,7 @@ We are pleased to share the initial checkpoint of our reasoning foundation large
 
 We are hopeful that applying our reinforcement learning algorithms, supported by our carefully designed infrastructure, will lead to meaningful improvements in the model’s reasoning capabilities across various domains. At the heart of the project is our data production pipeline, which we believe plays a crucial role in enabling general reasoning capabilities. We also believe that the reasoning capability induced by the data production pipeline can address a range of real-world industrial scenarios with increasing precision and reliability.
 
-Based on our observations during the production of $\pi_{0}$, we have identified quality and diversity as critical factors for fostering high-quality, long Chain-of-Thought (CoT) reasoning capabilities. This insight aligns closely with conclusions drawn from the general alignment process of large language models. By meticulously designing self-verification and backtracking mechanisms to ensure process correctness in data generation, we have developed datasets that effectively induce robust long-context reasoning across diverse domains. This approach demonstrates superior performance compared to state-of-the-art o1-like models with similar objectives, highlighting the potential of our data production pipeline in advancing reasoning capabilities.
+Based on our observations during the production of pi0, we have identified quality and diversity as critical factors for fostering high-quality, long Chain-of-Thought (CoT) reasoning capabilities. This insight aligns closely with conclusions drawn from the general alignment process of large language models. By meticulously designing self-verification and backtracking mechanisms to ensure process correctness in data generation, we have developed datasets that effectively induce robust long-context reasoning across diverse domains. This approach demonstrates superior performance compared to state-of-the-art o1-like models with similar objectives, highlighting the potential of our data production pipeline in advancing reasoning capabilities.
 ## Experiments
 ### Math Benchmarks
 
@@ -35,29 +35,29 @@ Based on our observations during the production of $\pi_{0}$, we have identified
 | ---------------------- | ------------ | ----- | ----- | --------------- | -------------- | ------ |
 | Qwen2.5-32B-Instruct | 45.71 | 72.5 | 82.82 | 46.81 | 68.83 | 23.33 |
 | Qwen2.5-32B-QwQ | 43.33 | 72.5 | 88.54 | 55.56 | 78.70 | 40.00 |
-| INF-o1 $\pi_0$ | 47.27 | 85.0 | 88.60 | 56.00 | 77.14 | 40.00 |
+| INF-o1-pi0 | 47.27 | 85.0 | 88.60 | 56.00 | 77.14 | 40.00 |
 ### Logical Benchmark
 | Model | LSAT |
 | ----------------- | :---: |
 | Qwen2.5-32B-Instruct | 33.7 |
 | Qwen2.5-32B-QwQ | 67.0 |
-| INF-o1 $\pi_0$ | 71.8 |
+| INF-o1-pi0 | 71.8 |
 ### Safety Benchmarks
 | Model | AIR-BENCH 2024 | AIR-BENCH 2024 (CRF) |
 | ----------------- | :---: | :---: |
 | Qwen2.5-32B-Instruct | 54.29 | 53.83 |
 | Qwen2.5-32B-QwQ | 52.61 | 53.42 |
 | o1-preview | 73.25 | 70.72 |
-| INF-o1 $\pi_0$ | 77.25 | 74.49 |
+| INF-o1-pi0 | 77.25 | 74.49 |
 ### SQL Benchmarks
 | Model | BIRD | Spider |
 | ----------------- | :---: | :---: |
 | Qwen2.5-32B-Instruct | 50.2 | 77.8 |
 | Qwen2.5-32B-QwQ | 43.7 | 69.9 |
 | o1-preview | 48.9 | 70.6 |
-| INF-o1 $\pi_0$ | 55.3 | 79.7 |
+| INF-o1-pi0 | 55.3 | 79.7 |
 ## Quick Start
-We provide an example of using inf-o1-$\pi_0$ below.
+We provide an example of using inf-o1-pi0 below.
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
@@ -97,7 +97,7 @@ print(response)
 
 ```
 ## Future Plan
-Our $\pi_0$ serves as the foundation for ensuring that our data generation pipeline effectively leverages the long reasoning capabilities of large language models. Looking ahead, we plan to use $\pi_0$ as the initial policy checkpoint for reinforcement learning training. Through this process, we aim to significantly enhance the generalization of reasoning capabilities, particularly for tasks in the financial and medical domains, which are critical for both academic research and industrial applications.
+Our pi0 serves as the foundation for ensuring that our data generation pipeline effectively leverages the long reasoning capabilities of large language models. Looking ahead, we plan to use pi0 as the initial policy checkpoint for reinforcement learning training. Through this process, we aim to significantly enhance the generalization of reasoning capabilities, particularly for tasks in the financial and medical domains, which are critical for both academic research and industrial applications.
 ## Contributor
 ### Supervisors
 Wei Chu • Yinghui Xu • Yuan Qi
@@ -106,7 +106,7 @@ Wei Chu • Yinghui Xu • Yuan Qi
 
 Chao Qu - Team Leader • Chao Wang - Infrastructure • Cheng Peng - Data Pipeline (Logical) • Dakuan Lu - Data Pipeline (Science) • Haozhe Wang - Data Pipeline (Math) & RL • Hongqing Hu - Infrastructure • Jianming Feng - Data Pipeline (Safety) • Jiaran Hao - Data Pipeline (SQL) & Infrastructure • Kelang Tian - Infrastructure • Minghao Yang - Data Pipeline (Math) • Quanbin Wang - Data Pipeline (Safety) • Renbiao Liu - Data Pipeline (SQL) • Tianchu Yao - Data Pipeline & Alignment • Weidi Xu - Data Pipeline (Logical) • Xiaoyu Tan - Data Pipeline & Alignment • Yihan Songliu - Infrastructure
 ## License Agreement
-infly-o1-$\pi_0$ supports commercial applications under a permissive [License](https://huggingface.co/infly/inf-o1-pi0/blob/main/LICENSE).
+infly-o1-pi0 supports commercial applications under a permissive [License](https://huggingface.co/infly/inf-o1-pi0/blob/main/LICENSE).
 ## Contact
 Chao Qu: [email protected]
 Xiaoyu Tan: [email protected]
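The Quick Start's Python block is truncated by the diff above (only its first line, `from transformers import ...`, and last line, `print(response)`, appear as context). A minimal sketch of the standard `transformers` chat pattern such a block usually follows — this is an assumption, not the repo's exact code; the model id `infly/inf-o1-pi0` is inferred from the license URL, and the prompt and `max_new_tokens` value are illustrative:

```python
def generate_response(prompt, model_name="infly/inf-o1-pi0", max_new_tokens=4096):
    """Sketch of a typical transformers chat loop (assumed, not the repo's exact code)."""
    # Imported lazily so defining this sketch does not itself require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )

    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens; keep only the newly generated response.
    new_tokens = output_ids[0][inputs.input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Since the original block ends with `print(response)`, usage would look like `print(generate_response("How many prime numbers are there below 100?"))`. A generous `max_new_tokens` budget matters for long-CoT models, which emit their reasoning trace before the final answer.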