hamishivi committed
Commit bf0c1c8 · verified · 1 Parent(s): e85f016

Update README.md

Files changed (1):
  1. README.md +110 -185

README.md CHANGED

---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model:
- allenai/OLMo-2-1124-7B-SFT
library_name: transformers
datasets:
- allenai/tulu-3-sft-olmo-2-mixture
---

<img alt="OLMo Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmo2/olmo.png" width="242px">

# OLMo-2-1124-7B-RM

OLMo 2 7B RM November 2024 is a reward model trained on top of the [OLMo 2 7B SFT November 2024](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) model.
It has been trained using an OLMo-specific variant of the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and [this preference dataset](todo).
Tülu 3 is designed for state-of-the-art performance on a diverse range of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
Check out the OLMo 2 paper (forthcoming) or the [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!

This reward model was used to initialize the value models during RLVR training of both the 7B and 13B models.

OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
The core models released in this batch include the following:

| **Stage** | **OLMo 2 7B** | **OLMo 2 13B** |
|-----------|---------------|----------------|
| **Base Model** | [allenai/OLMo-2-1124-7B](https://huggingface.co/allenai/OLMo-2-1124-7B) | [allenai/OLMo-2-1124-13B](https://huggingface.co/allenai/OLMo-2-1124-13B) |
| **SFT** | [allenai/OLMo-2-1124-7B-SFT](https://huggingface.co/allenai/OLMo-2-1124-7B-SFT) | [allenai/OLMo-2-1124-13B-SFT](https://huggingface.co/allenai/OLMo-2-1124-13B-SFT) |
| **DPO** | [allenai/OLMo-2-1124-7B-DPO](https://huggingface.co/allenai/OLMo-2-1124-7B-DPO) | [allenai/OLMo-2-1124-13B-DPO](https://huggingface.co/allenai/OLMo-2-1124-13B-DPO) |
| **Final Models (RLVR)** | [allenai/OLMo-2-1124-7B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-7B-Instruct) | [allenai/OLMo-2-1124-13B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-13B-Instruct) |
| **Reward Model (RM)** | [allenai/OLMo-2-1124-7B-RM](https://huggingface.co/allenai/OLMo-2-1124-7B-RM) | (Same as 7B) |

## Model description

- **Model type:** A reward model trained on a mix of publicly available, synthetic, and human-created datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Apache 2.0
- **Finetuned from model:** allenai/OLMo-2-1124-7B-SFT

### Model Sources

- **Project Page:** https://allenai.org/olmo
- **Repositories:**
  - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
  - Evaluation code: https://github.com/allenai/olmes
  - Further fine-tuning code: https://github.com/allenai/open-instruct
- **Paper:** Coming soon!
- **Demo:** https://playground.allenai.org/

## Using the model

### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:
```python
from transformers import AutoModelForSequenceClassification

olmo_model = AutoModelForSequenceClassification.from_pretrained("allenai/OLMo-2-1124-7B-RM")
```
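
As a sketch of how the reward model might be used to score a response (the exact preprocessing here is an assumption rather than the documented pipeline), you can combine it with the tokenizer's chat template and read the scalar logit as the reward:

```python
# Hedged example: score one user/assistant exchange with the reward model.
# Assumes the sequence-classification head has a single label whose logit
# serves as the reward, as is typical for reward models.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "allenai/OLMo-2-1124-7B-RM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

messages = [
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm functioning as expected. How can I assist you today?"},
]
# Render and tokenize the conversation with the model's chat template.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

with torch.no_grad():
    reward = model(input_ids).logits[0, 0].item()
print(f"Reward: {reward:.3f}")
```

Higher scores indicate responses the reward model prefers; in practice, only relative scores between candidate responses to the same prompt are meaningful.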

### Chat template

The chat template for our models is formatted as:
```
<|endoftext|><|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```
Or with new lines expanded:
```
<|endoftext|><|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```
The template is also embedded in the tokenizer, so it can be applied with `tokenizer.apply_chat_template`.
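
For example, a minimal sketch of rendering the template from a list of messages (the exact rendered string is determined by the tokenizer's template):

```python
# Minimal sketch: render the chat template as a string rather than token ids.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B-RM")
messages = [{"role": "user", "content": "How are you doing?"}]

# tokenize=False returns the formatted string; add_generation_prompt=True
# appends the assistant turn marker so a completion can follow.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```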

### System prompt

In Ai2 demos, we use this system prompt by default:
```
You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```
The model has not been trained with a specific system prompt in mind.
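
If you want to mirror the demo setup, one option (illustrative only, and assuming the chat template accepts a `system` role) is to pass the prompt as a system message:

```python
# Illustrative sketch: prepend the Ai2 demo system prompt as a system message.
# The model was not trained with a specific system prompt, so treat this as optional.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B-RM")
messages = [
    {"role": "system", "content": "You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI."},
    {"role": "user", "content": "How are you doing?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```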

### Bias, Risks, and Limitations

The OLMo 2 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses (as ChatGPT is), so the models can produce problematic outputs, especially when prompted to do so.
See the Falcon 180B model card for an example of this.

## Performance

Note that we did not benchmark the RM directly, since it is only used to initialize the value model during RLVR training.
We provide results for the OLMo 2 models below:

| Model | Average | AlpacaEval | BBH | DROP | GSM8k | IFEval | MATH | MMLU | Safety | PopQA | TruthQA |
|-------|---------|------------|-----|------|-------|--------|------|------|--------|-------|---------|
| **Open weights models** |
| Gemma-2-9B-it | 51.9 | 43.7 | 2.5 | 58.8 | 79.7 | 69.9 | 29.8 | 69.1 | 75.5 | 28.3 | 61.4 |
| Ministral-8B-Instruct | 52.1 | 31.4 | 56.2 | 56.2 | 80.0 | 56.4 | 40.0 | 68.5 | 56.2 | 20.2 | 55.5 |
| Mistral-Nemo-Instruct-2407 | 51.1 | 45.8 | 56.0 | 23.6 | 81.4 | 64.5 | 31.9 | 70.0 | 52.7 | 26.9 | 57.7 |
| Qwen-2.5-7B-Instruct | 57.1 | 29.7 | 25.3 | 54.4 | 83.8 | 74.7 | 69.9 | 76.6 | 75.0 | 18.1 | 63.1 |
| Llama-3.1-8B-Instruct | 58.9 | 25.8 | 69.7 | 61.7 | 83.4 | 80.6 | 42.5 | 71.3 | 70.2 | 28.4 | 55.1 |
| Tülu 3 8B | 60.4 | 34.0 | 66.0 | 62.6 | 87.6 | 82.4 | 43.7 | 68.2 | 75.4 | 29.1 | 55.0 |
| Qwen-2.5-14B-Instruct | 61.0 | 34.6 | 35.4 | 50.5 | 83.9 | 82.4 | 70.6 | 81.1 | 79.3 | 21.1 | 70.8 |
| **Fully open models** |
| OLMo-7B-Instruct | 28.2 | 5.2 | 35.3 | 30.7 | 14.3 | 32.2 | 2.1 | 46.3 | 54.0 | 17.1 | 44.5 |
| OLMo-7B-0424-Instruct | 33.2 | 8.5 | 35.2 | 47.9 | 23.2 | 39.2 | 5.2 | 48.9 | 49.3 | 18.9 | 55.2 |
| OLMoE-1B-7B-0924-Instruct | 35.5 | 8.5 | 37.2 | 34.3 | 47.2 | 46.2 | 8.4 | 51.6 | 51.6 | 20.6 | 49.1 |
| MAP-Neo-7B-Instruct | 42.9 | 17.6 | 26.4 | 48.2 | 69.4 | 35.9 | 31.5 | 56.5 | 73.7 | 18.4 | 51.6 |
| *OLMo-2-7B-SFT* | 50.0 | 9.3 | 50.7 | 58.2 | 71.2 | 68.0 | 25.1 | 62.0 | 82.4 | 25.0 | 47.8 |
| *OLMo-2-7B-DPO* | 55.0 | 29.9 | 47.0 | 58.8 | 82.4 | 74.5 | 31.2 | 63.4 | 81.5 | 24.5 | 57.2 |
| *OLMo-2-13B-SFT* | 55.7 | 12.0 | 58.8 | 71.8 | 75.7 | 71.5 | 31.1 | 67.3 | 82.8 | 29.3 | 56.2 |
| *OLMo-2-13B-DPO* | 61.0 | 38.3 | 58.5 | 71.9 | 84.2 | 80.6 | 35.0 | 68.5 | 80.6 | 28.9 | 63.9 |
| **OLMo-2-7B-1124-Instruct** | 55.7 | 31.0 | 48.9 | 58.9 | 85.2 | 75.6 | 31.3 | 63.9 | 81.2 | 24.6 | 56.3 |
| **OLMo-2-13B-1124-Instruct** | 61.4 | 37.5 | 58.4 | 72.1 | 87.4 | 80.4 | 39.7 | 68.6 | 77.5 | 28.8 | 63.9 |

## Hyperparameters

RM training:
- **Learning Rate:** 3e-6
- **Effective Batch Size:** 256
- **Max. Sequence Length:** 4096
- **Learning Rate Schedule:** None
- **Num. Epochs:** 1

## License and use

OLMo 2 is licensed under the Apache 2.0 license.
OLMo 2 is intended for research and educational use.
For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).

## Citation

A technical manuscript is forthcoming!