alpayariyak committed
Commit 7697655 · 1 Parent(s): fd2fdaa

Update README.md

Files changed (1): README.md (+2 -320)

README.md CHANGED
@@ -26,8 +26,8 @@ pinned: false
 </p>
 
 <div align="center">
- <img src="assets/openchat.png" style="width: 45%;">
- <img src="assets/openchat_grok.png" style="width: 47%;">
+ <img src="https://github.com/imoneoi/openchat/raw/master/assets/openchat.png" style="width: 45%;">
+ <img src="https://github.com/imoneoi/openchat/raw/master/assets/openchat_grok.png" style="width: 47%;">
 </div>
 
 - OpenChat is an innovative library of **open-source language models**, fine-tuned with [**C-RLFT**](https://arxiv.org/pdf/2309.11235.pdf) - a strategy inspired by offline reinforcement learning.
@@ -66,57 +66,6 @@ pinned: false
 | Open-source SOTA** | 13B-70B | 61.4 | 7.71 | 41.7 | 49.7 | 62.3 | 63.7 | 73.2 | 41.4 | 82.3 |
 | | | | WizardLM 70B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | WizardCoder 34B | Flan-T5 11B | MetaMath 70B |
 
- <details>
- <summary>Evaluation details</summary>
- *: ChatGPT (March) results are from the GPT-4 Technical Report, Chain-of-Thought Hub, and our evaluation.
-
- ^: Zephyr-β often fails to follow few-shot CoT instructions, likely because it was aligned with only chat data but not trained on few-shot data.
-
- **: Mistral and Open-source SOTA results are taken from reported results in instruction-tuned model papers and official repositories.
-
- All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same settings as the AGIEval and Orca papers. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-Bench is run using FastChat. To reproduce our results, follow the instructions below.
- </details>
-
- <details>
- <summary>Reproducing benchmarks</summary>
-
- Reasoning:
-
- Note: Please run the following commands at the base directory of this repository.
-
- ```bash
- python -m ochat.evaluation.run_eval --condition "GPT4 Correct" --model openchat/openchat_3.5
- python ochat/evaluation/view_results.py
- ```
-
- HumanEval:
-
- Note: Please run the following commands at the base directory of this repository.
-
- ```bash
- python -m ochat.evaluation.run_eval --condition "Code" --eval_sets coding --model openchat/openchat_3.5
- python ochat/evaluation/convert_to_evalplus.py
- ```
-
- Then all HumanEval code samples are placed in `ochat/evaluation/evalplus_codegen`. Use the following command to evaluate a generated samples file named `samples.jsonl`, using Docker as a sandbox:
-
- ```bash
- docker run -v $(pwd):/app ganler/evalplus:latest --dataset humaneval --samples samples.jsonl
- ```
-
- MT-Bench:
-
- Please first launch a local API server, then download FastChat and run the following commands.
-
- Note: Due to non-zero temperature and GPT-4 API changes over time, there might be variations in the results.
-
- ```bash
- cd fastchat/llm_judge
- python gen_api_answer.py --model openchat_3.5 --max-tokens 4096 --parallel 128 --openai-api-base http://localhost:18888/v1
- python gen_judgment.py --model-list openchat_3.5 --parallel 8 --mode single
- ```
-
- </details>
 
 ## 🎇 Comparison with [X.AI Grok](https://x.ai/)
 
@@ -126,273 +75,6 @@ python gen_judgment.py --model-list openchat_3.5 --parallel 8 --mode single
 | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
 | Grok-1 | Proprietary | ? | 55.8 | 73 | 63.2 | 23.9 | 62.9 |
 
- # ⬇️ Installation
- > [!NOTE]
- > OpenChat requires [`pytorch`](https://pytorch.org/get-started/locally/#start-locally) to run.
-
- ## pip
-
- ```bash
- pip3 install ochat
- ```
- > [!IMPORTANT]
- > If you are facing package compatibility issues with pip, try the conda method below or check [this issue](https://github.com/imoneoi/openchat/issues/41)
-
- ## conda
-
- ```bash
- conda create -y --name openchat python=3.11
- conda activate openchat
-
- pip3 install ochat
- ```
-
- ## Windows (WSL 1.x, Ubuntu-22.04)
-
- ```bash
- sudo apt update
- sudo apt install build-essential
-
- sudo apt install -y curl
- curl -o miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- bash miniconda.sh
-
- # Restart the WSL terminal if the following conda command does not work
-
- conda create -y --name openchat python=3.11
- conda activate openchat
-
- pip3 install ochat
- ```
-
- ## From source
-
- <details>
- <summary>Installing ochat from source</summary>
-
- ```bash
- git clone https://github.com/imoneoi/openchat
- cd openchat
-
- pip3 install --upgrade pip  # enable PEP 660 support
- pip3 install -e .
- ```
- </details>
-
- # 🚀 Deploying API server
-
- ⚡ Our API server is ready for production use and compatible with the OpenAI API protocol. It is highly optimized with vLLM and can dynamically batch requests.
-
- 📎 Note: For 20-series or older GPUs that do not support `bfloat16`, add `--dtype float16` to the server args.
-
- ### For a single GPU (e.g. RTX 3090, 4090)
-
- ```bash
- python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
- ```
-
- ### For multiple GPUs (tensor parallel)
-
- ```bash
- # N is the number of tensor parallel GPUs
- python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --engine-use-ray --worker-use-ray --tensor-parallel-size N
- ```
-
- Use `-h` to see more settings:
- ```bash
- python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 -h
- ```
-
- <details>
- <summary>Deploy as online service</summary>
-
- If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` to log only to a file, as in the example below. For security purposes, we recommend using an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.
-
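- For example, a minimal sketch of such a launch, combining the flags above (the key values and the log filename are placeholders, not values prescribed by OpenChat):
-
- ```bash
- # Serve with API-key authentication and file-only logging
- python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 \
-     --api-keys sk-KEY1 sk-KEY2 \
-     --disable-log-requests --disable-log-stats --log-file openchat.log
- ```
-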
- </details>
-
- ## Request example
-
- Once started, the server listens at `localhost:18888` for requests and is compatible with the [OpenAI ChatCompletion API specifications](https://platform.openai.com/docs/api-reference/chat).
-
- ```bash
- curl http://localhost:18888/v1/chat/completions \
-   -H "Content-Type: application/json" \
-   -d '{
-     "model": "openchat_3.5",
-     "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
-   }'
- ```
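-
- Because the server speaks the OpenAI protocol, any OpenAI-compatible client should work as well. Below is a minimal sketch using the official `openai` Python package (v1+); the package choice and parameters are assumptions for illustration, not part of the original instructions:
-
- ```python
- from openai import OpenAI
-
- # Point the client at the local OpenChat server; the API key is a dummy
- # value unless the server was started with --api-keys.
- client = OpenAI(base_url="http://localhost:18888/v1", api_key="sk-none")
-
- response = client.chat.completions.create(
-     model="openchat_3.5",
-     messages=[{"role": "user", "content": "Write a haiku about open-source LLMs."}],
- )
- print(response.choices[0].message.content)
- ```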
-
- ### Coding Mode
-
- ```bash
- curl http://localhost:18888/v1/chat/completions \
-   -H "Content-Type: application/json" \
-   -d '{
-     "model": "openchat_3.5",
-     "condition": "Code",
-     "messages": [{"role": "user", "content": "Write an aesthetic TODO app using HTML5 and JS, in a single file. You should use round corners and gradients to make it more aesthetic."}]
-   }'
- ```
-
- # <a id="web-ui"></a> 🌐 Web UI - [OpenChat-UI](https://github.com/imoneoi/openchat-ui)
-
- After launching the API server, OpenChat provides a user interface that is easy to interact with. [Click here to check out the Web UI](https://github.com/imoneoi/openchat-ui).
-
- # 🤗 Inference with Transformers
-
- > [!WARNING]
- > It's recommended to use our optimized API server for deployment. Inference with Transformers will be slower.
-
- The default conversation template is shown below:
-
- ```
- GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
- ```
-
- The following is the coding mode template, which may improve performance on coding tasks:
-
- ```
- Code User: Implement quicksort using C++<|end_of_turn|>Code Assistant:
- ```
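-
- A minimal inference sketch with Hugging Face `transformers`, assembling the default template above by hand (the generation parameters here are illustrative assumptions, not recommended values):
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_3.5")
- model = AutoModelForCausalLM.from_pretrained("openchat/openchat_3.5", device_map="auto")
-
- # Build the default conversation template manually.
- prompt = "GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:"
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-
- # Stop generation at the <|end_of_turn|> special token.
- outputs = model.generate(
-     **inputs,
-     max_new_tokens=256,
-     eos_token_id=tokenizer.convert_tokens_to_ids("<|end_of_turn|>"),
- )
- print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
- ```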
-
- # <a id="training"></a> 🛠️ Training
-
- The OpenChat training system utilizes padding-free training and the [Multipack Sampler](https://github.com/imoneoi/multipack_sampler), achieving a **3~10x** speedup compared to conventional padded training. A toy illustration of the idea follows below.
-
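- The core idea is to pack multiple variable-length sequences into each fixed token budget instead of padding every sequence to the longest one. The sketch below is an explanatory toy, not the Multipack Sampler's actual algorithm:
-
- ```python
- from typing import List
-
- def greedy_pack(lengths: List[int], budget: int) -> List[List[int]]:
-     """Greedily pack sequence lengths into bins of at most `budget` tokens."""
-     bins, current, used = [], [], 0
-     for n in sorted(lengths, reverse=True):
-         if used + n > budget:
-             bins.append(current)
-             current, used = [], 0
-         current.append(n)
-         used += n
-     if current:
-         bins.append(current)
-     return bins
-
- # Prints [[3000], [2000, 1000, 900], [800, 500]]: 3 bins with little waste
- print(greedy_pack([3000, 1000, 900, 800, 2000, 500], budget=4096))
- ```
-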
- ## Choose a base model
-
- OpenChat supports Llama 2 and Mistral models. Please first choose a base model to fit your needs. Each base model has a corresponding weight repo, model type, and recommended batch size, as listed below; these should be filled into `BASE_REPO`, `MODEL_TYPE`, and `BATCH_SIZE` in the following instructions.
-
- | Base Model | Size | Weights (with EOT token) | Model Type | Recommended Batch Size per GPU (8xA100 80GB) |
- |------------|------|-----------------------------------|-------------------------|--------------------------------------|
- | Mistral | 7B | `imone/Mistral_7B_with_EOT_token` | `openchat_v3.2_mistral` | 83968 |
- | Llama 2 | 7B | `imone/LLaMA2_7B_with_EOT_token` | `openchat_v3.2` | 83968 |
- | Llama 2 | 13B | `imone/Llama2_13B_with_EOT_token` | `openchat_v3.2` | 36864 |
-
- Note: The OpenChat conversation template requires an `<|end_of_turn|>` special token, so the base model specified must include this token. Our provided weights are the original base weights with this token added. If you want to add it manually, use `convert_llama_weights_to_hf_add_tokens.py` or `mistral_add_tokens.py` in the `scripts` directory.
-
- ## Installing DeepSpeed
-
- First, ensure that the CUDA `nvcc` compiler is available in your environment. If it is not, install the CUDA toolkit that matches the version used by PyTorch.
-
- Next, install DeepSpeed:
-
- ```bash
- pip install deepspeed
- ```
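-
- As a quick sanity check of the toolchain (`ds_report` is DeepSpeed's bundled environment-report utility; `nvcc --version` confirms the CUDA compiler is on your path):
-
- ```bash
- nvcc --version   # should report a CUDA version matching the one used by PyTorch
- ds_report        # summarizes DeepSpeed's compatibility with this environment
- ```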
-
- ### Preparing Your Data
-
- To use the OpenChat trainer, prepare your SFT data in JSON Lines format, where each line corresponds to a `Conversation` object:
-
- ```python
- class Message(BaseModel):
-     role: str                       # Must be "user" or "assistant"
-     content: str                    # Message content
-     weight: Optional[float] = None  # Loss weight for this message. Typically 0 for user and 1 for assistant to supervise the assistant's responses only
-
-
- class Conversation(BaseModel):
-     items: List[Message]  # All messages within the conversation
-     condition: str = ""   # C-RLFT condition, can be any string or empty
-     system: str = ""      # System message for this conversation
- ```
-
- For basic SFT, assign `weight` as `0` for human messages and `1` for assistant responses.
-
- SFT example:
-
- ```json
- {"items":[{"role":"user","content":"Hello","weight":0.0},{"role":"assistant","content":"Hi","weight":1.0},{"role":"user","content":"How are you today?","weight":0.0},{"role":"assistant","content":"I'm fine.","weight":1.0}],"system":""}
- {"items":[{"role":"user","content":"Who are you?","weight":0.0},{"role":"assistant","content":"I'm OpenChat.","weight":1.0}],"system":"You are a helpful assistant named OpenChat."}
- ```
-
- For C-RLFT, `condition` should be set to the class the conversation belongs to (e.g. `GPT3` or `GPT4`). The `weight` is assigned as `0` for human messages and `w` for assistant responses, where `w` is the weight of the class (e.g. `0.1` for `GPT3` and `1` for `GPT4`, as found in our C-RLFT paper).
-
- C-RLFT example:
-
- ```json
- {"items":[{"role":"user","content":"What is C-RLFT?","weight":0.0},{"role":"assistant","content":"C-RLFT is a method for improving open-source LLMs with mixed-quality data.","weight":1.0}],"condition":"GPT4","system":""}
- {"items":[{"role":"user","content":"What is C-RLFT?","weight":0.0},{"role":"assistant","content":"I don't know.","weight":0.1}],"condition":"GPT3","system":""}
- ```
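-
- A minimal sketch for producing such a JSONL file programmatically with the Pydantic models above (assumes Pydantic v1, whose `.json()` method serializes a model to a JSON string):
-
- ```python
- from typing import List, Optional
- from pydantic import BaseModel
-
- class Message(BaseModel):
-     role: str
-     content: str
-     weight: Optional[float] = None
-
- class Conversation(BaseModel):
-     items: List[Message]
-     condition: str = ""
-     system: str = ""
-
- conv = Conversation(
-     items=[
-         Message(role="user", content="Hello", weight=0.0),
-         Message(role="assistant", content="Hi", weight=1.0),
-     ],
-     condition="GPT4",  # leave empty for basic SFT
- )
-
- # One Conversation per line, as the trainer expects
- with open("data.jsonl", "w") as f:
-     f.write(conv.json() + "\n")
- ```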
-
- ### Pre-tokenizing the Dataset
-
- You'll then need to pre-tokenize the dataset using the command below (specify a filename as `PRETOKENIZED_DATA_OUTPUT_PATH` to store the pre-tokenized dataset):
-
- ```bash
- python -m ochat.data.generate_dataset --model-type MODEL_TYPE --model-path BASE_REPO --in-files data.jsonl --out-prefix PRETOKENIZED_DATA_OUTPUT_PATH
- ```
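-
- For instance, filling in the Mistral 7B row from the base-model table above (the output prefix is an arbitrary path of your choosing):
-
- ```bash
- python -m ochat.data.generate_dataset \
-     --model-type openchat_v3.2_mistral \
-     --model-path imone/Mistral_7B_with_EOT_token \
-     --in-files data.jsonl \
-     --out-prefix ./pretokenized/openchat_data
- ```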
-
- ### Launching the OpenChat Trainer
-
- You can now launch the OpenChat trainer using the commands below.
- - The 13B model requires eight A/H100 GPUs with 80GB VRAM.
- - The 7B model can be trained with four A/H100 GPUs with 80GB VRAM, or eight A/H100 GPUs with 40GB VRAM.
-
- For hyperparameters, we recommend starting with the recommended batch size from the table above. If OOM occurs, set it to the exact maximum that VRAM can hold, as a multiple of `2048`.
- Other hyperparameters have been carefully selected as defaults. Furthermore, the learning rate is automatically determined based on the [inverse square-root rule](https://arxiv.org/abs/2006.09092).
-
- <details>
-
- <summary>Training Commands (click to expand)</summary>
-
- ```bash
- NUM_GPUS=8
-
- deepspeed --num_gpus=$NUM_GPUS --module ochat.training_deepspeed.train \
-           --model_path BASE_REPO \
-           --data_prefix PRETOKENIZED_DATA_OUTPUT_PATH \
-           --save_path PATH_TO_SAVE_MODEL \
-           --batch_max_len BATCH_SIZE \
-           --epochs 5 \
-           --save_every 1 \
-           --deepspeed \
-           --deepspeed_config ochat/training_deepspeed/deepspeed_config.json
- ```
-
- </details>
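-
- As a concrete illustration, training the Mistral 7B base on 8 GPUs with the values from the base-model table (the data prefix and save path are placeholder paths):
-
- ```bash
- deepspeed --num_gpus=8 --module ochat.training_deepspeed.train \
-           --model_path imone/Mistral_7B_with_EOT_token \
-           --data_prefix ./pretokenized/openchat_data \
-           --save_path ./checkpoints/openchat_mistral_7b \
-           --batch_max_len 83968 \
-           --epochs 5 \
-           --save_every 1 \
-           --deepspeed \
-           --deepspeed_config ochat/training_deepspeed/deepspeed_config.json
- ```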
-
- You can find checkpoints for all epochs in `PATH_TO_SAVE_MODEL`. You may then evaluate each epoch and choose the best one.
-
- # Limitations
-
- ## Foundation Model Limitations
- Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as:
-
- - Complex reasoning
- - Mathematical and arithmetic tasks
- - Programming and coding challenges
-
- ## Hallucination of Non-existent Information
- OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model.
-
- ## Safety
- OpenChat may sometimes generate harmful content, hate speech, or biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.
-
- # License
-
- Our OpenChat 3.5 code and models are distributed under the **Apache License 2.0**.
-
- # <a id="models"></a> Models
-
- | Model | Size | Context | Weights | Serving |
- |--------------|------|---------|-------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
- | OpenChat 3.5 | 7B | 8192 | [Huggingface](https://huggingface.co/openchat/openchat_3.5) | `python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --engine-use-ray --worker-use-ray` |
-
- ## <a id="legacy-models"></a> Legacy Models
-
- The following models are older versions of OpenChat with inferior performance compared to the latest version. They will be deprecated in the next release. Note that the OpenChat V1 and V2 series are now deprecated; [please install 3.1.x to use V1 and V2 models](https://github.com/imoneoi/openchat/tree/83a683c775c77867cc45937fafdf48e8dcb68daa).
-
- To run the models on multiple GPUs with smaller VRAM, you can enable tensor parallelization, for example by using the `--tensor-parallel-size 2` flag.
-
- | Model | Size | Context | Weights | Serving |
- |--------------|------|---------|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
- | OpenChat 3.2 SUPER | 13B | 4096 | [Huggingface](https://huggingface.co/openchat/openchat_v3.2_super) | `python -m ochat.serving.openai_api_server --model openchat/openchat_v3.2_super --engine-use-ray --worker-use-ray` |
 
 # 💌Contact
 
 We are a student team from Tsinghua University working on OpenChat, a project that requires additional computing power or LLM API keys for further development. If you are interested in our project and would like to offer support, please feel free to reach out to us:
 