yi-01-ai committed on
Commit
d6e6371
1 Parent(s): 91e3efe

Auto Sync from git://github.com/01-ai/Yi.git/commit/9ee4974c7eff6ea8f31c690e3ba3ab19e8fe21f0

Files changed (1)
  1. README.md +478 -106
README.md CHANGED
@@ -86,8 +86,8 @@ pipeline_tag: text-generation
86
  - [Web demo](#web-demo)
87
  - [Fine tune](#fine-tune)
88
  - [Quantization](#quantization)
89
- - [Deployment](https://github.com/01-ai/Yi/blob/main/docs/deployment.md)
90
- - [Learning hub](https://github.com/01-ai/Yi/blob/main/docs/learning_hub.md)
91
  - [🟢 Why Yi?](#-why-yi)
92
  - [🌎 Ecosystem](#-ecosystem)
93
  - [💦 Upstream](#-upstream)
@@ -99,7 +99,6 @@ pipeline_tag: text-generation
99
  - [📌 Benchmarks](#-benchmarks)
100
  - [📊 Base model performance](#-base-model-performance)
101
  - [📊 Chat model performance](#-chat-model-performance)
102
- - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
103
  - [🟢 Who can use Yi?](#-who-can-use-yi)
104
  - [🟢 Misc.](#-misc)
105
  - [Acknowledgments](#acknowledgments)
@@ -121,8 +120,29 @@ pipeline_tag: text-generation
121
  - For English language capability, the Yi series models ranked 2nd (just behind GPT-4), outperforming other LLMs (such as LLaMA2-chat-70B, Claude 2, and ChatGPT) on the [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) in Dec 2023.
122
 
123
  - For Chinese language capability, the Yi series models landed in 2nd place (following GPT-4), surpassing other LLMs (such as Baidu ERNIE, Qwen, and Baichuan) on the [SuperCLUE](https://www.superclueai.com/) in Oct 2023.
124
 
125
- - 🙏 (Credits to LLaMA) Thanks to the Transformer and LLaMA open-source communities, which reduce the effort required to build from scratch and enable the use of the same tools within the AI ecosystem. If you're interested in Yi's adoption of the LLaMA architecture and license usage policy, see [Yi's relation with LLaMA](https://github.com/01-ai/Yi/blob/main/docs/yi_relation_llama.md).
126
 
127
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
128
 
@@ -130,18 +150,19 @@ pipeline_tag: text-generation
130
 
131
  Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements.
132
 
133
- If you want to deploy Yi models, see [software and hardware requirements](https://github.com/01-ai/Yi/blob/main/docs/deployment.md#hardware-requirements).
134
 
135
  ### Chat models
136
 
137
  | Model | Download
138
  |---|---
139
- Yi-6B-Chat| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-Chat) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-Chat/summary)
140
- Yi-6B-Chat-4bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-Chat-4bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-Chat-4bits/summary)
141
- Yi-6B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-Chat-8bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-Chat-8bits/summary)
142
  Yi-34B-Chat | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat/summary)
143
  Yi-34B-Chat-4bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-4bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-4bits/summary)
144
  Yi-34B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-8bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-8bits/summary)
 
 
 
 
145
 
146
  <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). </sup></sub>
147
 
@@ -149,10 +170,10 @@ Yi-34B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-
149
 
150
  | Model | Download |
151
  |---|---|
152
- Yi-6B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B/summary)
153
- Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-200K/summary)
154
  Yi-34B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B/summary)
155
  Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-200K/summary)
 
 
156
 
157
  <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. </sup></sub>
158
 
@@ -172,7 +193,17 @@ Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K)
172
 
173
  - For chat models:
174
 
175
- - For detailed chat model limitations, see [limitations of chat model](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#limitations-of-chat-model).
 
 
 
 
 
 
 
 
 
 
176
 
177
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
178
 
@@ -249,7 +280,7 @@ Getting up and running with Yi models is simple with multiple choices available.
249
 
250
  Select one of the following paths to begin your journey with Yi!
251
 
252
- ![Quick start - Choose your path](./assets/img/quick_start_path.png)
253
 
254
  #### 🎯 Deploy Yi locally
255
 
@@ -260,7 +291,7 @@ If you prefer to deploy Yi models locally,
260
  - [Docker](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#11-docker)
261
  - [conda-lock](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#12-local-development-environment)
262
 
263
- - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use [llama.cpp](https://github.com/01-ai/Yi/blob/main/docs/yi_llama.cpp.md).
264
 
265
  #### 🎯 Not to deploy Yi locally
266
 
@@ -294,7 +325,7 @@ If you want to chat with Yi with more customizable options (e.g., system prompt,
294
  - [Yi-34B-Chat](https://platform.lingyiwanwu.com/) (Yi official beta)
295
  - Access is available through a whitelist. To apply, fill out the form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ).
296
 
297
- ### pip
298
 
299
  This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference.
300
 
@@ -302,7 +333,7 @@ This tutorial guides you through every step of running **Yi-34B-Chat locally on
302
 
303
  - Make sure Python 3.10 or later is installed.
304
 
305
- - If you want to run other Yi models, see [software and hardware requirements](https://github.com/01-ai/Yi/blob/main/docs/deployment.md).
306
 
307
  #### Step 1: Prepare your environment
308
 
@@ -383,7 +414,7 @@ Then you can see an output similar to the one below. 🥳
383
 
384
  <details>
385
 
386
- <summary>Output</summary>
387
 
388
  <br>
389
 
@@ -393,45 +424,167 @@ Then you can see an output similar to the one below. 🥳
393
 
394
  </details>
395
 
396
- ### Docker
397
 
398
- This tutorial guides you through every step of running **Yi-34B-Chat on an A800 GPU** locally and then performing inference.
399
 
400
- #### Step 0: Prerequistes
 
 
 
401
 
402
- - Make sure you've installed [Docker](https://docs.docker.com/engine/install/?open_in_browser=true) and [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
 
 
 
403
 
404
- #### Step 1: Start Docker
 
 
 
 
 
 
 
 
405
 
406
  ```bash
407
- docker run -it --gpus all \
408
- -v <your-model-path>:/models \
409
- ghcr.io/01-ai/yi:latest
410
  ```
411
 
412
- Alternatively, you can pull the Yi Docker image from `registry.lingyiwanwu.com/ci/01-ai/yi:latest`.
413
 
414
- #### Step 2: Perform inference
415
 
416
- You can perform inference with Yi chat or base models as below.
417
-
418
- ##### Perform inference with Yi chat model
419
 
420
- The steps are similar to [pip - Perform inference with Yi chat model](#perform-inference-with-yi-chat-model).
 
 
 
 
421
 
422
- **Note** that the only difference is to set `model_path = '<your-model-mount-path>'` instead of `model_path = '<your-model-path>'`.
423
 
424
- ##### Perform inference with Yi base model
425
 
426
- The steps are similar to [pip - Perform inference with Yi base model](#perform-inference-with-yi-base-model).
 
 
 
 
427
 
428
- **Note** that the only difference is to set `--model <your-model-mount-path>` instead of `--model <your-model-path>`.
429
 
430
- ### Run Yi with llama.cpp
 
 
 
 
 
 
431
 
432
- If you have limited resources, you can try [llama.cpp](https://github.com/ggerganov/llama.cpp) or [ollama.cpp](https://ollama.ai/) (especially for Chinese users) to run Yi models in a few minutes locally.
 
433
 
434
- For a step-by-step tutorial, see [Run Yi with llama.cpp](https://github.com/01-ai/Yi/edit/main/docs/yi_llama.cpp.md).
435
 
436
  ### Web demo
437
 
@@ -462,8 +615,119 @@ Once finished, you can compare the finetuned model and the base model with the f
462
  ```bash
463
  bash finetune/scripts/run_eval.sh
464
  ```
465
 
466
- For advanced usage (like fine-tuning based on your custom data), see [fine-tune code for Yi 6B and 34B](https://github.com/01-ai/Yi/tree/main/finetune).
 
 
467
 
468
  ### Quantization
469
 
@@ -483,7 +747,41 @@ python quantization/gptq/eval_quantized_model.py \
483
  --trust_remote_code
484
  ```
485
 
486
- For a more detailed explanation, please read the [doc](https://github.com/01-ai/Yi/tree/main/quantization/gptq)
487
 
488
  #### AWQ
489
  ```bash
@@ -500,11 +798,118 @@ python quantization/awq/eval_quantized_model.py \
500
  --model /quantized_model \
501
  --trust_remote_code
502
  ```
 
503
 
504
- For detailed explanations, see [AWQ quantization](https://github.com/01-ai/Yi/tree/main/quantization/awq).
505
 
506
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
507
 
508
  # 🟢 Why Yi?
509
 
510
  - [🌎 Ecosystem](#-ecosystem)
@@ -515,9 +920,8 @@ For detailed explanations, see [AWQ quantization](https://github.com/01-ai/Yi/tr
515
  - [🛠️ Fine-tuning](#️-fine-tuning)
516
  - [API](#api)
517
  - [📌 Benchmarks](#-benchmarks)
518
- - [📊 Base model performance](#-base-model-performance)
519
  - [📊 Chat model performance](#-chat-model-performance)
520
- - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
521
 
522
  ## 🌎 Ecosystem
523
 
@@ -600,76 +1004,46 @@ If you're seeking to explore the diverse capabilities within Yi's thriving famil
600
 
601
  ## 📌 Benchmarks
602
 
603
- - [📊 Base model performance](#-base-model-performance)
604
  - [📊 Chat model performance](#-chat-model-performance)
605
- - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
606
 
607
- ### 📊 Base model performance
 
 
 
608
 
609
- | Model | MMLU | CMMLU | C-Eval | GAOKAO | BBH | Common-sense Reasoning | Reading Comprehension | Math & Code |
610
- | :------------ | :------: | :------: | :------: | :------: | :------: | :--------------------: | :-------------------: | :---------: |
611
- | | 5-shot | 5-shot | 5-shot | 0-shot | 3-shot@1 | - | - | - |
612
- | LLaMA2-34B | 62.6 | - | - | - | 44.1 | 69.9 | 68.0 | 26.0 |
613
- | LLaMA2-70B | 68.9 | 53.3 | - | 49.8 | 51.2 | 71.9 | 69.4 | 36.8 |
614
- | Baichuan2-13B | 59.2 | 62.0 | 58.1 | 54.3 | 48.8 | 64.3 | 62.4 | 23.0 |
615
- | Qwen-14B | 66.3 | 71.0 | 72.1 | 62.5 | 53.4 | 73.3 | 72.5 | **39.8** |
616
- | Skywork-13B | 62.1 | 61.8 | 60.6 | 68.1 | 41.7 | 72.4 | 61.4 | 24.9 |
617
- | InternLM-20B | 62.1 | 59.0 | 58.8 | 45.5 | 52.5 | 78.3 | - | 30.4 |
618
- | Aquila-34B | 67.8 | 71.4 | 63.1 | - | - | - | - | - |
619
- | Falcon-180B | 70.4 | 58.0 | 57.8 | 59.0 | 54.0 | 77.3 | 68.8 | 34.0 |
620
- | Yi-6B | 63.2 | 75.5 | 72.0 | 72.2 | 42.8 | 72.3 | 68.7 | 19.8 |
621
- | Yi-6B-200K | 64.0 | 75.3 | 73.5 | 73.9 | 42.0 | 72.0 | 69.1 | 19.0 |
622
- | **Yi-34B** | **76.3** | **83.7** | 81.4 | 82.8 | **54.3** | **80.1** | 76.4 | 37.1 |
623
- | Yi-34B-200K | 76.1 | 83.6 | **81.9** | **83.4** | 52.7 | 79.7 | **76.6** | 36.3 |
624
-
625
- While benchmarking open-source models, we have observed a disparity between the
626
- results generated by our pipeline and those reported in public sources (e.g.
627
- OpenCompass). Upon conducting a more in-depth investigation of this difference,
628
- we have discovered that various models may employ different prompts,
629
- post-processing strategies, and sampling techniques, potentially resulting in
630
- significant variations in the outcomes. Our prompt and post-processing strategy
631
- remains consistent with the original benchmark, and greedy decoding is employed
632
- during evaluation without any post-processing for the generated content. For
633
- scores that were not reported by the original authors (including scores reported
634
- with different settings), we try to get results with our pipeline.
635
-
636
- To evaluate the model's capability extensively, we adopted the methodology
637
- outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande,
638
- ARC, OBQA, and CSQA to assess common sense reasoning. SQuAD, QuAC, and BoolQ
639
- were incorporated to evaluate reading comprehension. CSQA was exclusively tested
640
- using a 7-shot setup, while all other tests were conducted with a 0-shot
641
- configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1),
642
- HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code". Due
643
- to technical constraints, we did not test Falcon-180B on QuAC and OBQA; the score
644
- is derived by averaging the scores on the remaining tasks. Since the scores for
645
- these two tasks are generally lower than the average, we believe that
646
- Falcon-180B's performance was not underestimated.
647
 
648
- ### 📊 Chat model performance
 
649
 
650
- | Model | MMLU | MMLU | CMMLU | CMMLU | C-Eval(val)<sup>*</sup> | C-Eval(val)<sup>*</sup> | Truthful QA | BBH | BBH | GSM8k | GSM8k |
651
- | ----------------------- | --------- | --------- | --------- | --------- | ----------------------- | ----------------------- | ----------- | --------- | --------- | --------- | --------- |
652
- | | 0-shot | 5-shot | 0-shot | 5-shot | 0-shot | 5-shot | 0-shot | 0-shot | 3-shot | 0-shot | 4-shot |
653
- | LLaMA2-13B-Chat | 50.88 | 47.33 | 27.47 | 35.08 | 27.93 | 35.88 | 36.84 | 32.90 | 58.22 | 36.85 | 2.73 |
654
- | LLaMA2-70B-Chat | 59.42 | 59.86 | 36.10 | 40.99 | 34.99 | 41.31 | 53.95 | 42.36 | 58.53 | 47.08 | 58.68 |
655
- | Baichuan2-13B-Chat | 55.09 | 50.14 | 58.64 | 59.47 | 56.02 | 54.75 | 48.98 | 38.81 | 47.15 | 45.72 | 23.28 |
656
- | Qwen-14B-Chat | 63.99 | 64.98 | 67.73 | 70.57 | 66.12 | 70.06 | 52.49 | 49.65 | 54.98 | 59.51 | 61.18 |
657
- | InternLM-Chat-20B | 55.55 | 57.42 | 53.55 | 53.75 | 51.19 | 53.57 | 51.75 | 42.41 | 36.68 | 15.69 | 43.44 |
658
- | AquilaChat2-34B v1.2 | 65.15 | 66.70 | 67.51 | 70.02 | **82.99** | **89.38** | **64.33** | 20.12 | 34.28 | 11.52 | 48.45 |
659
- | Yi-6B-Chat | 58.24 | 60.99 | 69.44 | 74.71 | 68.80 | 74.22 | 50.58 | 39.70 | 47.15 | 38.44 | 44.88 |
660
- | Yi-6B-Chat-8bits(GPTQ) | 58.29 | 60.96 | 69.21 | 74.69 | 69.17 | 73.85 | 49.85 | 40.35 | 47.26 | 39.42 | 44.88 |
661
- | Yi-6B-Chat-4bits(AWQ) | 56.78 | 59.89 | 67.70 | 73.29 | 67.53 | 72.29 | 50.29 | 37.74 | 43.62 | 35.71 | 38.36 |
662
- | Yi-34B-Chat | **67.62** | 73.46 | **79.11** | **81.34** | 77.04 | 78.53 | 62.43 | 51.41 | **71.74** | **71.65** | **75.97** |
663
- | Yi-34B-Chat-8bits(GPTQ) | 66.24 | **73.69** | 79.05 | 81.23 | 76.82 | 78.97 | 61.84 | **52.08** | 70.97 | 70.74 | 75.74 |
664
- | Yi-34B-Chat-4bits(AWQ) | 65.77 | 72.42 | 78.21 | 80.50 | 75.71 | 77.27 | 61.84 | 48.30 | 69.39 | 70.51 | 74.00 |
665
-
666
- We evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA. Generally, the zero-shot approach is more common in chat models. Our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text. Some models are not well-suited to produce output in the specific format required by instructions in a few datasets, which leads to suboptimal results.
667
 
668
  <strong>*</strong>: C-Eval results are evaluated on the validation datasets
 
 
 
 
 
 
 
669
 
670
- ### 📊 Quantized chat model performance
671
 
672
- We also provide both 4-bit (AWQ) and 8-bit (GPTQ) quantized Yi chat models. Evaluation results on various benchmarks have shown that the quantized models have **negligible** losses. Additionally, they reduce the memory footprint size.
 
 
 
 
 
 
 
 
 
 
673
 
674
  # 🟢 Who can use Yi?
675
 
@@ -741,9 +1115,7 @@ as well as any associated data security concerns.
741
  ### 🪪 License
742
 
743
  The source code in this repo is licensed under the [Apache 2.0
744
- license](https://github.com/01-ai/Yi/blob/main/LICENSE). The Yi series models
745
- are fully open for academic research and free commercial usage with permission
746
- via applications. All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt).
747
  For free commercial use, you only need to send an email to [get official commercial permission](https://www.lingyiwanwu.com/yi-license).
748
 
749
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
86
  - [Web demo](#web-demo)
87
  - [Fine tune](#fine-tune)
88
  - [Quantization](#quantization)
89
+ - [Deployment](#deployment)
90
+ - [Learning hub](#learning-hub)
91
  - [🟢 Why Yi?](#-why-yi)
92
  - [🌎 Ecosystem](#-ecosystem)
93
  - [💦 Upstream](#-upstream)
 
99
  - [📌 Benchmarks](#-benchmarks)
100
  - [📊 Base model performance](#-base-model-performance)
101
  - [📊 Chat model performance](#-chat-model-performance)
 
102
  - [🟢 Who can use Yi?](#-who-can-use-yi)
103
  - [🟢 Misc.](#-misc)
104
  - [Acknowledgments](#acknowledgments)
 
120
  - For English language capability, the Yi series models ranked 2nd (just behind GPT-4), outperforming other LLMs (such as LLaMA2-chat-70B, Claude 2, and ChatGPT) on the [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) in Dec 2023.
121
 
122
  - For Chinese language capability, the Yi series models landed in 2nd place (following GPT-4), surpassing other LLMs (such as Baidu ERNIE, Qwen, and Baichuan) on the [SuperCLUE](https://www.superclueai.com/) in Oct 2023.
123
+
124
+ - 🙏 (Credits to LLaMA) Thanks to the Transformer and LLaMA open-source communities, which reduce the effort required to build from scratch and enable the use of the same tools within the AI ecosystem.
125
+ <details style="display: inline;"><summary> If you're interested in Yi's adoption of LLaMA architecture and license usage policy, see <span style="color: green;">Yi's relation with LLaMA</span> ⬇️</summary> <ul> <br>
126
+ > 💡 TL;DR
127
+ >
128
+ > The Yi series models adopt the same model architecture as LLaMA but are **NOT** derivatives of LLaMA.
129
+
130
+ - Both Yi and LLaMA are based on the Transformer architecture, which has been the standard architecture for large language models since 2018.
131
+
132
+ - Grounded in the Transformer architecture, LLaMA has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions LLaMA as the recognized foundational framework for models including Yi.
133
+
134
+ - Thanks to the Transformer and LLaMA architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the utilization of the same tools within their ecosystems.
135
+
136
+ - However, the Yi series models are NOT derivatives of LLaMA, as they do not use LLaMA's weights.
137
+
138
+ - As LLaMA's structure is employed by the majority of open-source models, the key factors that determine model performance are training datasets, training pipelines, and training infrastructure.
139
+
140
+ - Yi has independently built its own high-quality training datasets, efficient training pipelines, and robust training infrastructure from the ground up. As a result, the Yi series models ranked just behind GPT-4 and ahead of LLaMA on the [AlpacaEval Leaderboard in Dec 2023](https://tatsu-lab.github.io/alpaca_eval/).
141
+ </ul>
142
+ </details>
143
+
144
+
145
 
 
146
 
147
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
148
 
 
150
 
151
  Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements.
152
 
153
+ If you want to deploy Yi models, see [software and hardware requirements](#deployment)
154
 
155
  ### Chat models
156
 
157
  | Model | Download
158
  |---|---
 
 
 
159
  Yi-34B-Chat | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat/summary)
160
  Yi-34B-Chat-4bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-4bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-4bits/summary)
161
  Yi-34B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-8bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-8bits/summary)
162
+ Yi-6B-Chat| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-Chat) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-Chat/summary)
163
+ Yi-6B-Chat-4bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-Chat-4bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-Chat-4bits/summary)
164
+ Yi-6B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-Chat-8bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-Chat-8bits/summary)
165
+
166
 
167
  <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). </sup></sub>
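As a rough illustration of why the quantized checkpoints can fit on consumer-grade GPUs, the sketch below estimates weight-only memory from a parameter count and a bit width. The helper name `weight_memory_gb` and the nominal parameter counts (6e9 and 34e9) are assumptions made for this example; real memory use is higher because of the KV cache, activations, and quantization metadata.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    This is a back-of-the-envelope estimate: it ignores the KV cache,
    activations, and any per-group scales/zero-points stored by AWQ or GPTQ.
    """
    if num_params <= 0 or bits_per_weight <= 0:
        raise ValueError("num_params and bits_per_weight must be positive")
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter counts (assumed from the model names, not measured).
    for name, params in (("Yi-6B", 6e9), ("Yi-34B", 34e9)):
        for bits in (16, 8, 4):
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, the 4-bit weights of a 34B model come to roughly 17 GB, which is below the 24 GB of memory on a 3090 or 4090.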
168
 
 
170
 
171
  | Model | Download |
172
  |---|---|
 
 
173
  Yi-34B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B/summary)
174
  Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-200K/summary)
175
+ Yi-6B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B/summary)
176
+ Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-200K/summary)
177
 
178
  <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. </sup></sub>
179
 
 
193
 
194
  - For chat models:
195
 
196
+ <details style="display: inline;"><summary>For chat model limitations, see ⬇️</summary>
197
+ <ul>
198
+ <br>The released chat model has been trained exclusively with Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to increase the likelihood of generating higher-quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.
199
+
200
+ <br>However, this higher diversity might amplify certain existing issues, including:
201
+ <li>Hallucination: This refers to the model generating factually incorrect or nonsensical information. Because the model's responses are more varied, there is a higher chance of hallucinations that are not grounded in accurate data or logical reasoning.</li>
202
+ <li>Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.</li>
203
+ <li>Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.</li>
204
+ <li>To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help balance creativity and coherence in the model's outputs.</li>
205
+ </ul>
206
+ </details>
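To make the effect of `temperature`, `top_k`, and `top_p` concrete, here is a minimal, library-free sketch of how these parameters reshape a next-token distribution. It is a toy illustration, not the implementation used by Yi or any inference library; the function names and the toy logits are assumptions made for this example.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def _normalize(pairs: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rescale probabilities so that they sum to 1."""
    total = sum(prob for _, prob in pairs)
    return [(tok, prob / total) for tok, prob in pairs]


def filter_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> Dict[str, float]:
    """Apply temperature scaling, then top-k, then top-p (nucleus) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    ranked = _normalize(
        sorted(
            ((tok, math.exp(value - peak)) for tok, value in scaled.items()),
            key=lambda item: item[1],
            reverse=True,
        )
    )

    # top-k: keep only the k most likely tokens, then renormalize.
    if top_k is not None:
        ranked = _normalize(ranked[: max(1, top_k)])

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = _normalize(kept)

    return dict(ranked)


def sample_token(distribution: Dict[str, float], rng: random.Random) -> str:
    """Draw one token according to the filtered distribution."""
    tokens = list(distribution)
    weights = [distribution[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = {"Paris": 4.0, "Lyon": 2.0, "Rome": 1.0, "banana": -1.0}
    rng = random.Random(0)
    for temperature in (0.3, 1.0, 1.5):
        dist = filter_distribution(toy_logits, temperature=temperature, top_k=3, top_p=0.9)
        rounded = {tok: round(prob, 3) for tok, prob in dist.items()}
        print(temperature, rounded, sample_token(dist, rng))
```

Lower temperatures and smaller `top_k` or `top_p` values concentrate probability on the most likely tokens, which trades diversity for consistency.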
207
 
208
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
209
 
 
280
 
281
  Select one of the following paths to begin your journey with Yi!
282
 
283
+ ![Quick start - Choose your path](https://github.com/01-ai/Yi/blob/main/assets/img/quick_start_path.png)
284
 
285
  #### 🎯 Deploy Yi locally
286
 
 
291
  - [Docker](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#11-docker)
292
  - [conda-lock](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#12-local-development-environment)
293
 
294
+ - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use [llama.cpp](#quick-start---llamacpp)
295
 
296
  #### 🎯 Not to deploy Yi locally
297
 
 
325
  - [Yi-34B-Chat](https://platform.lingyiwanwu.com/) (Yi official beta)
326
  - Access is available through a whitelist. To apply, fill out the form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ).
327
 
328
+ ### Quick start - pip
329
 
330
  This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference.
331
 
 
333
 
334
  - Make sure Python 3.10 or later is installed.
335
 
336
+ - If you want to run other Yi models, see [software and hardware requirements](#deployment)
337
 
338
  #### Step 1: Prepare your environment
339
 
 
414
 
415
  <details>
416
 
417
+ <summary>Output ⬇️ </summary>
418
 
419
  <br>
420
 
 
424
 
425
  </details>
426
 
427
+ ### Quick start - Docker
428
+ <details>
429
+ <summary> Run Yi-34B-Chat locally with Docker: a step-by-step guide ⬇️</summary>
430
+ <br>This tutorial guides you through every step of running <strong>Yi-34B-Chat on an A800 GPU</strong> locally and then performing inference.
431
+ <h4>Step 0: Prerequisites</h4>
432
+ <p>Make sure you've installed <a href="https://docs.docker.com/engine/install/?open_in_browser=true">Docker</a> and <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html">nvidia-container-toolkit</a>.</p>
433
+
434
+ <h4> Step 1: Start Docker </h4>
435
+ <pre><code>docker run -it --gpus all \
436
+ -v &lt;your-model-path&gt;:/models \
437
+ ghcr.io/01-ai/yi:latest
438
+ </code></pre>
439
+ <p>Alternatively, you can pull the Yi Docker image from <code>registry.lingyiwanwu.com/ci/01-ai/yi:latest</code>.</p>
440
+
441
+ <h4>Step 2: Perform inference</h4>
442
+ <p>You can perform inference with Yi chat or base models as below.</p>
443
+
444
+ <h5>Perform inference with Yi chat model</h5>
445
+ <p>The steps are similar to <a href="#perform-inference-with-yi-chat-model">pip - Perform inference with Yi chat model</a>.</p>
446
+ <p><strong>Note</strong> that the only difference is to set <code>model_path = '&lt;your-model-mount-path&gt;'</code> instead of <code>model_path = '&lt;your-model-path&gt;'</code>.</p>
447
+ <h5>Perform inference with Yi base model</h5>
448
+ <p>The steps are similar to <a href="#perform-inference-with-yi-base-model">pip - Perform inference with Yi base model</a>.</p>
449
+ <p><strong>Note</strong> that the only difference is to set <code>--model &lt;your-model-mount-path&gt;</code> instead of <code>--model &lt;your-model-path&gt;</code>.</p>
450
+ </details>
451
+
452
 
 
453
 
454
+ ### Quick start - llama.cpp
455
+ <details>
456
+ <summary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide ⬇️</summary>
457
+ <p>This tutorial guides you through every step of running a quantized model (<a href="https://huggingface.co/XeIaso/yi-chat-6B-GGUF/tree/main">Yi-chat-6B-2bits</a>) locally and then performing inference.</p>
458
 
459
+ - [Step 0: Prerequisites](#step-0-prerequisites)
460
+ - [Step 1: Download llama.cpp](#step-1-download-llamacpp)
461
+ - [Step 2: Download Yi model](#step-2-download-yi-model)
462
+ - [Step 3: Perform inference](#step-3-perform-inference)
463
 
464
+ #### Step 0: Prerequisites
465
+
466
+ - This tutorial assumes you use a MacBook Pro with 16GB of memory and an Apple M2 Pro chip.
467
+
468
+ - Make sure [`git-lfs`](https://git-lfs.com/) is installed on your machine.
469
+
470
+ #### Step 1: Download `llama.cpp`
471
+
472
+ To clone the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) repository, run the following command.
473
 
474
  ```bash
475
+ git clone git@github.com:ggerganov/llama.cpp.git
 
 
476
  ```
477
 
478
+ #### Step 2: Download Yi model
479
 
480
+ 2.1 To clone [XeIaso/yi-chat-6B-GGUF](https://huggingface.co/XeIaso/yi-chat-6B-GGUF/tree/main) with just pointers, run the following command.
481
 
482
+ ```bash
483
+ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/XeIaso/yi-chat-6B-GGUF
484
+ ```
485
 
486
+ 2.2 To download a quantized Yi model ([yi-chat-6b.Q2_K.gguf](https://huggingface.co/XeIaso/yi-chat-6B-GGUF/blob/main/yi-chat-6b.Q2_K.gguf)), run the following command.
487
+
488
+ ```bash
489
+ git-lfs pull --include yi-chat-6b.Q2_K.gguf
490
+ ```
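Because the repository was first cloned with pointers only, it can be worth checking that the model file on disk is the real GGUF binary and not a leftover Git LFS pointer. The sketch below relies on two facts: GGUF files begin with the four-byte magic `GGUF`, and Git LFS pointer files begin with the text `version https://git-lfs.github.com/spec/v1`. The helper name `classify_model_file` is hypothetical and not part of `llama.cpp` or `git-lfs`.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def classify_model_file(path: str) -> str:
    """Return 'gguf', 'lfs-pointer', or 'unknown' based on the file's first bytes."""
    with Path(path).open("rb") as handle:
        # Reading the length of the longer signature is enough for both checks.
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head.startswith(GGUF_MAGIC):
        return "gguf"
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    return "unknown"
```

For example, calling `classify_model_file("yi-chat-6b.Q2_K.gguf")` from inside the cloned directory should return `"gguf"` once the pull has completed; a result of `"lfs-pointer"` suggests the `git-lfs pull` step has not yet fetched the file.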
491
 
492
+ #### Step 3: Perform inference
493
 
494
+ To perform inference with the Yi model, you can use one of the following methods.
495
 
496
+ - [Method 1: Perform inference in terminal](#method-1-perform-inference-in-terminal)
497
+
498
+ - [Method 2: Perform inference in web](#method-2-perform-inference-in-web)
499
+
500
+ ##### Method 1: Perform inference in terminal
501
 
502
+ To compile `llama.cpp` with 4 parallel jobs and then run inference, navigate to the `llama.cpp` directory and run the following command.
503
 
504
+ > ##### Tips
505
+ >
506
+ > - Replace `/Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf` with the actual path of your model.
507
+ >
508
+ > - By default, the model operates in completion mode.
509
+ >
510
+ > - For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run `./main -h` to check detailed descriptions and usage.
511
 
512
+ ```bash
513
+ make -j4 && ./main -m /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf -p "How do you feed your pet fox? Please answer this question in 6 simple steps:\nStep 1:" -n 384 -e
514
 
515
+ ...
516
+
517
+ How do you feed your pet fox? Please answer this question in 6 simple steps:
518
+
519
+ Step 1: Select the appropriate food for your pet fox. You should choose high-quality, balanced prey items that are suitable for their unique dietary needs. These could include live or frozen mice, rats, pigeons, or other small mammals, as well as fresh fruits and vegetables.
520
+
521
+ Step 2: Feed your pet fox once or twice a day, depending on the species and its individual preferences. Always ensure that they have access to fresh water throughout the day.
522
+
523
+ Step 3: Provide an appropriate environment for your pet fox. Ensure it has a comfortable place to rest, plenty of space to move around, and opportunities to play and exercise.
524
+
525
+ Step 4: Socialize your pet with other animals if possible. Interactions with other creatures can help them develop social skills and prevent boredom or stress.
526
+
527
+ Step 5: Regularly check for signs of illness or discomfort in your fox. Be prepared to provide veterinary care as needed, especially for common issues such as parasites, dental health problems, or infections.
528
+
529
+ Step 6: Educate yourself about the needs of your pet fox and be aware of any potential risks or concerns that could affect their well-being. Regularly consult with a veterinarian to ensure you are providing the best care.
530
+
531
+ ...
532
+
533
+ ```
534
+
535
+ Now you have successfully asked a question to the Yi model and got an answer! 🥳
536
+
537
+ ##### Method 2: Perform inference in web
538
+
539
+ 1. To initialize a lightweight and swift chatbot, navigate to the `llama.cpp` directory, and run the following command.
540
+
541
+ ```bash
542
+ ./server --ctx-size 2048 --host 0.0.0.0 --n-gpu-layers 64 --model /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf
543
+ ```
544
+
545
+ You should then see output similar to the following:
546
+
547
+
548
+ ```bash
549
+ ...
550
+
551
+ llama_new_context_with_model: n_ctx = 2048
552
+ llama_new_context_with_model: freq_base = 5000000.0
553
+ llama_new_context_with_model: freq_scale = 1
554
+ ggml_metal_init: allocating
555
+ ggml_metal_init: found device: Apple M2 Pro
556
+ ggml_metal_init: picking default device: Apple M2 Pro
557
+ ggml_metal_init: ggml.metallib not found, loading from source
558
+ ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
559
+ ggml_metal_init: loading '/Users/yu/llama.cpp/ggml-metal.metal'
560
+ ggml_metal_init: GPU name: Apple M2 Pro
561
+ ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
562
+ ggml_metal_init: hasUnifiedMemory = true
563
+ ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
564
+ ggml_metal_init: maxTransferRate = built-in GPU
565
+ ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 128.00 MiB, ( 2629.44 / 10922.67)
566
+ llama_new_context_with_model: KV self size = 128.00 MiB, K (f16): 64.00 MiB, V (f16): 64.00 MiB
567
+ ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 2629.45 / 10922.67)
568
+ llama_build_graph: non-view tensors processed: 676/676
569
+ llama_new_context_with_model: compute buffer total size = 159.19 MiB
570
+ ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 156.02 MiB, ( 2785.45 / 10922.67)
571
+ Available slots:
572
+ -> Slot 0 - max context: 2048
573
+
574
+ llama server listening at http://0.0.0.0:8080
575
+ ```
576
+
577
+ 2. To access the chatbot interface, open your web browser and enter `http://0.0.0.0:8080` into the address bar.
578
+
579
+ ![Yi model chatbot interface - llama.cpp](https://github.com/01-ai/Yi/blob/main/assets/img/yi_llama_cpp1.png)
580
+
581
+
582
+ 3. Enter a question, such as "How do you feed your pet fox? Please answer this question in 6 simple steps" into the prompt window, and you will receive a corresponding answer.
583
+
584
+ ![Ask a question to Yi model - llama.cpp](https://github.com/01-ai/Yi/blob/main/assets/img/yi_llama_cpp2.png)
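The same server can also be queried programmatically instead of through the browser. The sketch below is a minimal example, assuming the `llama.cpp` server started above exposes a `POST /completion` endpoint that accepts a JSON body with `prompt` and `n_predict` and returns the generated text in a `content` field. These names are assumptions based on the `llama.cpp` server documentation and may differ between versions, so check the server README for your build. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=384, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a POST request for the assumed /completion endpoint."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes the response is a JSON object with a "content" field.
    Requires the server from step 1 to be running.
    """
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]
```

With the server running, `complete("How do you feed your pet fox?", n_predict=128)` would return the model's reply as a string. The loopback address `127.0.0.1` is used here because the server binds to `0.0.0.0`, which means all local interfaces.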
585
+
586
+ </ul>
587
+ </details>
588
 
589
  ### Web demo
590
 
 
615
  ```bash
616
  bash finetune/scripts/run_eval.sh
617
  ```
618
+ <details style="display: inline;"><summary>For advanced usage (like fine-tuning based on your custom data), see ⬇️</summary> <ul>
619
+
620
+ ### Finetune code for Yi 6B and 34B
621
+
622
+ #### Preparation
623
+
624
+ ##### From Image
625
+
626
+ By default, we use a small dataset from [BAAI/COIG](https://huggingface.co/datasets/BAAI/COIG) to finetune the base model.
627
+ You can also prepare your customized dataset in the following `jsonl` format:
628
+
629
+ ```json
630
+ { "prompt": "Human: Who are you? Assistant:", "chosen": "I'm Yi." }
631
+ ```
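A custom dataset in this format can be produced and sanity-checked with a few lines of Python. The sketch below is illustrative only: the helper names `write_jsonl` and `read_jsonl` are hypothetical, and the only assumption taken from the format above is that each line is a JSON object with string fields `prompt` and `chosen`.

```python
import json
from pathlib import Path

# The two fields shown in the example record above.
REQUIRED_KEYS = ("prompt", "chosen")


def write_jsonl(records, path):
    """Write one JSON object per line, rejecting records that lack a required string field."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for i, rec in enumerate(records):
            missing = [k for k in REQUIRED_KEYS if not isinstance(rec.get(k), str)]
            if missing:
                raise ValueError(f"record {i} is missing string field(s): {missing}")
            # ensure_ascii=False keeps Chinese text readable in the file.
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read the file back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `write_jsonl([{"prompt": "Human: Who are you? Assistant:", "chosen": "I'm Yi."}], "train.jsonl")` writes a one-record training file; an `eval.jsonl` file can be built the same way.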
632
+
633
+ And then mount them in the container to replace the default ones:
634
+
635
+ ```bash
636
+ docker run -it \
637
+ -v /path/to/save/finetuned/model/:/finetuned-model \
638
+ -v /path/to/train.jsonl:/yi/finetune/data/train.json \
639
+ -v /path/to/eval.jsonl:/yi/finetune/data/eval.json \
640
+ ghcr.io/01-ai/yi:latest \
641
+ bash finetune/scripts/run_sft_Yi_6b.sh
642
+ ```
643
+
644
+ ##### From Local Server
645
+
646
+ Make sure you have conda installed. If not, install it with:
647
+
648
+ ```bash
649
+ mkdir -p ~/miniconda3
650
+ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
651
+ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
652
+ rm -rf ~/miniconda3/miniconda.sh
653
+ ~/miniconda3/bin/conda init bash
654
+ source ~/.bashrc
655
+ ```
656
+
657
+ Then, create a conda env:
658
+
659
+ ```bash
660
+ conda create -n dev_env python=3.10 -y
661
+ conda activate dev_env
662
+ pip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sentencepiece accelerate ray==2.7
663
+ ```
664
+
665
+ #### Hardware Setup
666
+
667
+ For the Yi-6B model, a node with 4 GPUs, each with more than 60 GB of GPU memory, is recommended.
668
+
669
+ For the Yi-34B model, the zero-offload technique consumes a lot of CPU memory, so take care to limit the number of GPUs used in 34B finetune training. Use `CUDA_VISIBLE_DEVICES` to limit the number of GPUs (as shown in `scripts/run_sft_Yi_34b.sh`).
670
+
671
+ A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 at run time with `CUDA_VISIBLE_DEVICES=0,1,2,3`), each with more than 80 GB of GPU memory, and more than 900 GB of total CPU memory.
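If the training script is launched from Python rather than directly from a shell, the same GPU restriction can be applied through the child process environment. This is a minimal sketch, not the repository's own launcher: `launch_env` and `run_sft` are hypothetical helpers, and the script name and the `finetune/scripts` working directory are taken from the text in this section.

```python
import os
import subprocess


def launch_env(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES restricted to gpu_ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_sft(script="run_sft_Yi_34b.sh", gpu_ids=(0, 1, 2, 3), cwd="finetune/scripts"):
    """Run the finetuning script with only the listed GPUs visible to it."""
    return subprocess.run(["bash", script], cwd=cwd, env=launch_env(gpu_ids), check=True)
```

Only the child process sees the restricted device list; the parent environment is left unchanged.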
672
+
673
+ #### Quick Start
674
+
675
+ Download an LLM base model to `MODEL_PATH` (6B or 34B). A typical model folder looks like this:
676
+
677
+ ```bash
678
+ |-- $MODEL_PATH
679
+ | |-- config.json
680
+ | |-- pytorch_model-00001-of-00002.bin
681
+ | |-- pytorch_model-00002-of-00002.bin
682
+ | |-- pytorch_model.bin.index.json
683
+ | |-- tokenizer_config.json
684
+ | |-- tokenizer.model
685
+ | |-- ...
686
+ ```
687
+
688
+ Download a dataset from Hugging Face to local storage at `DATA_PATH`, for example, Dahoas/rm-static.
689
+
690
+ ```bash
691
+ |-- $DATA_PATH
692
+ | |-- data
693
+ | | |-- train-00000-of-00001-2a1df75c6bce91ab.parquet
694
+ | | |-- test-00000-of-00001-8c7c51afc6d45980.parquet
695
+ | |-- dataset_infos.json
696
+ | |-- README.md
697
+ ```
698
+
699
+ `finetune/yi_example_dataset` contains example datasets, which are modified from [BAAI/COIG](https://huggingface.co/datasets/BAAI/COIG).
700
+
701
+ ```bash
702
+ |-- $DATA_PATH
703
+ |--data
704
+ |-- train.jsonl
705
+ |-- eval.jsonl
706
+ ```
707
+
708
+ `cd` into the scripts folder, copy and paste the script, and run. For example:
709
+
710
+ ```bash
711
+ cd finetune/scripts
712
+
713
+ bash run_sft_Yi_6b.sh
714
+ ```
715
+
716
+ For the Yi-6B base model, setting `training_debug_steps=20` and `num_train_epochs=4` can output a chat model, which takes about 20 minutes.
717
+
718
+ For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient.
719
+
720
+ #### Evaluation
721
+
722
+ ```bash
723
+ cd finetune/scripts
724
+
725
+ bash run_eval.sh
726
+ ```
727
 
728
+ Then you'll see the answers from both the base model and the finetuned model.
729
+ </ul>
730
+ </details>
731
 
732
  ### Quantization
733
 
 
747
  --trust_remote_code
748
  ```
749
 
750
+ <details style="display: inline;"><summary>For a more detailed explanation, see ⬇️</summary> <ul>
751
+
752
+ #### GPT-Q quantization
753
+
754
+ [GPT-Q](https://github.com/IST-DASLab/gptq) is a PTQ (Post-Training Quantization)
755
+ method. It's memory saving and provides potential speedups while retaining the accuracy
756
+ of the model.
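To get a feel for the memory savings, the storage needed for the weights alone can be estimated as parameters × bits per weight ÷ 8. The sketch below is a back-of-the-envelope estimate under stated assumptions: it uses the nominal parameter counts implied by the model names (6 billion and 34 billion), and it ignores activations, the KV cache, quantization metadata, and framework overhead, so real VRAM usage will be higher. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate size of the weights alone, in GiB (2**30 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30


# Nominal parameter counts taken from the model names (an approximation).
NOMINAL_PARAMS = {"Yi-6B": 6e9, "Yi-34B": 34e9}

for name, n_params in NOMINAL_PARAMS.items():
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gib(n_params, bits):.1f} GiB of weights")
```

Halving the bit width halves the weight storage, which is why 4-bit and 8-bit variants fit on much smaller GPUs than the 16-bit originals.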
757
+
758
+ Yi models can be GPT-Q quantized with little effort.
759
+ We provide a step-by-step tutorial below.
760
+
761
+ To run GPT-Q, we will use [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) and
762
+ [exllama](https://github.com/turboderp/exllama).
763
+ Hugging Face Transformers has also integrated Optimum and AutoGPTQ to perform
764
+ GPTQ quantization on language models.
765
+
766
+ ##### Do Quantization
767
+
768
+ The `quant_autogptq.py` script is provided for you to perform GPT-Q quantization:
769
+
770
+ ```bash
771
+ python quant_autogptq.py --model /base_model \
772
+ --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code
773
+ ```
774
+
775
+
776
+ ##### Run Quantized Model
777
+
778
+ You can run a quantized model using the `eval_quantized_model.py` script:
779
+
780
+ ```bash
781
+ python eval_quantized_model.py --model /quantized_model --trust_remote_code
782
+ ```
783
+ </ul>
784
+ </details>
785
 
786
  #### AWQ
787
  ```bash
 
798
  --model /quantized_model \
799
  --trust_remote_code
800
  ```
801
+ <details style="display: inline;"><summary>For detailed explanations, see ⬇️</summary> <ul>
802
 
803
+ #### AWQ quantization
804
 
805
+ [AWQ](https://github.com/mit-han-lab/llm-awq) is a PTQ (Post-Training Quantization)
806
+ method. It provides efficient and accurate low-bit weight quantization (INT3/4) for LLMs.
807
+
808
+ Yi models can be AWQ quantized with little effort.
809
+ We provide a step-by-step tutorial below.
810
+
811
+ To run AWQ, we will use [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).
812
+
813
+ ##### Do Quantization
814
+
815
+ The `quant_autoawq.py` script is provided for you to perform AWQ quantization:
816
+
817
+ ```bash
818
+ python quant_autoawq.py --model /base_model \
819
+ --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code
820
+ ```
821
+
822
+ ##### Run Quantized Model
823
+
824
+ You can run a quantized model using the `eval_quantized_model.py` script:
825
+
826
+ ```bash
827
+ python eval_quantized_model.py --model /quantized_model --trust_remote_code
828
+ ```
829
+
830
+
831
+ </ul>
832
+ </details>
833
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
834
 
835
+ ### Deployment
836
+ <details>
837
+ <summary> Software and hardware requirements for deploying Yi models ⬇️</summary>
838
+
839
+ #### Software requirements
840
+
841
+ Before using Yi quantized models, make sure you've installed the correct software listed below.
842
+
843
+ | Model | Software |
844
+ |---|---|
845
+ | Yi 4-bit quantized models | [AWQ and CUDA](https://github.com/casper-hansen/AutoAWQ?tab=readme-ov-file#install-from-pypi) |
846
+ | Yi 8-bit quantized models | [GPTQ and CUDA](https://github.com/PanQiWei/AutoGPTQ?tab=readme-ov-file#quick-installation) |
847
+
848
+
849
+ #### Hardware requirements
850
+
851
+ Before deploying Yi in your environment, make sure your hardware meets the following requirements.
852
+
853
+ ##### Chat models
854
+
855
+ | Model | Minimum VRAM | Recommended GPU Example |
856
+ |----------------------|--------------|:-------------------------------------:|
857
+ | Yi-6B-Chat | 15 GB | RTX 3090 <br> RTX 4090 <br> A10 <br> A30 |
858
+ | Yi-6B-Chat-4bits | 4 GB | RTX 3060 <br> RTX 4060 |
859
+ | Yi-6B-Chat-8bits | 8 GB | RTX 3070 <br> RTX 4060 |
860
+ | Yi-34B-Chat | 72 GB | 4 x RTX 4090 <br> A800 (80GB) |
861
+ | Yi-34B-Chat-4bits | 20 GB | RTX 3090 <br> RTX 4090 <br> A10 <br> A30 <br> A100 (40GB) |
862
+ | Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 <br> 2 x RTX 4090 <br> A800 (40GB) |
863
+
864
+ Below are detailed minimum VRAM requirements under different batch use cases.
865
+
866
+ | Model | batch=1 | batch=4 | batch=16 | batch=32 |
867
+ | ----------------------- | ------- | ------- | -------- | -------- |
868
+ | Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB |
869
+ | Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB |
870
+ | Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB |
871
+ | Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB |
872
+ | Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB |
873
+ | Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB |
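The batch-size table above can be turned into a small lookup to check which chat models fit on a given GPU. The sketch below simply transcribes the table; the helper name `models_that_fit` is hypothetical, and the Yi-34B-Chat entry for batch=32 is left out because the table gives it only as "> 80 GB".

```python
# Minimum VRAM in GB, keyed by model and batch size (transcribed from the table above).
MIN_VRAM_GB = {
    "Yi-6B-Chat": {1: 12, 4: 13, 16: 15, 32: 18},
    "Yi-6B-Chat-4bits": {1: 4, 4: 5, 16: 7, 32: 10},
    "Yi-6B-Chat-8bits": {1: 7, 4: 8, 16: 10, 32: 14},
    "Yi-34B-Chat": {1: 65, 4: 68, 16: 76},  # batch=32 is listed only as "> 80 GB"
    "Yi-34B-Chat-4bits": {1: 19, 4: 20, 16: 30, 32: 40},
    "Yi-34B-Chat-8bits": {1: 35, 4: 37, 16: 46, 32: 58},
}


def models_that_fit(vram_gb, batch=1):
    """Return the models whose listed minimum VRAM at this batch size is within vram_gb."""
    return [
        model
        for model, by_batch in MIN_VRAM_GB.items()
        if batch in by_batch and by_batch[batch] <= vram_gb
    ]


print(models_that_fit(24, batch=1))
```

Batch sizes that are not in the table (for example, 8) return an empty list rather than an interpolated guess.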
874
+
875
+ ##### Base models
876
+
877
+ | Model | Minimum VRAM | Recommended GPU Example |
878
+ |----------------------|--------------|:-------------------------------------:|
879
+ | Yi-6B | 15 GB | RTX 3090 <br> RTX 4090 <br> A10 <br> A30 |
880
+ | Yi-6B-200K | 50 GB | A800 (80 GB) |
881
+ | Yi-34B | 72 GB | 4 x RTX 4090 <br> A800 (80 GB) |
882
+ | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |
883
+
884
+ </details>
885
+
886
+ ### Learning hub
887
+ <details>
888
+ <summary> Learning materials of Yi ⬇️</summary>
889
+ <br>
890
+ Welcome to the Yi learning hub!
891
+
892
+ Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more.
893
+
894
+ The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions!
895
+
896
+ At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below.
897
+
898
+ With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning! 🥳
899
+
900
+ ##### Tutorials
901
+
902
+ | Type | Deliverable | Date | Author |
903
+ |-------------|--------------------------------------------------------|----------------|----------------|
904
+ | Blog | [本地运行零一万物 34B 大模型,使用 Llama.cpp & 21G 显存](https://zhuanlan.zhihu.com/p/668921042) | 2023-11-26 | [苏洋](https://github.com/soulteary) |
905
+ | Blog | [Running Yi-34B-Chat locally using LlamaEdge](https://www.secondstate.io/articles/yi-34b/) | 2023-11-30 | [Second State](https://github.com/second-state) |
906
+ | Blog | [零一万物模型折腾笔记:官方 Yi-34B 模型基础使用](https://zhuanlan.zhihu.com/p/671387298) | 2023-12-10 | [苏洋](https://github.com/soulteary) |
907
+ | Blog | [CPU 混合推理,非常见大模型量化方案:“二三五六” 位量化方案](https://zhuanlan.zhihu.com/p/671698216) | 2023-12-12 | [苏洋](https://github.com/soulteary) |
908
+ | Video | [只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型](https://www.bilibili.com/video/BV17t4y1f7Ee/) | 2023-12-28 | 漆妮妮 |
909
+ | Video | [Install Yi 34B Locally - Chinese English Bilingual LLM](https://www.youtube.com/watch?v=CVQvj4Wrh4w&t=476s) | 2023-11-05 | Fahd Mirza |
910
+ </details>
911
+
912
+
913
  # 🟢 Why Yi?
914
 
915
  - [🌎 Ecosystem](#-ecosystem)
 
920
  - [🛠️ Fine-tuning](#️-fine-tuning)
921
  - [API](#api)
922
  - [📌 Benchmarks](#-benchmarks)
 
923
  - [📊 Chat model performance](#-chat-model-performance)
924
+ - [📊 Base model performance](#-base-model-performance)
925
 
926
  ## 🌎 Ecosystem
927
 
 
1004
 
1005
  ## 📌 Benchmarks
1006
 
 
1007
  - [📊 Chat model performance](#-chat-model-performance)
1008
+ - [📊 Base model performance](#-base-model-performance)
1009
 
1010
+ ### 📊 Chat model performance
1011
+ 🎯 Performance evaluation
1012
+ - Yi-34B-Chat stands out, doing better than most big models in almost all tests.
1013
+ - Both Yi-34B-Chat and its variant, Yi-34B-Chat-8bits (GPTQ), take the top spots in tests including MMLU, CMMLU, BBH, and GSM8k.
1014
 
1015
+ ![Chat model performance](./assets/img/benchmark_chat.png)
1016
 
1017
+ <details>
1018
+ <summary>🎯 Evaluation methods and challenges ⬇️ </summary>
1019
 
1020
+ - **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA.
1021
+ - **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed.
1022
+ - **Evaluation strategy**: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text.
1023
+ - **Challenges faced**: some models are not well-suited to produce output in the specific format required by instructions in a few datasets, which leads to suboptimal results.
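The strategy described above (prompt with optional few-shot examples, generate, then isolate the answer) can be sketched for a multiple-choice task as follows. This is an illustration only and not the evaluation pipeline used for these results: the prompt layout, the helper names `build_prompt` and `extract_choice`, and the naive regular expression are all assumptions made for the example.

```python
import re


def build_prompt(question, choices, shots=()):
    """Build a zero-shot prompt, or a few-shot prompt if solved examples are given.

    shots is an iterable of (question, choices, answer_letter) tuples.
    """
    def fmt(q, ch, answer=""):
        lines = [f"Question: {q}"]
        lines += [f"{letter}. {text}" for letter, text in zip("ABCD", ch)]
        lines.append(f"Answer: {answer}".rstrip())
        return "\n".join(lines)

    blocks = [fmt(q, ch, a) for q, ch, a in shots]
    blocks.append(fmt(question, choices))  # the unanswered target question goes last
    return "\n\n".join(blocks)


def extract_choice(generated):
    """Return the first standalone capital letter A-D in the generated text, or None.

    Deliberately naive: a reply starting with the article "A" would be misread,
    which is one way format-following problems lead to suboptimal scores.
    """
    match = re.search(r"\b([ABCD])\b", generated)
    return match.group(1) if match else None
```

A model that replies "B. Paris" is scored on "B", while a reply that never names an option yields `None` and counts as wrong, even if the free-text answer was reasonable.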
1024
 
1025
  <strong>*</strong>: C-Eval results are evaluated on the validation datasets
1026
+ </details>
1027
+
1028
+ ### 📊 Base model performance
1029
+ 🎯 Performance evaluation
1030
+ - Yi-34B stands out as the top performer among the big models, beating others like LLaMA2-70B and Falcon-180B in most tests.
1031
+ - Yi-34B ranks first in MMLU, CMMLU, BBH, and common-sense reasoning.
1032
+ - Yi-34B-200K ranks first in C-Eval, GAOKAO, and reading comprehension.
1033
 
1034
+ ![Base model performance](./assets/img/benchmark_base.png)
1035
 
1036
+ <details>
1037
+ <summary>🎯 Evaluation methods ⬇️</summary>
1038
+
1039
+ - **Disparity in Results**: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass.
1040
+ - **Investigation Findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences.
1041
+ - **Uniform Benchmarking Process**: our methodology aligns with the original benchmarks—consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content.
1042
+ - **Efforts to Retrieve Unreported Scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline.
1043
+ - **Extensive Model Evaluation**: to evaluate the model’s capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SQuAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension.
1044
+ - **Special Configurations**: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code".
1045
+ - **Falcon-180B Caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.
1046
+ </details>
1047
 
1048
  # 🟢 Who can use Yi?
1049
 
 
1115
  ### 🪪 License
1116
 
1117
  The source code in this repo is licensed under the [Apache 2.0
1118
+ license](https://github.com/01-ai/Yi/blob/main/LICENSE). The Yi series models are fully open for academic research and free for commercial use, with automatic permission granted upon application. All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt).
 
 
1119
  For free commercial use, you only need to send an email to [get official commercial permission](https://www.lingyiwanwu.com/yi-license).
1120
 
1121
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>