Commit e79bc3a by yi-01-ai (parent: 94438cc)

Auto Sync from git://github.com/01-ai/Yi.git/commit/6161f43679e3887a500a4f00aec3cb3090e3a23a

Files changed (1): README.md (+11 -11)
README.md CHANGED
```diff
@@ -154,7 +154,7 @@ pipeline_tag: text-generation
 <details open>
 <summary>🔔 <b>2024-03-07</b>: The long text capability of the Yi-34B-200K has been enhanced. </summary>
 <br>
-In the "Needle-in-a-Haystack" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue pretrain the model on 5B tokens long-context data mixture and demonstrates a near-all-green performance.
+In the "Needle-in-a-Haystack" test, the Yi-34B-200K's performance improved by 10.5 percentage points, rising from 89.3% to an impressive 99.8%. We continued to pretrain the model on a 5B-token long-context data mixture, and it now demonstrates near-all-green performance.
 </details>
 
 <details open>
```
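For readers unfamiliar with the benchmark named in this hunk: a needle-in-a-haystack test hides a short "needle" sentence at some depth inside long filler text and asks the model to retrieve it; "near-all-green" refers to the pass/fail grid over context lengths and insertion depths. Below is a minimal sketch of building one such prompt. It is illustrative only, not 01-ai's actual harness, and every name and string in it is hypothetical.

```python
# Minimal needle-in-a-haystack prompt builder (illustrative sketch only;
# not the harness 01-ai used). The "needle" is a fact the model must
# retrieve from an arbitrary depth inside long filler text.

FILLER = "The grass is green. The sky is blue. The sun is bright. " * 4000
NEEDLE = "The secret passphrase is 'amber-falcon-42'."
QUESTION = "What is the secret passphrase mentioned in the text above?"

def build_prompt(depth: float) -> str:
    """Insert NEEDLE at `depth` (0.0 = start, 1.0 = end) of the filler."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]
    return f"{haystack}\n\n{QUESTION}"

# A full run would sweep depths and context lengths, score each answer,
# and plot the grid -- "near-all-green" means almost every cell passes.
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = build_prompt(depth)
    # response = model.generate(prompt)  # hypothetical model call
```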
```diff
@@ -944,13 +944,13 @@ Before deploying Yi in your environment, make sure your hardware meets the follo
 ##### Chat models
 
 | Model | Minimum VRAM | Recommended GPU Example |
-|----------------------|--------------|:-------------------------------------:|
-| Yi-6B-Chat | 15 GB | 1 x RTX 3090 <br> 1 x RTX 4090 <br> A10 <br> A30 |
-| Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 <br> 1 x RTX 4060 |
-| Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 <br> 1 x RTX 4060 |
-| Yi-34B-Chat | 72 GB | 4 x RTX 4090 <br> A800 (80GB) |
-| Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 <br> 1 x RTX 4090 <br> A10 <br> A30 <br> A100 (40GB) |
-| Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 <br> 2 x RTX 4090 <br> A800 (40GB) |
+|:----------------------|:--------------|:-------------------------------------:|
+| Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
+| Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB) <br> 1 x RTX 4060 (8 GB) |
+| Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) |
+| Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
+| Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) |
+| Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB) <br> 1 x A800 (40 GB) |
 
 Below are detailed minimum VRAM requirements under different batch use cases.
 
```
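The minimum-VRAM column in this table tracks a standard back-of-envelope estimate: parameter count times bytes per weight (2 for fp16/bf16, 1 for 8-bit, 0.5 for 4-bit), plus headroom for the KV cache and activations. A quick sketch of that arithmetic follows; the few GB of headroom implied by each row is read off the table above, not a published formula.

```python
# Back-of-envelope weight-memory estimate for the table above.
# fp16/bf16 = 2 bytes per parameter, 8-bit = 1, 4-bit = 0.5; the real
# minimum also covers KV cache and activations, which is why the table's
# figures sit a few GB above the raw weight sizes computed here.

def weight_vram_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """GB needed just to hold the weights."""
    return n_params_billion * 1e9 * (bits_per_weight / 8) / 1e9

for name, params_b, bits in [
    ("Yi-6B-Chat", 6, 16),         # ~12 GB weights -> 15 GB table minimum
    ("Yi-34B-Chat", 34, 16),       # ~68 GB weights -> 72 GB table minimum
    ("Yi-34B-Chat-4bits", 34, 4),  # ~17 GB weights -> 20 GB table minimum
    ("Yi-34B-Chat-8bits", 34, 8),  # ~34 GB weights -> 38 GB table minimum
]:
    print(f"{name:>18}: ~{weight_vram_gb(params_b, bits):.0f} GB weights")
```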
 
```diff
@@ -967,10 +967,10 @@ Below are detailed minimum VRAM requirements under different batch use cases.
 
 | Model | Minimum VRAM | Recommended GPU Example |
 |----------------------|--------------|:-------------------------------------:|
-| Yi-6B | 15 GB | 1 x RTX 3090 <br> 1 x RTX 4090 <br> A10 <br> A30 |
-| Yi-6B-200K | 50 GB | A800 (80 GB) |
+| Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
+| Yi-6B-200K | 50 GB | 1 x A800 (80 GB) |
 | Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) |
-| Yi-34B | 72 GB | 4 x RTX 4090 <br> A800 (80 GB) |
+| Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
 | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |
 
 <p align="right"> [
```
 
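For completeness, here is a minimal sketch of loading one of the models in these tables with the transformers library, letting device_map="auto" (via accelerate) shard the weights across whatever GPUs are available. The repo id and generation settings are illustrative choices, not taken from this commit.

```python
# Sketch: load a Yi chat model and shard it across available GPUs.
# Assumes `transformers`, `accelerate`, and `torch` are installed;
# device_map="auto" splits the weights to fit the VRAM listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B-Chat"  # illustrative pick from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 2 bytes/weight, matching the fp16/bf16 rows
    device_map="auto",
)

messages = [{"role": "user", "content": "hi"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```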