---
license: llama2
---

# CTranslate2 int8 version of WizardLM-13B-V1.2

This is an int8_float16 quantization of [WizardLM-13B-V1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2)\
See more on CTranslate2: [Docs](https://opennmt.net/CTranslate2/index.html) | [Github](https://github.com/OpenNMT/CTranslate2)

This model was converted to ct2 format using the following command:
```
ct2-transformers-converter --model WizardLM/WizardLM-13B-V1.2 --copy_files tokenizer.model --output_dir wizard13b --quantization int8_float16 --low_cpu_mem_usage
```

To convert this model, an edit had to be made to the file **added_tokens.json**.

From:
```
{
  "<pad>": 32000
}
```
To:
```
{
}
```
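
If you are converting the original weights yourself, this edit can be scripted. A minimal sketch, assuming a local checkout of the original model at `WizardLM-13B-V1.2/` (that path is an assumption, not a file shipped in this repository):

```python
# Hypothetical helper: empty out added_tokens.json before running the converter.
# The checkout path below is an assumption.
import json
from pathlib import Path

path = Path("WizardLM-13B-V1.2/added_tokens.json")
path.write_text(json.dumps({}, indent=2))  # replaces {"<pad>": 32000} with {}
```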

***No conversion is needed when using the model from this repository, as it is already in ct2 format.***
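
Since this repository already ships the converted weights, inference only needs the CTranslate2 runtime and the bundled `tokenizer.model`. Below is a minimal sketch of how the model might be loaded and queried; the local directory name, prompt format, and sampling settings are assumptions, not part of this repository:

```python
# Minimal generation sketch with CTranslate2 (paths and prompt are assumptions).
import ctranslate2
import sentencepiece as spm

# Load the converted model; int8_float16 matches the quantization used above.
generator = ctranslate2.Generator("wizard13b", device="cuda", compute_type="int8_float16")

sp = spm.SentencePieceProcessor()
sp.load("wizard13b/tokenizer.model")

prompt = "USER: What is CTranslate2? ASSISTANT:"
tokens = ["<s>"] + sp.encode(prompt, out_type=str)

results = generator.generate_batch(
    [tokens],
    max_length=256,
    sampling_topk=10,
    include_prompt_in_result=False,
)
print(sp.decode(results[0].sequences_ids[0]))
```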

## From the CTranslate2 GitHub (no relation to this model):

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

We translate the En->De test set *newstest2014* with multiple models (a minimal sketch of such a translation call follows the list):
* [OpenNMT-tf WMT14](https://opennmt.net/Models-tf/#translation): a base Transformer trained with OpenNMT-tf on the WMT14 dataset (4.5M lines)
* [OpenNMT-py WMT14](https://opennmt.net/Models-py/#translation): a base Transformer trained with OpenNMT-py on the WMT14 dataset (4.5M lines)
* [OPUS-MT](https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/en-de#opus-2020-02-26zip): a base Transformer trained with Marian on all OPUS data available on 2020-02-26 (81.9M lines)
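
The translation call being timed looks roughly like the following sketch; it is not taken from the CTranslate2 README, and the model directory `ende_ct2` and SentencePiece file are assumptions:

```python
# Rough sketch of the kind of En->De translation call benchmarked below.
import ctranslate2
import sentencepiece as spm

# 4 intra-op threads, matching the CPU benchmark configuration described below.
translator = ctranslate2.Translator("ende_ct2", device="cpu", intra_threads=4)

sp = spm.SentencePieceProcessor()
sp.load("ende_ct2/sentencepiece.model")

source = sp.encode("Hello world!", out_type=str)
results = translator.translate_batch([source], beam_size=4)
print(sp.decode_pieces(results[0].hypotheses[0]))
```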

The benchmark reports the number of target tokens generated per second (higher is better). The results are aggregated over multiple runs. See the [benchmark scripts](tools/benchmark) for more details and to reproduce these numbers.

**Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings.**

#### CPU

| | Tokens per second | Max. memory | BLEU |
| --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 209.2 | 2653MB | 26.93 |
| **OpenNMT-py WMT14 model** | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 275.8 | 2012MB | 26.77 |
| - int8 | 323.3 | 1359MB | 26.72 |
| CTranslate2 3.6.0 | 658.8 | 849MB | 26.77 |
| - int16 | 733.0 | 672MB | 26.82 |
| - int8 | 860.2 | 529MB | 26.78 |
| - int8 + vmap | 1126.2 | 598MB | 26.64 |
| **OPUS-MT model** | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 147.3 | 2332MB | 27.90 |
| Marian 1.11.0 | 344.5 | 7605MB | 27.93 |
| - int16 | 330.2 | 5901MB | 27.65 |
| - int8 | 355.8 | 4763MB | 27.27 |
| CTranslate2 3.6.0 | 525.0 | 721MB | 27.92 |
| - int16 | 596.1 | 660MB | 27.53 |
| - int8 | 696.1 | 516MB | 27.65 |

Executed with 4 threads on a [*c5.2xlarge*](https://aws.amazon.com/ec2/instance-types/c5/) Amazon EC2 instance equipped with an Intel(R) Xeon(R) Platinum 8275CL CPU.

#### GPU

| | Tokens per second | Max. GPU memory | Max. CPU memory | BLEU |
| --- | --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 1483.5 | 3031MB | 3122MB | 26.94 |
| **OpenNMT-py WMT14 model** | | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 1795.2 | 2973MB | 3099MB | 26.77 |
| FasterTransformer 5.3 | 6979.0 | 2402MB | 1131MB | 26.77 |
| - float16 | 8592.5 | 1360MB | 1135MB | 26.80 |
| CTranslate2 3.6.0 | 6634.7 | 1261MB | 953MB | 26.77 |
| - int8 | 8567.2 | 1005MB | 807MB | 26.85 |
| - float16 | 10990.7 | 941MB | 807MB | 26.77 |
| - int8 + float16 | 8725.4 | 813MB | 800MB | 26.83 |
| **OPUS-MT model** | | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 1022.9 | 4097MB | 2109MB | 27.90 |
| Marian 1.11.0 | 3241.0 | 3381MB | 2156MB | 27.92 |
| - float16 | 3962.4 | 3239MB | 1976MB | 27.94 |
| CTranslate2 3.6.0 | 5876.4 | 1197MB | 754MB | 27.92 |
| - int8 | 7521.9 | 1005MB | 792MB | 27.79 |
| - float16 | 9296.7 | 909MB | 814MB | 27.90 |
| - int8 + float16 | 8362.7 | 813MB | 766MB | 27.90 |

Executed with CUDA 11 on a [*g5.xlarge*](https://aws.amazon.com/ec2/instance-types/g5/) Amazon EC2 instance equipped with a NVIDIA A10G GPU (driver version: 510.47.03).