---
license: llama2
---
# CTranslate2 int8 version of WizardLM-13B-V1.2

This is an int8_float16 quantization of [WizardLM-13B-V1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2)\
See more on CTranslate2: [Docs](https://opennmt.net/CTranslate2/index.html) | [Github](https://github.com/OpenNMT/CTranslate2)

This model was converted to ct2 format using the following command:
```
ct2-transformers-converter --model WizardLM/WizardLM-13B-V1.2 --copy_files tokenizer.model --output_dir wizard13b --quantization int8_float16 --low_cpu_mem_usage
```
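Once converted (or downloaded from this repository), the model can be loaded with the CTranslate2 Python API. The sketch below is illustrative, not part of the original conversion: it assumes `pip install ctranslate2 sentencepiece`, that the model files (including the copied `tokenizer.model`) live in a local `wizard13b` directory matching the `--output_dir` above, and that prepending the `<s>` BOS token plus the sampling settings are reasonable defaults.

```python
# Minimal inference sketch for the int8_float16 model using the CTranslate2
# Python API. Paths and sampling parameters are assumptions, not taken from
# this repository.
import os

MODEL_DIR = "wizard13b"  # --output_dir from the conversion command above


def generate(prompt: str, max_length: int = 256) -> str:
    # Imported lazily so this helper can be defined without the packages installed.
    import ctranslate2               # pip install ctranslate2
    import sentencepiece as spm      # pip install sentencepiece

    generator = ctranslate2.Generator(MODEL_DIR, device="auto")
    sp = spm.SentencePieceProcessor(
        model_file=os.path.join(MODEL_DIR, "tokenizer.model"))

    # CTranslate2 generators consume token strings, not token ids.
    tokens = ["<s>"] + sp.encode(prompt, out_type=str)
    result = generator.generate_batch(
        [tokens], max_length=max_length, sampling_temperature=0.7)[0]
    return sp.decode(result.sequences[0])
```

Calling `generate("...")` returns the detokenized completion as a string.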

To convert this model, edits had to be made to the file **added_tokens.json**

From:
```
{
  "<pad>": 32000
}
```
To:
```
{
}
```
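The same edit can be applied programmatically before running the converter. A small stdlib-only sketch (the path handling is illustrative, and the stated reason is my assumption — the converter presumably rejects the extra `<pad>` entry because it is not part of the model's 32000-token vocabulary):

```python
# Sketch: blank out added_tokens.json in a downloaded HF checkpoint so the
# ct2 converter accepts it. The directory argument is hypothetical.
import json
from pathlib import Path


def strip_added_tokens(checkpoint_dir: str) -> dict:
    """Replace the contents of added_tokens.json with an empty object.

    Returns the tokens that were removed, e.g. {"<pad>": 32000}.
    """
    path = Path(checkpoint_dir) / "added_tokens.json"
    removed = json.loads(path.read_text())
    path.write_text("{\n}\n")  # empty object, as in the "To:" block above
    return removed
```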

***No conversion is needed when using the model from this repository, as it is already in ct2 format.***

## From the CTranslate2 GitHub (these benchmarks do not relate to this model):

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

We translate the En->De test set *newstest2014* with multiple models:

* [OpenNMT-tf WMT14](https://opennmt.net/Models-tf/#translation): a base Transformer trained with OpenNMT-tf on the WMT14 dataset (4.5M lines)
* [OpenNMT-py WMT14](https://opennmt.net/Models-py/#translation): a base Transformer trained with OpenNMT-py on the WMT14 dataset (4.5M lines)
* [OPUS-MT](https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/en-de#opus-2020-02-26zip): a base Transformer trained with Marian on all OPUS data available on 2020-02-26 (81.9M lines)

The benchmark reports the number of target tokens generated per second (higher is better). The results are aggregated over multiple runs. See the [benchmark scripts](https://github.com/OpenNMT/CTranslate2/tree/master/tools/benchmark) for more details and to reproduce these numbers.

**Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings.**

#### CPU

| | Tokens per second | Max. memory | BLEU |
| --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 209.2 | 2653MB | 26.93 |
| **OpenNMT-py WMT14 model** | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 275.8 | 2012MB | 26.77 |
| - int8 | 323.3 | 1359MB | 26.72 |
| CTranslate2 3.6.0 | 658.8 | 849MB | 26.77 |
| - int16 | 733.0 | 672MB | 26.82 |
| - int8 | 860.2 | 529MB | 26.78 |
| - int8 + vmap | 1126.2 | 598MB | 26.64 |
| **OPUS-MT model** | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 147.3 | 2332MB | 27.90 |
| Marian 1.11.0 | 344.5 | 7605MB | 27.93 |
| - int16 | 330.2 | 5901MB | 27.65 |
| - int8 | 355.8 | 4763MB | 27.27 |
| CTranslate2 3.6.0 | 525.0 | 721MB | 27.92 |
| - int16 | 596.1 | 660MB | 27.53 |
| - int8 | 696.1 | 516MB | 27.65 |

Executed with 4 threads on a [*c5.2xlarge*](https://aws.amazon.com/ec2/instance-types/c5/) Amazon EC2 instance equipped with an Intel(R) Xeon(R) Platinum 8275CL CPU.

#### GPU

| | Tokens per second | Max. GPU memory | Max. CPU memory | BLEU |
| --- | --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 1483.5 | 3031MB | 3122MB | 26.94 |
| **OpenNMT-py WMT14 model** | | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 1795.2 | 2973MB | 3099MB | 26.77 |
| FasterTransformer 5.3 | 6979.0 | 2402MB | 1131MB | 26.77 |
| - float16 | 8592.5 | 1360MB | 1135MB | 26.80 |
| CTranslate2 3.6.0 | 6634.7 | 1261MB | 953MB | 26.77 |
| - int8 | 8567.2 | 1005MB | 807MB | 26.85 |
| - float16 | 10990.7 | 941MB | 807MB | 26.77 |
| - int8 + float16 | 8725.4 | 813MB | 800MB | 26.83 |
| **OPUS-MT model** | | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 1022.9 | 4097MB | 2109MB | 27.90 |
| Marian 1.11.0 | 3241.0 | 3381MB | 2156MB | 27.92 |
| - float16 | 3962.4 | 3239MB | 1976MB | 27.94 |
| CTranslate2 3.6.0 | 5876.4 | 1197MB | 754MB | 27.92 |
| - int8 | 7521.9 | 1005MB | 792MB | 27.79 |
| - float16 | 9296.7 | 909MB | 814MB | 27.90 |
| - int8 + float16 | 8362.7 | 813MB | 766MB | 27.90 |

Executed with CUDA 11 on a [*g5.xlarge*](https://aws.amazon.com/ec2/instance-types/g5/) Amazon EC2 instance equipped with an NVIDIA A10G GPU (driver version: 510.47.03).