---
license: llama2
---

# CTranslate2 int8 version of WizardLM-13B-V1.2

This is an int8_float16 quantization of [WizardLM-13B-V1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2)\
See more on CTranslate2: [Docs](https://opennmt.net/CTranslate2/index.html) | [Github](https://github.com/OpenNMT/CTranslate2)

This model was converted to ct2 format using the following command:
```
ct2-transformers-converter --model WizardLM/WizardLM-13B-V1.2 --copy_files tokenizer.model --output_dir wizard13b --quantization int8_float16 --low_cpu_mem_usage
```

To convert this model, an edit had to be made to the file **added_tokens.json**.

From:
```
{
  "<pad>": 32000
}
```
To:
```
{
}
```
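
If you are converting the original weights yourself, this edit can be scripted. A minimal sketch, assuming a local checkout of the original model at `WizardLM-13B-V1.2/` (that path is an assumption, not a file shipped in this repository):

```python
# Hypothetical helper: empty out added_tokens.json before running the converter.
# The checkout path below is an assumption.
import json
from pathlib import Path

path = Path("WizardLM-13B-V1.2/added_tokens.json")
path.write_text(json.dumps({}, indent=2))  # replaces {"<pad>": 32000} with {}
```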

***No conversion is needed when using the model from this repository, as it is already in ct2 format.***
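
Since this repository already ships the converted weights, inference only needs the CTranslate2 runtime and the bundled `tokenizer.model`. Below is a minimal sketch of how the model might be loaded and queried; the local directory name, prompt format, and sampling settings are assumptions, not part of this repository:

```python
# Minimal generation sketch with CTranslate2 (paths and prompt are assumptions).
import ctranslate2
import sentencepiece as spm

# Load the converted model; int8_float16 matches the quantization used above.
generator = ctranslate2.Generator("wizard13b", device="cuda", compute_type="int8_float16")

sp = spm.SentencePieceProcessor()
sp.load("wizard13b/tokenizer.model")

prompt = "USER: What is CTranslate2? ASSISTANT:"
tokens = ["<s>"] + sp.encode(prompt, out_type=str)

results = generator.generate_batch(
    [tokens],
    max_length=256,
    sampling_topk=10,
    include_prompt_in_result=False,
)
print(sp.decode(results[0].sequences_ids[0]))
```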

## From the CTranslate2 GitHub (no relation to this model):

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

We translate the En->De test set *newstest2014* with multiple models (a minimal sketch of such a translation call follows the list):
* [OpenNMT-tf WMT14](https://opennmt.net/Models-tf/#translation): a base Transformer trained with OpenNMT-tf on the WMT14 dataset (4.5M lines)
* [OpenNMT-py WMT14](https://opennmt.net/Models-py/#translation): a base Transformer trained with OpenNMT-py on the WMT14 dataset (4.5M lines)
* [OPUS-MT](https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/en-de#opus-2020-02-26zip): a base Transformer trained with Marian on all OPUS data available on 2020-02-26 (81.9M lines)
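
The translation call being timed looks roughly like the following sketch; it is not taken from the CTranslate2 README, and the model directory `ende_ct2` and SentencePiece file are assumptions:

```python
# Rough sketch of the kind of En->De translation call benchmarked below.
import ctranslate2
import sentencepiece as spm

# 4 intra-op threads, matching the CPU benchmark configuration described below.
translator = ctranslate2.Translator("ende_ct2", device="cpu", intra_threads=4)

sp = spm.SentencePieceProcessor()
sp.load("ende_ct2/sentencepiece.model")

source = sp.encode("Hello world!", out_type=str)
results = translator.translate_batch([source], beam_size=4)
print(sp.decode_pieces(results[0].hypotheses[0]))
```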

The benchmark reports the number of target tokens generated per second (higher is better). The results are aggregated over multiple runs. See the [benchmark scripts](tools/benchmark) for more details and to reproduce these numbers.

**Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings.**

#### CPU

| | Tokens per second | Max. memory | BLEU |
| --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 209.2 | 2653MB | 26.93 |
| **OpenNMT-py WMT14 model** | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 275.8 | 2012MB | 26.77 |
| - int8 | 323.3 | 1359MB | 26.72 |
| CTranslate2 3.6.0 | 658.8 | 849MB | 26.77 |
| - int16 | 733.0 | 672MB | 26.82 |
| - int8 | 860.2 | 529MB | 26.78 |
| - int8 + vmap | 1126.2 | 598MB | 26.64 |
| **OPUS-MT model** | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 147.3 | 2332MB | 27.90 |
| Marian 1.11.0 | 344.5 | 7605MB | 27.93 |
| - int16 | 330.2 | 5901MB | 27.65 |
| - int8 | 355.8 | 4763MB | 27.27 |
| CTranslate2 3.6.0 | 525.0 | 721MB | 27.92 |
| - int16 | 596.1 | 660MB | 27.53 |
| - int8 | 696.1 | 516MB | 27.65 |

Executed with 4 threads on a [*c5.2xlarge*](https://aws.amazon.com/ec2/instance-types/c5/) Amazon EC2 instance equipped with an Intel(R) Xeon(R) Platinum 8275CL CPU.

#### GPU

| | Tokens per second | Max. GPU memory | Max. CPU memory | BLEU |
| --- | --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 1483.5 | 3031MB | 3122MB | 26.94 |
| **OpenNMT-py WMT14 model** | | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 1795.2 | 2973MB | 3099MB | 26.77 |
| FasterTransformer 5.3 | 6979.0 | 2402MB | 1131MB | 26.77 |
| - float16 | 8592.5 | 1360MB | 1135MB | 26.80 |
| CTranslate2 3.6.0 | 6634.7 | 1261MB | 953MB | 26.77 |
| - int8 | 8567.2 | 1005MB | 807MB | 26.85 |
| - float16 | 10990.7 | 941MB | 807MB | 26.77 |
| - int8 + float16 | 8725.4 | 813MB | 800MB | 26.83 |
| **OPUS-MT model** | | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 1022.9 | 4097MB | 2109MB | 27.90 |
| Marian 1.11.0 | 3241.0 | 3381MB | 2156MB | 27.92 |
| - float16 | 3962.4 | 3239MB | 1976MB | 27.94 |
| CTranslate2 3.6.0 | 5876.4 | 1197MB | 754MB | 27.92 |
| - int8 | 7521.9 | 1005MB | 792MB | 27.79 |
| - float16 | 9296.7 | 909MB | 814MB | 27.90 |
| - int8 + float16 | 8362.7 | 813MB | 766MB | 27.90 |

Executed with CUDA 11 on a [*g5.xlarge*](https://aws.amazon.com/ec2/instance-types/g5/) Amazon EC2 instance equipped with a NVIDIA A10G GPU (driver version: 510.47.03).