File size: 3,681 Bytes
3e37530 51c96d0 3e37530 51c96d0 3e37530 b68e373 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 272faba 915111f 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 3e37530 51c96d0 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 |
---
license: apache-2.0
library_name: transformers
base_model: BSC-LT/salamandra-7b-instruct
pipeline_tag: text-generation
language:
- bg
- ca
- code
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fi
- fr
- ga
- gl
- hr
- hu
- it
- lt
- lv
- mt
- nl
- nn
- \no
- oc
- pl
- pt
- ro
- ru
- sh
- sk
- sl
- sr
- sv
- uk
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/633b489acbdbadd99c0b75ef/0AxppoCn6DIgZj6jp7feW.png)
# Salamandra-7b-instruct-gptq Model Card
This model is the gptq-quantized version of [Salamandra-7b-instruct](https://huggingface.co/BSC-LT/salamandra-7b-instruct) for speculative decoding.
The model weights are quantized from FP16 to W4A16 (4-bit weights and FP16 activations) using the [GPTQ](https://arxiv.org/abs/2210.17323) algorithm.
Inferencing with this model can be done using [VLLM](https://docs.vllm.ai/en/stable/models/engine_args.html).
Salamandra is a highly multilingual model pre-trained from scratch that comes in three different
sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants,
promoted and financed by the Government of Catalonia through the [Aina Project](https://projecteaina.cat/)
and the _Ministerio para la Transformación Digital y de la Función Pública_ - Funded by EU – NextGenerationEU
within the framework of [ILENIA Project](https://proyectoilenia.es/) with reference 2022/TL22/00215337.
This model card corresponds to the gptq-quantized version of Salamandra-7b-instruct for speculative decoding.
The entire Salamandra family is released under a permissive [Apache 2.0 license]((https://www.apache.org/licenses/LICENSE-2.0)).
## How to Use
The following example code works under ``Python 3.9.16``, ``vllm==0.6.3.post1``, ``torch==2.4.0`` and ``torchvision==0.19.0``, though it should run on
any current version of the libraries. This is an example of a conversational chatbot using the model:
```
from vllm import LLM, SamplingParams
model_name = "BSC-LT/salamandra-7b-instruct-gptq"
llm = LLM(model=model_name)
messages = []
while True:
user_input = input("user >> ")
if user_input.lower() == "exit":
print("Chat ended.")
break
messages.append({'role': 'user', 'content': user_input})
outputs = llm.chat(messages,
sampling_params=SamplingParams(
temperature=0.5,
stop_token_ids=[5],
max_tokens=200)
)[0].outputs
model_output = outputs[0].text
print(f'assistant >> {model_output}')
messages.append({'role': 'assistant', 'content': model_output})
```
### Author
International Business Machines (IBM).
### Copyright
International Business Machines (IBM).
### Contact
For further information, please send an email to <[email protected]>.
### Acknowledgements
We appreciate the collaboration with IBM in this work.
Specifically, the IBM team created gptq-quantized version of the Salamandra-7b-instruct model for speculative decoding released here.
### Disclaimer
Be aware that the model may contain biases or other unintended distortions.
When third parties deploy systems or provide services based on this model, or use the model themselves,
they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable
regulations, including those governing the use of Artificial Intelligence.
Barcelona Supercomputing Center and International Business Machines shall
not be held liable for any outcomes resulting from third-party use.
### License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0) |