|
--- |
|
language: |
|
- pl |
|
license: apache-2.0 |
|
library_name: transformers |
|
tags: |
|
- finetuned |
|
- openvino
|
inference: false |
|
pipeline_tag: text-generation |
|
base_model: speakleash/Bielik-11B-v2.3-Instruct |
|
--- |
|
|
|
<p align="center"> |
|
<img src="https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1/raw/main/speakleash_cyfronet.png"> |
|
</p> |
|
|
|
# Bielik-11B-v2.3-Instruct-4bit-ov
|
|
|
This repo contains OpenVINO 4-bit format model files for [SpeakLeash](https://speakleash.org/)'s [Bielik-11B-v2.3-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct).
|
|
|
<b><u>DISCLAIMER: Be aware that quantized models may show reduced response quality and may hallucinate!</u></b><br>
|
### Model usage with OpenVINO
|
This model can be deployed efficiently with [OpenVINO](https://docs.openvino.ai/2024/index.html). Below are two ways to run inference: with the optimum-intel library, or with the OpenVINO runtime directly.
|
|
|
The simplest LLM inference code, using OpenVINO through the optimum-intel library (installed with the `optimum[openvino]` extra):
|
```python |
|
from optimum.intel import OVModelForCausalLM |
|
from transformers import AutoTokenizer |
|
|
|
model_id = "speakleash/Bielik-11B-v2.3-Instruct-4bit-ov" |
|
model = OVModelForCausalLM.from_pretrained(model_id, use_cache=False) |
|
|
|
question = "Dlaczego ryby nie potrafi膮 fruwa膰?" |
|
|
|
prompt_text_bielik = f"""<s><|im_start|> system |
|
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|>
|
<|im_start|> user |
|
{question}<|im_end|> |
|
<|im_start|> assistant |
|
""" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
inputs = tokenizer(prompt_text_bielik, return_tensors="pt") |
|
outputs = model.generate(**inputs, max_new_tokens=500) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
``` |
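Instead of writing the ChatML prompt by hand, the tokenizer's chat template can build it from a message list. A minimal sketch, assuming the tokenizer in this repository ships a chat template (as transformers Instruct models typically do):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "speakleash/Bielik-11B-v2.3-Instruct-4bit-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, use_cache=False)

messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Dlaczego ryby nie potrafią fruwać?"},
]

# Render the messages with the model's own template and append the assistant header
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)

outputs = model.generate(input_ids, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This avoids formatting slips such as a missing `<|im_end|>` token, since the template is versioned together with the model.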
|
|
|
Run the model with the OpenVINO runtime alone (the code below uses greedy decoding instead of sampling; a sampling alternative is sketched after the block).
|
```python |
|
import openvino as ov |
|
import numpy as np |
|
from transformers import AutoTokenizer |
|
|
|
model_path = "speakleash/Bielik-11B-v2.3-Instruct-4bit-ov/openvino_model.xml"  # local path to the downloaded OpenVINO IR files
|
tokenizer = AutoTokenizer.from_pretrained("speakleash/Bielik-11B-v2.3-Instruct-4bit-ov") |
|
|
|
ov_model = ov.Core().read_model(model_path) |
|
compiled_model = ov.compile_model(ov_model, "CPU")  # other OpenVINO device names (e.g. "GPU") also work
|
infer_request = compiled_model.create_infer_request() |
|
|
|
question = "Dlaczego ryby nie potrafi膮 fruwa膰?" |
|
prompt_text_bielik = f"""<s><|im_start|> system |
|
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|>
|
<|im_start|> user |
|
{question}<|im_end|> |
|
<|im_start|> assistant |
|
""" |
|
|
|
tokens = tokenizer.encode(prompt_text_bielik, return_tensors="np") |
|
input_ids = tokens |
|
attention_mask = np.ones_like(input_ids) |
|
position_ids = np.arange(len(tokens[0])).reshape(1, -1) |
|
beam_idx = np.array([0], dtype=np.int32)  # beam index input expected by the stateful model
|
|
|
infer_request.reset_state()  # clear the internal KV-cache state before a new generation
|
|
|
prev_output = '' |
|
generated_text_ids = np.array([], dtype=np.int32) |
|
num_max_token_for_generation = 500 |
|
|
|
print(f'Pytanie: {question}') |
|
print("Odpowied藕:", end=' ', flush=True) |
|
|
|
for _ in range(num_max_token_for_generation): |
|
response = infer_request.infer(inputs={ |
|
'input_ids': input_ids, |
|
'attention_mask': attention_mask, |
|
'position_ids': position_ids, |
|
'beam_idx': beam_idx |
|
}) |
|
|
|
next_token_logits = response['logits'][0, -1, :] |
|
sampled_id = np.argmax(next_token_logits) # Greedy decoding |
|
generated_text_ids = np.append(generated_text_ids, sampled_id) |
|
|
|
output_text = tokenizer.decode(generated_text_ids) |
|
print(output_text[len(prev_output):], end='', flush=True) |
|
prev_output = output_text |
|
|
|
    # Feed back only the newly generated token; earlier context is kept in the model state (KV cache)
    input_ids = np.array([[sampled_id]], dtype=np.int64)

    attention_mask = np.array([[1]], dtype=np.int64)

    position_ids = np.array([[position_ids[0, -1] + 1]], dtype=np.int64)
|
|
|
if sampled_id == tokenizer.eos_token_id: |
|
        print('\n\n*** Zakończono generowanie.')
|
break |
|
|
|
print(f'\n\n*** Wygenerowano {len(generated_text_ids)} tokenów.')
|
``` |
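The loop above decodes greedily (`np.argmax`), which is deterministic but can sound repetitive. For completeness, a minimal sampling sketch with temperature and top-p (nucleus) filtering in plain NumPy; the `temperature` and `top_p` defaults here are illustrative, not tuned for Bielik:

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Sample a token id from raw logits with temperature and nucleus (top-p) filtering."""
    # Temperature scaling, then a numerically stable softmax
    scaled = logits / max(temperature, 1e-5)
    scaled -= scaled.max()
    probs = np.exp(scaled)
    probs /= probs.sum()

    # Keep the smallest set of top tokens whose cumulative probability reaches top_p
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:cutoff]

    # Renormalize over the kept tokens and draw one
    kept_probs = probs[kept] / probs[kept].sum()
    return int(np.random.choice(kept, p=kept_probs))
```

In the generation loop, replace `sampled_id = np.argmax(next_token_logits)` with `sampled_id = sample_token(next_token_logits)`; greedy decoding stays reproducible, while sampling trades determinism for more varied phrasing.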
|
|
|
### Model description
|
|
|
* **Developed by:** [SpeakLeash](https://speakleash.org/) & [ACK Cyfronet AGH](https://www.cyfronet.pl/) |
|
* **Language:** Polish |
|
* **Model type:** causal decoder-only |
|
* **Quantized from:** [Bielik-11B-v2.3-Instruct](https://huggingface.co/speakleash/Bielik-11B-v2.3-Instruct)
|
* **Finetuned from:** [Bielik-11B-v2](https://huggingface.co/speakleash/Bielik-11B-v2) |
|
* **License:** Apache 2.0 and [Terms of Use](https://bielik.ai/terms/) |
|
|
|
### Responsible for model quantization |
|
* [Remigiusz Kinas](https://www.linkedin.com/in/remigiusz-kinas/)<sup>SpeakLeash</sup> - team leadership, conceptualization, calibration data preparation, process creation, and delivery of the quantized model.
|
|
|
## Contact Us |
|
|
|
If you have any questions or suggestions, please use the discussion tab. If you want to contact us directly, join our [Discord SpeakLeash](https://discord.gg/CPBxPce4). |
|
|