---
tags:
- generated_from_trainer
- conversational
- polish
license: mit
language:
- pl
datasets:
- eryk-mazus/polka-dpo-v1
pipeline_tag: text-generation
inference: false
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61bf0e11c88f3fd22f654059/FiMCITBAaEyMyxCHhfWVD.png)

# Polka-1.1B-Chat

`eryk-mazus/polka-1.1b-chat` **is the first Polish model trained to act as a helpful, conversational assistant that can be run locally.**

The model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T), with a custom, extended tokenizer for more efficient Polish text generation, and was additionally pretrained on 5.7 billion tokens. **It was then fine-tuned on around 60k synthetically generated and machine-translated multi-turn conversations, followed by [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).**

Context size: 4,096 tokens

In addition, we're releasing:
* [polka-1.1b](https://huggingface.co/eryk-mazus/polka-1.1b) - our base model with an extended tokenizer and additional pre-training on a Polish corpus sampled using [DSIR](https://github.com/p-lambda/dsir)
* [polka-pretrain-en-pl-v1](https://huggingface.co/datasets/eryk-mazus/polka-pretrain-en-pl-v1) - the pre-training dataset
* [polka-dpo-v1](https://huggingface.co/datasets/eryk-mazus/polka-dpo-v1) - dataset of DPO pairs
* [polka-1.1b-chat-gguf](https://huggingface.co/eryk-mazus/polka-1.1b-chat-gguf) - GGUF files for the chat model 

## Usage

Sample code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_name = "eryk-mazus/polka-1.1b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto"
)
streamer = TextStreamer(tokenizer, skip_prompt=True)

# You are a helpful assistant.
system_prompt = "Jesteś pomocnym asystentem."
chat = [{"role": "system", "content": system_prompt}]

# Compose a short song on programming.
user_input = "Napisz krótką piosenkę o programowaniu."
chat.append({"role": "user", "content": user_input})

# Generate - add_generation_prompt to make sure it continues as assistant
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")
# For multi-GPU, find the device of the first parameter of the model
first_param_device = next(model.parameters()).device
inputs = inputs.to(first_param_device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        temperature=0.2,
        repetition_penalty=1.15,
        top_p=0.95,
        do_sample=True,
        streamer=streamer,
    )

# Add just the new tokens to our chat
new_tokens = outputs[0, inputs.size(1):]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
chat.append({"role": "assistant", "content": response})
```

The model works seamlessly with [vLLM](https://github.com/vllm-project/vllm) as well.

## Prompt format

This model uses ChatML as the prompt format:
```
<|im_start|>system
Jesteś pomocnym asystentem.<|im_end|>
<|im_start|>user
Jakie jest dzienne zapotrzebowanie kaloryczne dorosłej osoby?<|im_end|>
<|im_start|>assistant
Dla dorosłych osób zaleca się spożywanie około 2000-3000 kcal dziennie, aby utrzymać optymalne zdrowie i dobre samopoczucie.<|im_end|>
```

This prompt is available as a [chat template](https://huggingface.co/docs/transformers/chat_templating), which means you can format messages using the `tokenizer.apply_chat_template()` method, as demonstrated in the example above.
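
To make the layout concrete, here is a minimal sketch that builds a ChatML string by hand. It assumes standard ChatML, where every message is closed with `<|im_end|>` and a trailing `<|im_start|>assistant` line cues the model to answer. `format_chatml` is a hypothetical helper written for illustration, not part of `transformers`; the tokenizer's built-in chat template remains the authoritative source for the exact format.

```python
from typing import Dict, List

# ChatML delimiters (assumed to match the special tokens in the model's tokenizer)
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} messages as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn: start token + role, newline, content, end token, newline
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


chat = [
    # You are a helpful assistant.
    {"role": "system", "content": "Jesteś pomocnym asystentem."},
    # Compose a short song on programming.
    {"role": "user", "content": "Napisz krótką piosenkę o programowaniu."},
]
prompt = format_chatml(chat)
print(prompt)
```

A hand-built string like this is mainly useful for inspecting or debugging prompts; for actual generation, `tokenizer.apply_chat_template()` is the safer choice because it follows whatever template ships with the model.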

***
We're actively looking for additional compute to train better and larger models for this project. If you want to collaborate, please reach out at: eryk.mazus at gmail dot com