---
language:
- en
- vi
- zh
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
tags:
- vllm
- system-role
- langchain
license: gemma
---

# gemma-2-2b-it-fix-system-role

Modified version of [gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) with an updated **`chat_template`** that supports the **`system`** role, avoiding errors such as the following (see the check below):
- `Conversation roles must alternate user/assistant/user/assistant/...`
- `System role not supported`
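
As a quick check, rendering a conversation that begins with a `system` message should now succeed. This is a minimal sketch: with the stock Gemma 2 template the same call raises `System role not supported`, and exactly how the system content is merged into the prompt depends on the bundled `chat_template`.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dangvansam/gemma-2-2b-it-fix-system-role")

messages = [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Who are you?"}
]

# Raises "System role not supported" with the original template;
# renders a prompt string with the modified one.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```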

## Model Overview
- **Model Architecture:** Gemma 2
  - **Input:** Text
  - **Output:** Text
- **Release Date:** 04/12/2024
- **Version:** 1.0

## Deployment

### Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the examples below.

With the CLI:
```bash
vllm serve dangvansam/gemma-2-2b-it-fix-system-role
```
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "dangvansam/gemma-2-2b-it-fix-system-role",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"}
  ]
}'
```

With Python:
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "dangvansam/gemma-2-2b-it-fix-system-role"

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

# The bundled chat_template renders the system message instead of rejecting it.
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Who are you?"}
]

prompts = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Offline generation with vLLM.
llm = LLM(model=model_id)

outputs = llm.generate(prompts, sampling_params)

generated_text = outputs[0].outputs[0].text
print(generated_text)
```

The `vllm serve` command above already starts an OpenAI-compatible server; see the [vLLM documentation](https://docs.vllm.ai/en/latest/) for more details.
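
For example, any OpenAI-compatible client (including LangChain's OpenAI integrations) can send `system` messages to the endpoint at `http://localhost:8000/v1`. A minimal sketch with the official `openai` Python client, assuming the server from the CLI example is running locally and no API key is configured (vLLM accepts a placeholder):

```python
from openai import OpenAI

# Point the client at the local vLLM server started with `vllm serve` above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dangvansam/gemma-2-2b-it-fix-system-role",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ],
    temperature=0.6,
    max_tokens=256,
)
print(response.choices[0].message.content)
```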