lordjimen committed
Commit 83c1a07 · verified · 1 Parent(s): 97f0b38

Update README.md

Files changed (1):
  1. README.md +44 -33
README.md CHANGED
@@ -64,15 +64,52 @@ In addition to benchmark evaluation, we evaluated the BgGPT 27B model in terms o
  The results show that our model **significantly surpasses** the smaller variants of commercial models, such as Anthropic’s Claude Haiku and OpenAI’s GPT-4o-mini, in Bulgarian chat performance,
  and is **on par** with the best commercial models, such as Anthropic’s Claude Sonnet and OpenAI’s GPT-4o, **according to GPT-4o itself**.

  # Instruction format

  In order to leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token `<bos>` and be formatted in the Gemma 2 chat template. `<bos>` should only be the first token in a chat sequence.

  E.g.
  ```
- <bos><start_of_turn>user\n
- Кога е основан Софийският университет?<end_of_turn>\n
- <start_of_turn>model\n
  ```

  This format is also available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:
@@ -88,39 +125,13 @@ messages = [
  ]
  input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

- outputs = model.generate(**input_ids, max_new_tokens=256)
  print(tokenizer.decode(outputs[0]))
-
  ```

- # Recommended Parameters
-
- For optimal performance, we recommend the following parameters for text generation, as we have extensively tested our model with them:
-
- ```python
- generation_params = {
-     "temperature": 0.1
-     "top_k": 20,
-     "repetition_penalty": 1.1
- }
- ```
-
- In principle, increasing temperature should work adequately as well.
-
- # Use in 🤗 Transformers
- First install the latest version of the transformers library:
- ```
- pip install -U 'transformers[torch]'
- ```
- Then load the model in transformers:
- ```python
- model = AutoModelForCausalLM.from_pretrained(
-     "INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
-     torch_dtype=torch.bfloat16,
-     attn_implementation="eager",
-     device_map="auto",
- )
- ```
  **Important Note:** Models based on Gemma 2 such as BgGPT-Gemma-2-27B-IT-v1.0 do not support flash attention. Using it results in degraded performance.

  # Use with GGML / llama.cpp
 
  The results show that our model **significantly surpasses** the smaller variants of commercial models, such as Anthropic’s Claude Haiku and OpenAI’s GPT-4o-mini, in Bulgarian chat performance,
  and is **on par** with the best commercial models, such as Anthropic’s Claude Sonnet and OpenAI’s GPT-4o, **according to GPT-4o itself**.
 
+ # Use in 🤗 Transformers
+ First install the latest version of the transformers library:
+ ```
+ pip install -U 'transformers[torch]'
+ ```
+ Then load the model in transformers:
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0",
+     torch_dtype=torch.bfloat16,
+     attn_implementation="eager",  # flash attention is not supported for Gemma 2
+     device_map="auto",
+ )
+ ```
+
+ # Recommended Parameters
+
+ For optimal performance, we recommend the following parameters for text generation, as we have extensively tested our model with them:
+
+ ```python
+ from transformers import GenerationConfig
+
+ generation_params = GenerationConfig(
+     max_new_tokens=2048,      # choose the maximum number of generated tokens
+     temperature=0.1,
+     top_k=25,
+     top_p=1,
+     repetition_penalty=1.1,
+     eos_token_id=[1, 107],    # 1 = <eos>, 107 = <end_of_turn> for the Gemma 2 tokenizer
+ )
+ ```
+
+ In principle, increasing temperature should work adequately as well.
+
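For example, a higher-temperature variant could look like the sketch below (illustrative only, not part of the README; note that in 🤗 Transformers `do_sample=True` must be enabled for `temperature`, `top_k` and `top_p` to take effect):

```python
# Illustrative sketch: a more exploratory configuration derived from the recommended one.
creative_params = GenerationConfig(
    max_new_tokens=2048,
    do_sample=True,          # sampling must be enabled for the parameters below to apply
    temperature=0.7,         # higher temperature gives more varied output
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    eos_token_id=[1, 107],   # 1 = <eos>, 107 = <end_of_turn>
)
```
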
  # Instruction format

  In order to leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token `<bos>` and be formatted in the Gemma 2 chat template. `<bos>` should only be the first token in a chat sequence.

  E.g.
  ```
+ <bos><start_of_turn>user
+ Кога е основан Софийският университет?<end_of_turn>
+ <start_of_turn>model
+
  ```

  This format is also available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:
 
  ]
  input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

+ outputs = model.generate(
+     **input_ids,
+     generation_config=generation_params,
+ )
  print(tokenizer.decode(outputs[0]))
  ```

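Pieced together, the snippets above form the following end-to-end flow. This is a minimal sketch for orientation only: the `AutoTokenizer` call and the `add_generation_prompt` flag are assumptions, since the tokenizer setup and the full `messages` list are not shown above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "INSAIT-Institute/BgGPT-Gemma-2-27B-IT-v1.0"

# Assumption: the tokenizer is loaded with AutoTokenizer; only the model loading is shown above.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # Gemma 2 does not support flash attention
    device_map="auto",
)

# Recommended generation parameters from the section above.
generation_params = GenerationConfig(
    max_new_tokens=2048,
    temperature=0.1,
    top_k=25,
    top_p=1,
    repetition_penalty=1.1,
    eos_token_id=[1, 107],
)

messages = [
    {"role": "user", "content": "Кога е основан Софийският университет?"},
]
# apply_chat_template() produces the <bos><start_of_turn>... format shown earlier.
# Assumption: add_generation_prompt=True appends the opening model turn to the prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
).to("cuda")

outputs = model.generate(**input_ids, generation_config=generation_params)
print(tokenizer.decode(outputs[0]))
```
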
  **Important Note:** Models based on Gemma 2 such as BgGPT-Gemma-2-27B-IT-v1.0 do not support flash attention. Using it results in degraded performance.

  # Use with GGML / llama.cpp
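As a rough sketch only, assuming a GGUF conversion of the model is available locally (the file name and quantization below are placeholders), it could be run with llama-cpp-python along these lines, reusing the recommended sampling values:

```python
from llama_cpp import Llama

# Placeholder path: assumes a GGUF conversion/quantization of BgGPT-Gemma-2-27B-IT-v1.0 exists locally.
llm = Llama(
    model_path="./bggpt-gemma-2-27b-it-v1.0.Q4_K_M.gguf",
    n_ctx=8192,       # Gemma 2 context window
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

# create_chat_completion() uses the chat template stored in the GGUF metadata, when present.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Кога е основан Софийският университет?"}],
    temperature=0.1,
    top_k=25,
    top_p=1.0,
    repeat_penalty=1.1,
    max_tokens=2048,
)
print(response["choices"][0]["message"]["content"])
```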