bakrianoo committed on
Commit e30bf68 · verified · 1 Parent(s): d4af7f6

Update README.md

Files changed (1): README.md (+391 -3)

README.md CHANGED
---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_button_content: Acknowledge license
tags:
- conversational
---

# Silma

---
Thank you for being part of our journey to advance AI for the Arabic-speaking world! 🌟

**Authors**: [silma.ai](https://silma.ai)

### Description

Silma is a leading Generative AI startup dedicated to empowering Arabic speakers with state-of-the-art AI solutions.

## 🚀 Our Flagship Model: Silma 1.0 🚀

**Silma 1.0** is the **TOP-RANKED** Arabic LLM at just **9 billion parameters**, outperforming models more than seven times its size. 🏆

## 👥 Our Team

Our team is composed of seasoned **Arabic AI experts** who understand the nuances of the language and its cultural context, enabling us to build solutions that truly resonate with Arabic users. 🌍✨

### Usage

Below we share some code snippets showing how to quickly get started running the model. First, install the Transformers library with:

```sh
pip install -U transformers
```

Then, copy the snippet from the section that is relevant to your use case.

#### Running with the `pipeline` API

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="silma-ai/SILMA-9B-Instruct-v0.8",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

# "Write a message apologizing to my manager at work for not attending today for health reasons."
messages = [
    {"role": "user", "content": "اكتب رسالة تعتذر فيها لمديري في العمل عن الحضور اليوم لأسباب مرضية."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# السلام عليكم ورحمة الله وبركاته، أودّ أن أعتذر عن عدم الحضور إلى العمل اليوم بسبب مرضي. أشكركم على تفهمكم.
# ("Peace and blessings be upon you. I would like to apologize for not coming to work today due to illness. Thank you for your understanding.")
```

#### Running the model on a single / multi GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "silma-ai/SILMA-9B-Instruct-v0.8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```

You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template` as follows:

```python
# "Write Python code to generate a sequence of even numbers."
messages = [
    {"role": "user", "content": "اكتب كود بايثون لتوليد متسلسلة أرقام زوجية."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))

# def generate_even_numbers(n):
#   """
#   This function generates a list of even numbers from 1 to n.
#
#   Args:
#     n: The upper limit of the range.
#
#   Returns:
#     A list of even numbers.
#   """
#   return [i for i in range(1, n + 1) if i % 2 == 0]

# Example usage
# n = 10
# even_numbers = generate_even_numbers(n)
# print(f"The first {n} even numbers are: {even_numbers}")

```

#### Quantized Versions through `bitsandbytes`

<details>
<summary>
Using 8-bit precision (int8)
</summary>

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "silma-ai/SILMA-9B-Instruct-v0.8"
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)

# "Name five types of fruit with high levels of vitamin C."
input_text = "اذكر خمس انواع فواكه بها نسب عالية من فيتامين ج."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))

# الليمون، البرتقال، الموز، الكيوي، الفراولة
# ("Lemon, orange, banana, kiwi, strawberry")
```
</details>

<details>
<summary>
Using 4-bit precision
</summary>

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "silma-ai/SILMA-9B-Instruct-v0.8"
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)

# "In which year did Saladin al-Ayyubi die?"
input_text = "في أي عام توفى صلاح الدين الأيوبي؟"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))

# 1193
```
</details>
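
For finer control over the 4-bit setup, `BitsAndBytesConfig` also exposes NF4 options. The snippet below is a minimal illustrative sketch (not part of the original examples) showing a commonly used NF4 configuration with bfloat16 compute:

<details>
<summary>
Using 4-bit NF4 precision (illustrative configuration)
</summary>

```python
# pip install bitsandbytes accelerate
# Illustrative sketch: NF4 4-bit quantization with bfloat16 compute and nested quantization.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "silma-ai/SILMA-9B-Instruct-v0.8"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type for the 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # run computations in bfloat16
    bnb_4bit_use_double_quant=True,         # nested quantization for extra memory savings
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
)
```
</details>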

#### Advanced Usage

<details>
<summary>
Torch compile
</summary>

[Torch compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) is a method for speeding up the
inference of PyTorch modules. The Silma model can be run up to 6x faster by leveraging torch compile.

Note that two warm-up steps are required before the full inference speed is realised:

```python
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from transformers import AutoTokenizer, Gemma2ForCausalLM
from transformers.cache_utils import HybridCache
import torch

torch.set_float32_matmul_precision("high")

# load the model + tokenizer
model_id = "silma-ai/SILMA-9B-Instruct-v0.8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Gemma2ForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.to("cuda")

# apply the torch compile transformation
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

# pre-process inputs
# "Which president took office in the United States after Donald Trump?"
input_text = "من الرئيس الذي تولى المنصب في أمريكا بعد دونالد ترامب؟"
model_inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
prompt_length = model_inputs.input_ids.shape[1]

# set up the k/v cache
past_key_values = HybridCache(
    config=model.config,
    max_batch_size=1,
    max_cache_len=model.config.max_position_embeddings,
    device=model.device,
    dtype=model.dtype
)

# enable passing the kv cache to generate
model._supports_cache_class = True
model.generation_config.cache_implementation = None

# two warm-up steps
for idx in range(2):
    outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
    past_key_values.reset()

# fast run
outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# جو بايدن ("Joe Biden")
```

For more details, refer to the [Transformers documentation](https://huggingface.co/docs/transformers/main/en/llm_optims?static-kv=basic+usage%3A+generation_config).

</details>

### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "silma-ai/SILMA-9B-Instruct-v0.8"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

# "What are the most popular Python frameworks for building AI models?"
chat = [
    {"role": "user", "content": "ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is preceded by a `<start_of_turn>` delimiter and then the role of the entity
(either `user`, for content supplied by the user, or `model` for LLM responses). Turns finish with
the `<end_of_turn>` token.

You can follow this format to build the prompt manually if you need to do it without the tokenizer's
chat template.
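
For example, here is a minimal sketch that reproduces the prompt above by hand, using the same turn markers shown in the rendered template:

```python
# Illustrative sketch: building the prompt string manually instead of calling apply_chat_template.
user_message = "ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟"

prompt = (
    "<bos><start_of_turn>user\n"
    f"{user_message}<end_of_turn>\n"
    "<start_of_turn>model\n"
)
```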

After the prompt is ready, generation can be performed like this:

```python
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
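
To keep the conversation going for multiple turns, append the model's reply and the next user message to the `chat` list and re-apply the template. The following is a minimal sketch reusing the variables defined above; the follow-up question is our own illustrative example:

```python
# Decode only the newly generated tokens as the assistant's reply.
reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

chat.append({"role": "assistant", "content": reply})
# "Give a simple example using one of these frameworks."
chat.append({"role": "user", "content": "اذكر مثالاً بسيطاً باستخدام أحد هذه الأطر."})

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```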

### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be summarized.
* **Output:** Generated Arabic- or English-language text in response to the input, such as an answer to a question or a summary of a document.

### Citation

```none
@article{silma_01_2024,
    title={Silma},
    url={https://www.silma.ai},
    publisher={Silma},
    author={Silma Team},
    year={2024}
}
```

## Usage and Limitations

These models have certain limitations that users should be aware of.

### Intended Usage

Open Large Language Models (LLMs) have a wide range of applications across
various industries and domains. The following list of potential uses is not
comprehensive. The purpose of this list is to provide contextual information
about the possible use-cases that the model creators considered as part of model
training and development.

* Content Creation and Communication
  * Text Generation: These models can be used to generate creative text formats
    such as poems, scripts, code, marketing copy, and email drafts.
  * Chatbots and Conversational AI: Power conversational interfaces for customer
    service, virtual assistants, or interactive applications.
  * Text Summarization: Generate concise summaries of a text corpus, research
    papers, or reports.
* Research and Education
  * Natural Language Processing (NLP) Research: These models can serve as a
    foundation for researchers to experiment with NLP techniques, develop
    algorithms, and contribute to the advancement of the field.
  * Language Learning Tools: Support interactive language learning experiences,
    aiding in grammar correction or providing writing practice.
  * Knowledge Exploration: Assist researchers in exploring large bodies of text
    by generating summaries or answering questions about specific topics.

### Limitations

* Training Data
  * The quality and diversity of the training data significantly influence the
    model's capabilities. Biases or gaps in the training data can lead to
    limitations in the model's responses.
  * The scope of the training dataset determines the subject areas the model can
    handle effectively.
* Context and Task Complexity
  * LLMs are better at tasks that can be framed with clear prompts and
    instructions. Open-ended or highly complex tasks might be challenging.
  * A model's performance can be influenced by the amount of context provided
    (longer context generally leads to better outputs, up to a certain point).
* Language Ambiguity and Nuance
  * Natural language is inherently complex. LLMs might struggle to grasp subtle
    nuances, sarcasm, or figurative language.
* Factual Accuracy
  * LLMs generate responses based on information they learned from their
    training datasets, but they are not knowledge bases. They may generate
    incorrect or outdated factual statements.
* Common Sense
  * LLMs rely on statistical patterns in language. They might lack the ability
    to apply common sense reasoning in certain situations.

### Ethical Considerations and Risks

The development of large language models (LLMs) raises several ethical concerns.
In creating an open model, we have carefully considered the following:

* Bias and Fairness
  * LLMs trained on large-scale, real-world text data can reflect socio-cultural
    biases embedded in the training material. These models underwent careful
    scrutiny; input data pre-processing and posterior evaluations are reported
    in this card.
* Misinformation and Misuse
  * LLMs can be misused to generate text that is false, misleading, or harmful.
  * Guidelines are provided for responsible use with the model; see the
    [Responsible Generative AI Toolkit](https://ai.google.dev/responsible).
* Transparency and Accountability
  * This model card summarizes details on the models' architecture,
    capabilities, limitations, and evaluation processes.
  * A responsibly developed open model offers the opportunity to share
    innovation by making LLM technology accessible to developers and researchers
    across the AI ecosystem.

Risks identified and mitigations:

* Perpetuation of biases: Continuous monitoring (using evaluation metrics and
  human review) and the exploration of de-biasing techniques during model
  training, fine-tuning, and other use cases are encouraged.
* Generation of harmful content: Mechanisms and guidelines for content safety
  are essential. Developers are encouraged to exercise caution and implement
  appropriate content safety safeguards based on their specific product policies
  and application use cases.
* Privacy violations: Models were trained on data filtered for removal of PII
  (Personally Identifiable Information). Developers are encouraged to adhere to
  privacy regulations with privacy-preserving techniques.