oguzhandoganoglu commited on
Commit
b075f1f
1 Parent(s): 443524a

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +7 -7
README.md CHANGED
@@ -4,17 +4,17 @@ language:
4
  - tr
5
 
6
  ---
7
- <img src="https://cdn-uploads.huggingface.co/production/uploads/6639e48c27ef2d37a71eb4aa/Ds_KOVYwhRQ1FQY8S4WqO.png"
8
  alt="CEREBRUM LLM" width="420"/>
9
 
10
 
11
- # CERE V2 -LLMA-3.1-8b-TR
12
 
13
- This model is an fine-tuned version of a Llama3.1 8b Large Language Model (LLM) for Turkish. It was trained on a high quality Turkish instruction sets created from various open-source and internal resources. Turkish Instruction dataset carefully annotated to carry out Turkish instructions in an accurate and organized manner.
14
 
15
  ## Model Details
16
 
17
- - **Base Model**: LLMA 3.1 8B based LLM
18
  - **Tokenizer Extension**: Specifically extended for Turkish
19
  - **Training Dataset**: Cleaned Turkish raw data with 5 billion tokens, custom Turkish instruction sets
20
  - **Training Method**: Initially with DORA, followed by fine-tuning with LORA
@@ -37,11 +37,11 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
37
  device = "cuda" # the device to load the model onto
38
 
39
  model = AutoModelForCausalLM.from_pretrained(
40
- "Cerebrum/cere-llama-3.1-8B-tr",
41
  torch_dtype="auto",
42
  device_map="auto"
43
  )
44
- tokenizer = AutoTokenizer.from_pretrained("Cerebrum/cere-llama-3.1-8B-tr")
45
 
46
  prompt = "Python'da ekrana 'Merhaba Dünya' nasıl yazılır?"
47
  messages = [
@@ -68,4 +68,4 @@ generated_ids = [
68
  ]
69
 
70
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
71
- ```
 
4
  - tr
5
 
6
  ---
7
+ <img src="https://huggingface.co/CerebrumTech/cere-llama-3-8b-tr/resolve/main/cere2.png"
8
  alt="CEREBRUM LLM" width="420"/>
9
 
10
 
11
+ # CERE-LLMA-3-8b-TR
12
 
13
+ This model is an fine-tuned version of a Llama3 8b Large Language Model (LLM) for Turkish. It was trained on a high quality Turkish instruction sets created from various open-source and internal resources. Turkish Instruction dataset carefully annotated to carry out Turkish instructions in an accurate and organized manner.
14
 
15
  ## Model Details
16
 
17
+ - **Base Model**: LLMA 3 7B based LLM
18
  - **Tokenizer Extension**: Specifically extended for Turkish
19
  - **Training Dataset**: Cleaned Turkish raw data with 5 billion tokens, custom Turkish instruction sets
20
  - **Training Method**: Initially with DORA, followed by fine-tuning with LORA
 
37
  device = "cuda" # the device to load the model onto
38
 
39
  model = AutoModelForCausalLM.from_pretrained(
40
+ "Cerebrum/cere-llama-3-8b-tr",
41
  torch_dtype="auto",
42
  device_map="auto"
43
  )
44
+ tokenizer = AutoTokenizer.from_pretrained("Cerebrum/cere-llama-3-8b-tr")
45
 
46
  prompt = "Python'da ekrana 'Merhaba Dünya' nasıl yazılır?"
47
  messages = [
 
68
  ]
69
 
70
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
71
+ ```