Ichsan2895 committed
Commit e70dbd7 · 1 Parent(s): 382fe65

Merak-7B-v4-PROTOTYPE6 was chosen for final v4

Files changed (1): README.md +115 -1
README.md CHANGED
@@ -10,7 +10,7 @@ pipeline_tag: text-generation
  license: cc-by-nc-sa-4.0
  ---
 
- # THIS IS 6th PROTOTYPE OF MERAK-7B-v4!
+ # HAPPY TO ANNOUNCE THE RELEASE OF MERAK-7B-V4!
 
  Merak-7B is the Large Language Model of Indonesian Language
 
@@ -24,6 +24,120 @@ Big thanks to all my friends and communities that help to build our first model.
 
  Feel free to ask me about the model, and please share the news on your social media.
 
+ ## HOW TO USE
+ ### Installation
+ Please make sure you have installed the CUDA driver on your system, along with Python 3.10 and PyTorch 2. Then install these libraries in a terminal:
+ ```
+ pip install protobuf==4.24.4
+ pip install bitsandbytes==0.41.1
+ pip install transformers==4.34.1
+ pip install peft==0.5.0
+ pip install accelerate==0.23.0
+ pip install einops==0.6.1 scipy sentencepiece datasets
+ ```
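+
+ Optionally, you can sanity-check the environment first (a minimal snippet; it assumes a CUDA build of PyTorch is installed):
+ ```
+ import platform
+ import torch
+
+ print(platform.python_version())   # expected: 3.10.x
+ print(torch.__version__)           # expected: 2.x
+ print(torch.cuda.is_available())   # expected: True when the CUDA driver is set up correctly
+ ```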
+ ### Using BitsandBytes, it runs on a GPU with >= 10 GB of VRAM
+ [![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Tj15gNIx3KnLarDAJdwpa7qXa5nmfAM-?usp=drive_link)
+ ```
+ import torch
+ from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer
+ from peft import PeftModel, PeftConfig
+
+ model_id = "Ichsan2895/Merak-7B-v4"
+ config = AutoConfig.from_pretrained(model_id)
+
+ # 4-bit NF4 quantization keeps the model within roughly 10 GB of VRAM
+ BNB_CONFIG = BitsAndBytesConfig(load_in_4bit=True,
+                                 bnb_4bit_compute_dtype=torch.bfloat16,
+                                 bnb_4bit_use_double_quant=True,
+                                 bnb_4bit_quant_type="nf4",
+                                 )
+
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+                                              quantization_config=BNB_CONFIG,
+                                              device_map="auto",
+                                              trust_remote_code=True)
+
+ tokenizer = LlamaTokenizer.from_pretrained(model_id)
+
+ def generate_response(question: str) -> str:
+     chat = [
+         {"role": "system", "content": "Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. Mohon jawab pertanyaan berikut dengan benar, faktual, dan ramah."},
+         {"role": "user", "content": question},
+     ]
+
+     # Build the prompt with the model's chat template
+     prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+
+     inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=True)
+
+     with torch.no_grad():
+         outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
+                                  attention_mask=inputs.attention_mask,
+                                  eos_token_id=tokenizer.eos_token_id,
+                                  pad_token_id=tokenizer.eos_token_id,
+                                  max_new_tokens=256)
+     response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
+
+     # Keep only the assistant's part of the decoded output
+     assistant_start = f'''{question} \n assistant\n '''
+     response_start = response.find(assistant_start)
+     return response[response_start + len(assistant_start) :].strip()
+
+ prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
+ print(generate_response(prompt))
+ ```
+
+
+ ### From my experience, for better answers please don't use BitsandBytes 4-bit quantization, but note that it uses more VRAM
+ [![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1KVkiaKddrK4focgQJ6ysUA1NypLQPYuF?usp=drive_link)
+ ```
+ import torch
+ from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer
+ from peft import PeftModel, PeftConfig
+
+ model_id = "Ichsan2895/Merak-7B-v4"
+ config = AutoConfig.from_pretrained(model_id)
+
+ # Load the unquantized weights; expect noticeably higher VRAM usage than the 4-bit setup
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+                                              device_map="auto",
+                                              trust_remote_code=True)
+
+ tokenizer = LlamaTokenizer.from_pretrained(model_id)
+
+ def generate_response(question: str) -> str:
+     chat = [
+         {"role": "system", "content": "Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. Mohon jawab pertanyaan berikut dengan benar, faktual, dan ramah."},
+         {"role": "user", "content": question},
+     ]
+
+     # Build the prompt with the model's chat template
+     prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+
+     inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=True)
+
+     with torch.no_grad():
+         outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
+                                  attention_mask=inputs.attention_mask,
+                                  eos_token_id=tokenizer.eos_token_id,
+                                  pad_token_id=tokenizer.eos_token_id,
+                                  max_new_tokens=256)
+     response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
+
+     # Keep only the assistant's part of the decoded output
+     assistant_start = f'''{question} \n assistant\n '''
+     response_start = response.find(assistant_start)
+     return response[response_start + len(assistant_start) :].strip()
+
+ prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
+ print(generate_response(prompt))
+ ```
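+
+ If the unquantized model above does not fit in your VRAM, loading the weights in half precision is a middle ground between the two setups (a minimal sketch; it assumes your GPU supports bfloat16):
+ ```
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+                                              torch_dtype=torch.bfloat16,  # ~2 bytes per parameter instead of 4
+                                              device_map="auto",
+                                              trust_remote_code=True)
+ ```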
+
+ ## CHANGELOG
+ **v4** = We use [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) instead of Llama-2-Chat-HF. We arrived at it through countless rounds of trial and error and picked the best one for this model.
+
+ What we have done so far:
+ 1st). We fine-tuned it with Wikipedia articles that we had cleaned beforehand, using QLoRA sped up by DeepSpeed ZeRO 2 for 1 epoch. Axolotl was used for easier fine-tuning configuration.
+ 2nd). We got extra funds. Thanks, all! We did it again like the first step, but with full-parameter fine-tuning (FFT) instead of QLoRA.
+ 3rd). We fine-tuned it with [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian), with minor modifications so they fit the ChatML format (see the sketch below). It was FFT for 4 epochs.
+
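+ For reference, the prompt that `tokenizer.apply_chat_template(...)` builds in the examples above follows the ChatML layout, roughly like this (illustrative; the exact special tokens come from the tokenizer's chat template):
+ ```
+ <|im_start|>system
+ Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. ...<|im_end|>
+ <|im_start|>user
+ Siapa penulis naskah proklamasi kemerdekaan Indonesia?<|im_end|>
+ <|im_start|>assistant
+ ```
+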
+ **v3** = Fine-tuned with [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian).
+ **v2** = Fine-tuned version of the first Merak-7B model. We fine-tuned it again with the same Indonesian Wikipedia articles, except that the prompt style of the questions was changed. It uses 600k Indonesian Wikipedia articles.
+ **v1** = The first Merak-7B model. We selected and cleaned about 200k Indonesian Wikipedia articles.
+
  ## CITATION
  ```
  @software{lian2023mistralorca1