Commit e70dbd7
Parent(s): 382fe65
Merak-7B-v4-PROTOTYPE6 was chosen for final v4
README.md CHANGED

license: cc-by-nc-sa-4.0
---

# HAPPY TO ANNOUNCE THE RELEASE OF MERAK-7B-V4!

Merak-7B is a Large Language Model for the Indonesian language.

Feel free to ask me about the model, and please share the news on your social media.

## HOW TO USE

### Installation
Please make sure the CUDA driver, Python 3.10, and PyTorch 2 are installed on your system, then install these libraries in a terminal:
```
pip install protobuf==4.24.4
pip install bitsandbytes==0.41.1
pip install transformers==4.34.1
pip install peft==0.5.0
pip install accelerate==0.23.0
pip install einops==0.6.1 scipy sentencepiece datasets
```
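
As a quick sanity check (this snippet is an addition for convenience, not part of the original instructions), you can confirm that PyTorch sees your GPU and that the expected library versions are installed:
```
import torch
import transformers

# The model needs a CUDA-capable GPU; print what PyTorch can see.
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
print("torch:", torch.__version__, "| transformers:", transformers.__version__)
```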

### Using BitsAndBytes 4-bit quantization (runs on a GPU with >= 10 GB VRAM)
[Open in Google Colab](https://colab.research.google.com/drive/1Tj15gNIx3KnLarDAJdwpa7qXa5nmfAM-?usp=drive_link)
```
import torch
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer

model_id = "Ichsan2895/Merak-7B-v4"
config = AutoConfig.from_pretrained(model_id)

# 4-bit NF4 quantization with double quantization, computing in bfloat16
BNB_CONFIG = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.bfloat16,
                                bnb_4bit_use_double_quant=True,
                                bnb_4bit_quant_type="nf4",
                                )

model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=BNB_CONFIG,
                                             device_map="auto",
                                             trust_remote_code=True)

tokenizer = LlamaTokenizer.from_pretrained(model_id)

def generate_response(question: str) -> str:
    # System prompt (Indonesian): "You are Merak, an artificial intelligence
    # model trained by Muhammad Ichsan. Please answer the following question
    # correctly, factually, and politely."
    chat = [
      {"role": "system", "content": "Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. Mohon jawab pertanyaan berikut dengan benar, faktual, dan ramah."},
      {"role": "user", "content": question},
    ]

    # Render the conversation with the model's chat template
    prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=True)

    with torch.no_grad():
        outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
                       attention_mask=inputs.attention_mask.to("cuda"),
                       eos_token_id=tokenizer.eos_token_id,
                       pad_token_id=tokenizer.eos_token_id,
                       max_new_tokens=256)
    response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]

    # The decoded text echoes the prompt; keep only what follows the assistant turn
    assistant_start = f'''{question} \n assistant\n '''
    response_start = response.find(assistant_start)
    return response[response_start + len(assistant_start) :].strip()

# "Who drafted the text of Indonesia's proclamation of independence?"
prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
print(generate_response(prompt))
```
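
If you are curious what `apply_chat_template` feeds to the model (and why the code above trims the output by searching for the assistant marker), you can print the rendered prompt. The exact special tokens come from the model's chat template, so take the snippet below as an illustration rather than a specification:
```
from transformers import LlamaTokenizer

# Render a tiny conversation and inspect the ChatML-style prompt string
tokenizer = LlamaTokenizer.from_pretrained("Ichsan2895/Merak-7B-v4")
chat = [
    {"role": "system", "content": "Anda adalah Merak."},  # "You are Merak."
    {"role": "user", "content": "Halo!"},                  # "Hello!"
]
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
```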

### Without BitsAndBytes 4-bit quantization (better answers in my experience, but higher VRAM usage)
[Open in Google Colab](https://colab.research.google.com/drive/1KVkiaKddrK4focgQJ6ysUA1NypLQPYuF?usp=drive_link)
```
import torch
from transformers import AutoConfig, AutoModelForCausalLM, LlamaTokenizer

model_id = "Ichsan2895/Merak-7B-v4"
config = AutoConfig.from_pretrained(model_id)

# Load the full-precision weights; device_map="auto" places them on the GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             device_map="auto",
                                             trust_remote_code=True)

tokenizer = LlamaTokenizer.from_pretrained(model_id)

def generate_response(question: str) -> str:
    chat = [
      {"role": "system", "content": "Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. Mohon jawab pertanyaan berikut dengan benar, faktual, dan ramah."},
      {"role": "user", "content": question},
    ]

    prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=True)

    with torch.no_grad():
        outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
                       attention_mask=inputs.attention_mask.to("cuda"),
                       eos_token_id=tokenizer.eos_token_id,
                       pad_token_id=tokenizer.eos_token_id,
                       max_new_tokens=256)
    response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]

    # The decoded text echoes the prompt; keep only what follows the assistant turn
    assistant_start = f'''{question} \n assistant\n '''
    response_start = response.find(assistant_start)
    return response[response_start + len(assistant_start) :].strip()

prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
print(generate_response(prompt))
```
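
If the full-precision weights do not fit in your VRAM but you would rather avoid 4-bit quantization, one possible middle ground (not covered in the original instructions, so treat it as a sketch) is loading the weights in bfloat16:
```
import torch
from transformers import AutoModelForCausalLM

# Half-precision load: roughly halves VRAM versus float32 while skipping
# 4-bit quantization; quality and memory sit between the two examples above.
model = AutoModelForCausalLM.from_pretrained("Ichsan2895/Merak-7B-v4",
                                             torch_dtype=torch.bfloat16,
                                             device_map="auto",
                                             trust_remote_code=True)
```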

## CHANGELOG
**v4** = We use [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) as the base model instead of Llama-2-Chat-HF. We arrived at it through countless rounds of trial and error and picked the best candidate for this release.

What we have done so far:
1st). We fine-tuned it on Wikipedia articles that we had cleaned beforehand, using QLoRA accelerated with DeepSpeed ZeRO-2 for 1 epoch. Axolotl was used for easier fine-tuning configuration.
2nd). We received extra funds (thanks, everyone!) and repeated the first step, but with full-parameter fine-tuning (FFT) instead of QLoRA.
3rd). We fine-tuned it on [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian) with minor modifications so they fit the ChatML format (a sketch of this conversion follows the changelog). This was FFT for 4 epochs.

**v3** = Fine-tuned on [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian).
**v2** = Fine-tuned version of the first Merak-7B model. We fine-tuned it again on the same Indonesian Wikipedia articles, only changing the prompt style of the questions. It uses 600k Indonesian Wikipedia articles.
**v1** = The first Merak-7B model. We selected and cleaned about 200k Indonesian Wikipedia articles.
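
For step 3 above, here is a minimal sketch of what converting an alpaca-style record into chat turns for `apply_chat_template` could look like; the field names and the helper are illustrative assumptions, not the exact pipeline we used:
```
# Illustrative only: map one alpaca-style record (assumed "instruction"/"output"
# fields) to the chat-message list that a ChatML template can render.
def to_chat(record: dict) -> list[dict]:
    return [
        {"role": "user", "content": record["instruction"]},
        {"role": "assistant", "content": record["output"]},
    ]

example = {"instruction": "Sebutkan ibu kota Indonesia.",  # "Name Indonesia's capital."
           "output": "Jakarta."}
print(to_chat(example))
```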

## CITATION
```
@software{lian2023mistralorca1