Ichsan2895 committed
Commit e70dbd7 · 1 Parent(s): 382fe65

Merak-7B-v4-PROTOTYPE6 was chosen for final v4

Files changed (1): README.md +115 -1
README.md CHANGED
@@ -10,7 +10,7 @@ pipeline_tag: text-generation
  license: cc-by-nc-sa-4.0
  ---
 
- # THIS IS 6th PROTOTYPE OF MERAK-7B-v4!
+ # HAPPY TO ANNOUNCE THE RELEASE OF MERAK-7B-V4!
 
  Merak-7B is the Large Language Model of Indonesian Language
 
@@ -24,6 +24,120 @@ Big thanks to all my friends and communities that help to build our first model.
 
  Feel free to ask me about the model, and please share the news on your social media.
 
+ ## HOW TO USE
+ ### Installation
+ Please make sure you have installed the CUDA driver on your system, along with Python 3.10 and PyTorch 2. Then install these libraries in a terminal:
+ ```
+ pip install protobuf==4.24.4
+ pip install bitsandbytes==0.41.1
+ pip install transformers==4.34.1
+ pip install peft==0.5.0
+ pip install accelerate==0.23.0
+ pip install einops==0.6.1 scipy sentencepiece datasets
+ ```
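+
+ Optionally, you can sanity-check the environment first (a minimal snippet; it assumes a CUDA build of PyTorch is installed):
+ ```
+ import platform
+ import torch
+
+ print(platform.python_version())   # expected: 3.10.x
+ print(torch.__version__)           # expected: 2.x
+ print(torch.cuda.is_available())   # expected: True when the CUDA driver is set up correctly
+ ```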
+ ### Using BitsandBytes, it runs on a GPU with >= 10 GB of VRAM
+ [![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Tj15gNIx3KnLarDAJdwpa7qXa5nmfAM-?usp=drive_link)
+ ```
+ import torch
+ from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer
+ from peft import PeftModel, PeftConfig
+
+ model_id = "Ichsan2895/Merak-7B-v4"
+ config = AutoConfig.from_pretrained(model_id)
+
+ # 4-bit NF4 quantization keeps the model within roughly 10 GB of VRAM
+ BNB_CONFIG = BitsAndBytesConfig(load_in_4bit=True,
+                                 bnb_4bit_compute_dtype=torch.bfloat16,
+                                 bnb_4bit_use_double_quant=True,
+                                 bnb_4bit_quant_type="nf4",
+                                 )
+
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+                                              quantization_config=BNB_CONFIG,
+                                              device_map="auto",
+                                              trust_remote_code=True)
+
+ tokenizer = LlamaTokenizer.from_pretrained(model_id)
+
+ def generate_response(question: str) -> str:
+     chat = [
+         {"role": "system", "content": "Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. Mohon jawab pertanyaan berikut dengan benar, faktual, dan ramah."},
+         {"role": "user", "content": question},
+     ]
+
+     # Build the prompt with the model's chat template
+     prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+
+     inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=True)
+
+     with torch.no_grad():
+         outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
+                                  attention_mask=inputs.attention_mask,
+                                  eos_token_id=tokenizer.eos_token_id,
+                                  pad_token_id=tokenizer.eos_token_id,
+                                  max_new_tokens=256)
+     response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
+
+     # Keep only the assistant's part of the decoded output
+     assistant_start = f'''{question} \n assistant\n '''
+     response_start = response.find(assistant_start)
+     return response[response_start + len(assistant_start) :].strip()
+
+ prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
+ print(generate_response(prompt))
+ ```
+
+
+ ### From my experience, for better answers please don't use BitsandBytes 4-bit quantization, but note that it uses more VRAM
+ [![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1KVkiaKddrK4focgQJ6ysUA1NypLQPYuF?usp=drive_link)
+ ```
+ import torch
+ from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, LlamaTokenizer
+ from peft import PeftModel, PeftConfig
+
+ model_id = "Ichsan2895/Merak-7B-v4"
+ config = AutoConfig.from_pretrained(model_id)
+
+ # Load the unquantized weights; expect noticeably higher VRAM usage than the 4-bit setup
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+                                              device_map="auto",
+                                              trust_remote_code=True)
+
+ tokenizer = LlamaTokenizer.from_pretrained(model_id)
+
+ def generate_response(question: str) -> str:
+     chat = [
+         {"role": "system", "content": "Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. Mohon jawab pertanyaan berikut dengan benar, faktual, dan ramah."},
+         {"role": "user", "content": question},
+     ]
+
+     # Build the prompt with the model's chat template
+     prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+
+     inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=True)
+
+     with torch.no_grad():
+         outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
+                                  attention_mask=inputs.attention_mask,
+                                  eos_token_id=tokenizer.eos_token_id,
+                                  pad_token_id=tokenizer.eos_token_id,
+                                  max_new_tokens=256)
+     response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
+
+     # Keep only the assistant's part of the decoded output
+     assistant_start = f'''{question} \n assistant\n '''
+     response_start = response.find(assistant_start)
+     return response[response_start + len(assistant_start) :].strip()
+
+ prompt = "Siapa penulis naskah proklamasi kemerdekaan Indonesia?"
+ print(generate_response(prompt))
+ ```
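+
+ If the unquantized model above does not fit in your VRAM, loading the weights in half precision is a middle ground between the two setups (a minimal sketch; it assumes your GPU supports bfloat16):
+ ```
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+                                              torch_dtype=torch.bfloat16,  # ~2 bytes per parameter instead of 4
+                                              device_map="auto",
+                                              trust_remote_code=True)
+ ```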
+
+ ## CHANGELOG
+ **v4** = We use [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) instead of Llama-2-Chat-HF. We arrived at it through countless rounds of trial and error and picked the best one for this model.
+
+ What we have done so far:
+ 1st). We fine-tuned it with Wikipedia articles that we had cleaned beforehand, using QLoRA sped up by DeepSpeed ZeRO 2 for 1 epoch. Axolotl was used for easier fine-tuning configuration.
+ 2nd). We got extra funds. Thanks, all! We did it again like the first step, but with full-parameter fine-tuning (FFT) instead of QLoRA.
+ 3rd). We fine-tuned it with [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian), with minor modifications so they fit the ChatML format (see the sketch below). It was FFT for 4 epochs.
+
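+ For reference, the prompt that `tokenizer.apply_chat_template(...)` builds in the examples above follows the ChatML layout, roughly like this (illustrative; the exact special tokens come from the tokenizer's chat template):
+ ```
+ <|im_start|>system
+ Anda adalah Merak, sebuah model kecerdasan buatan yang dilatih oleh Muhammad Ichsan. ...<|im_end|>
+ <|im_start|>user
+ Siapa penulis naskah proklamasi kemerdekaan Indonesia?<|im_end|>
+ <|im_start|>assistant
+ ```
+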
+ **v3** = Fine-tuned with [Ichsan2895/OASST_Top1_Indonesian](https://huggingface.co/datasets/Ichsan2895/OASST_Top1_Indonesian) & [Ichsan2895/alpaca-gpt4-indonesian](https://huggingface.co/datasets/Ichsan2895/alpaca-gpt4-indonesian).
+ **v2** = Fine-tuned version of the first Merak-7B model. We fine-tuned it again with the same Indonesian Wikipedia articles, except that the prompt style of the questions was changed. It uses 600k Indonesian Wikipedia articles.
+ **v1** = The first Merak-7B model. We selected and cleaned about 200k Indonesian Wikipedia articles.
+
  ## CITATION
  ```
  @software{lian2023mistralorca1