---
inference: false
language:
- en
license: llama2
model_type: llama
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
---

# 🦚Merak-7B-v3-Mini-Orca GPTQ🐳
<p align="center">
<img src="https://i.imgur.com/39sQd3h.png" alt="Merak Orca" width="300" height="300"/>
</p>

These files are GPTQ model files for [**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo).

[**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo) is Ichsan2895's [Merak-7B-v3](https://huggingface.co/Ichsan2895/Merak-7B-v3) fine-tuned on a Bahasa Indonesia translation of psmathur's [orca_mini_v1_dataset](https://huggingface.co/datasets/psmathur/orca_mini_v1_dataset).

### Prompt format
You can use the [Vicuna 1.1](https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/Vicuna-v1.1.yaml) format with oobabooga's text-generation-webui.
```
SYSTEM: Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.
USER: <prompt> (without the <>)
ASSISTANT:
```

## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)

Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).

It is strongly recommended to use the text-generation-webui one-click installers unless you know how to do a manual install.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
   - To download from a specific branch, append the branch name after a colon, for example `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ:main`.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
7. The model will automatically load, and is now ready for use!
8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
   * Note that you no longer need to set GPTQ parameters manually. These are set automatically from the file `quantize_config.json`.
9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
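
If you'd rather script the download than use the webui, the sketch below uses `snapshot_download` from the huggingface_hub library. This is an alternative to the steps above, not part of the original instructions; the printed path is wherever the files land in your local cache.

```python
# A minimal sketch: fetch the GPTQ files with huggingface_hub instead of the webui.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ",
    revision="main",  # mirrors the `:branch` syntax used by the webui downloader
)
print(f"Model files downloaded to: {local_path}")
```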

## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

`GITHUB_ACTIONS=true pip install auto-gptq`
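
If the prebuilt wheel fails on your setup (for example, a CUDA version mismatch), a common fallback is to build AutoGPTQ from source; this is a hedged suggestion, not part of the original card:

```
pip uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .
```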

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ"
model_basename = "model"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# quantize_config=None: the GPTQ parameters are read from the repo's quantize_config.json.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

# Build the Vicuna 1.1 style prompt described above.
prompt = "Tell me about AI"
system_message = "Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.\n"
prompt_template = f'''SYSTEM: {system_message}
USER: {prompt}
ASSISTANT: '''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline.

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ.
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
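
If you want output to appear token by token instead of all at once, you can pass a streamer to `generate()`. A minimal sketch, reusing the `model`, `tokenizer`, and `input_ids` from above and assuming AutoGPTQ forwards `generate()` keyword arguments to the underlying transformers model:

```python
# Stream tokens to stdout as they are generated.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(inputs=input_ids, streamer=streamer, temperature=0.7, max_new_tokens=512)
```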

## Compatibility

The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLaMa (only CUDA has been tested), and Occ4m's GPTQ-for-LLaMa fork.

ExLlama works with Llama models in 4-bit.

## Credits
[TheBloke](https://huggingface.co/TheBloke/) for the README template.