---
inference: false
language:
- en
license: llama2
model_type: llama
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
---

# 🦚Merak-7B-v3-Mini-Orca GPTQ🐳
<p align="center">
<img src="https://i.imgur.com/39sQd3h.png" alt="Merak Orca" width="300" height="300"/>
</p>

These files are GPTQ model files for [**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo).

[**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo) is Ichsan2895's [Merak-7B-v3](https://huggingface.co/Ichsan2895/Merak-7B-v3) fine-tuned on a Bahasa Indonesia translation of psmathur's [orca_mini_v1_dataset](https://huggingface.co/datasets/psmathur/orca_mini_v1_dataset).

### Prompt format
You can use the [Vicuna 1.1](https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/Vicuna-v1.1.yaml) format with oobabooga's text-generation-webui.
```
SYSTEM: Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.
USER: <prompt> (without the <>)
ASSISTANT:
```

## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)

Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).

It is strongly recommended to use the text-generation-webui one-click installers unless you know how to do a manual install.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
   - To download from a specific branch, append the branch name after a colon, for example `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ:main`.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
7. The model will automatically load, and is now ready for use!
8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
   * Note that you no longer need to set GPTQ parameters manually. These are set automatically from the file `quantize_config.json`.
9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
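
If you'd rather script the download than use the webui, the sketch below uses `snapshot_download` from the huggingface_hub library. This is an alternative to the steps above, not part of the original instructions; the printed path is wherever the files land in your local cache.

```python
# A minimal sketch: fetch the GPTQ files with huggingface_hub instead of the webui.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ",
    revision="main",  # mirrors the `:branch` syntax used by the webui downloader
)
print(f"Model files downloaded to: {local_path}")
```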

## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

`GITHUB_ACTIONS=true pip install auto-gptq`
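
If the prebuilt wheel fails on your setup (for example, a CUDA version mismatch), a common fallback is to build AutoGPTQ from source; this is a hedged suggestion, not part of the original card:

```
pip uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .
```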

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ"
model_basename = "model"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# quantize_config=None: the GPTQ parameters are read from the repo's quantize_config.json.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

# Build the Vicuna 1.1 style prompt described above.
prompt = "Tell me about AI"
system_message = "Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.\n"
prompt_template = f'''SYSTEM: {system_message}
USER: {prompt}
ASSISTANT: '''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline.

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ.
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
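
If you want output to appear token by token instead of all at once, you can pass a streamer to `generate()`. A minimal sketch, reusing the `model`, `tokenizer`, and `input_ids` from above and assuming AutoGPTQ forwards `generate()` keyword arguments to the underlying transformers model:

```python
# Stream tokens to stdout as they are generated.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(inputs=input_ids, streamer=streamer, temperature=0.7, max_new_tokens=512)
```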

## Compatibility

The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLaMa (only CUDA has been tested), and Occ4m's GPTQ-for-LLaMa fork.

ExLlama works with Llama models in 4-bit.

## Credits
[TheBloke](https://huggingface.co/TheBloke/) for the README template.