---
inference: false
language:
- en
license: llama2
model_type: llama
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
---

# 🦚Merak-7B-v3-Mini-Orca GPTQ🐳
<p align="center">
<img src="https://i.imgur.com/39sQd3h.png" alt="Merak Orca" width="300" height="300"/>
</p>

These files are GPTQ model files for [**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo).

[**Merak-7B-v3-Mini-Orca**](https://huggingface.co/asyafiqe/Merak-7B-v3-Mini-Orca-Indo) is Ichsan2895's [Merak-7B-v3](https://huggingface.co/Ichsan2895/Merak-7B-v3) fine-tuned on a Bahasa Indonesia translation of psmathur's [orca_mini_v1_dataset](https://huggingface.co/datasets/psmathur/orca_mini_v1_dataset).

### Prompt format
You can use the [Vicuna 1.1](https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/Vicuna-v1.1.yaml) format with oobabooga's text-generation-webui.
```
SYSTEM: Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.
USER: <prompt> (without the <>)
ASSISTANT:
```
The system message translates to: "You are an AI assistant. You will be given a task. You must generate a detailed and long answer."
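
If you prefer to assemble the prompt in code, here is a minimal Python sketch of the same template; the `format_prompt` helper and the sample question are illustrative, not part of the original card.
```python
# Minimal sketch of the Vicuna 1.1-style template shown above.
# format_prompt and the sample question are hypothetical examples.
SYSTEM_MESSAGE = (
    "Anda adalah asisten AI. Anda akan diberi tugas. "
    "Anda harus menghasilkan jawaban yang rinci dan panjang."
)

def format_prompt(user_prompt: str) -> str:
    # SYSTEM and USER turns filled in; the ASSISTANT turn is left open for the model.
    return f"SYSTEM: {SYSTEM_MESSAGE}\nUSER: {user_prompt}\nASSISTANT:"

print(format_prompt("Apa itu kecerdasan buatan?"))  # "What is artificial intelligence?"
```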

## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)

Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).

It is strongly recommended to use the text-generation-webui one-click installers unless you know how to do a manual install.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
   - To download from a specific branch, append it after a colon, for example `asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ:main`. A scripted alternative is sketched after this list.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
6. In the **Model** dropdown, choose the model you just downloaded: `Merak-7B-v3-Mini-Orca-Indo-GPTQ`.
7. The model will automatically load and is now ready for use!
8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
   * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
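
As an alternative to the webui downloader, the files can also be fetched from Python. A minimal sketch using `huggingface_hub` (the local directory name is an arbitrary example, not prescribed by the card):
```python
# Download the GPTQ repo contents to a local folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ",
    local_dir="models/Merak-7B-v3-Mini-Orca-Indo-GPTQ",  # illustrative path
)
```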

## How to use this GPTQ model from Python code

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

`GITHUB_ACTIONS=true pip install auto-gptq`

Then try the following example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "asyafiqe/Merak-7B-v3-Mini-Orca-Indo-GPTQ"
model_basename = "model"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# quantize_config=None: the GPTQ parameters are read from quantize_config.json in the repo.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

prompt = "Tell me about AI"
# "You are an AI assistant. You will be given a task. You must generate a detailed and long answer."
system_message = "Anda adalah asisten AI. Anda akan diberi tugas. Anda harus menghasilkan jawaban yang rinci dan panjang.\n"
prompt_template = f'''SYSTEM: {system_message}
USER: {prompt}
ASSISTANT: '''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# do_sample=True is needed for temperature to take effect.
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
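
Note that `tokenizer.decode(output[0])` prints the prompt together with the completion. If you only want the newly generated text, one common pattern (an optional addition, not from the original card) is to slice off the prompt tokens before decoding:
```python
# Decode only the tokens generated after the prompt.
new_tokens = output[0][input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```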

## Compatibility

The files provided will work with AutoGPTQ (in both CUDA and Triton modes), with GPTQ-for-LLaMa (only CUDA has been tested), and with Occ4m's GPTQ-for-LLaMa fork.

ExLlama works with Llama models in 4-bit.

## Credits
Thanks to [TheBloke](https://huggingface.co/TheBloke/) for the README template.