---
base_model: GeneZC/MiniChat-2-3B
inference: false
language:
- en
- zh
library_name: transformers
license: apache-2.0
model_creator: GeneZC
model_name: MiniChat-2-3B
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
widget:
- text: "<s> [|User|] Hi \U0001F44B  </s>[|Assistant|]"
---
# GeneZC/MiniChat-2-3B-GGUF

Quantized GGUF model files for [MiniChat-2-3B](https://huggingface.co/GeneZC/MiniChat-2-3B) from [GeneZC](https://huggingface.co/GeneZC).


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [minichat-2-3b.fp16.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.fp16.gguf) | fp16 | 6.04 GB  |
| [minichat-2-3b.q2_k.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q2_k.gguf) | q2_k | 1.30 GB  |
| [minichat-2-3b.q3_k_m.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q3_k_m.gguf) | q3_k_m | 1.51 GB  |
| [minichat-2-3b.q4_k_m.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q4_k_m.gguf) | q4_k_m | 1.85 GB  |
| [minichat-2-3b.q5_k_m.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q5_k_m.gguf) | q5_k_m | 2.15 GB  |
| [minichat-2-3b.q6_k.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q6_k.gguf) | q6_k | 2.48 GB  |
| [minichat-2-3b.q8_0.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q8_0.gguf) | q8_0 | 3.21 GB  |
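
These files run with any llama.cpp-compatible runtime. As a minimal sketch (the `llama-cpp-python` and `huggingface_hub` packages, the context size, and the choice of the q4_k_m file are illustrative assumptions, not part of the original card), one can download a quant and generate with the MiniChat prompt format shown in this card's widget metadata:

```python
# Minimal sketch: fetch one quant from this repo and run it with
# llama-cpp-python. Package choice, context size, and sampling settings
# are assumptions for illustration.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the q4_k_m file listed in the table above.
model_path = hf_hub_download(
    repo_id="afrideva/MiniChat-2-3B-GGUF",
    filename="minichat-2-3b.q4_k_m.gguf",
)

llm = Llama(model_path=model_path, n_ctx=2048)

# MiniChat prompt template, taken from the widget example in this card's metadata.
prompt = "<s> [|User|] Implement FizzBuzz in Python. </s>[|Assistant|]"
result = llm(prompt, max_tokens=256, temperature=0.7, stop=["</s>"])
print(result["choices"][0]["text"].strip())
```

The smaller quants trade output quality for memory; q4_k_m is a common middle ground between q2_k and q8_0.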



## Original Model Card:
## MiniChat-2-3B

📑 [arXiv](https://arxiv.org/abs/2311.07052) | 👻 [GitHub](https://github.com/GeneZC/MiniMA) | 🤗 [HuggingFace-MiniMA](https://huggingface.co/GeneZC/MiniMA-3B) | 🤗 [HuggingFace-MiniChat](https://huggingface.co/GeneZC/MiniChat-3B) | 🤖 [ModelScope-MiniMA](https://modelscope.cn/models/GeneZC/MiniMA-3B) | 🤖 [ModelScope-MiniChat](https://modelscope.cn/models/GeneZC/MiniChat-3B) | 🤗 [HuggingFace-MiniChat-1.5](https://huggingface.co/GeneZC/MiniChat-1.5-3B) | 🤗 [HuggingFace-MiniMA-2](https://huggingface.co/GeneZC/MiniMA-2-3B) | 🤗 [HuggingFace-MiniChat-2](https://huggingface.co/GeneZC/MiniChat-2-3B)

🆕 **Updates from MiniChat-3B**:
- a better base model, MiniMA-2-3B;
- a better data mixture;
- use of [NEFTune](https://arxiv.org/abs/2310.05914);
- use of [DPO](https://arxiv.org/abs/2305.18290).

❗ Use of this model must comply with the LLaMA 2 license, since it is derived from LLaMA 2.

MiniChat-2-3B is a language model continued from MiniMA-2-3B and fine-tuned on both instruction and preference data.

It surpasses Vicuna-7B and approaches LLaMA-2-Chat-7B on MT-Bench.

<img src="./teaser_b.jpg" alt="teaser_b" width="687" />

**Standard Benchmarks**

|Method|TFLOPs|MMLU (5-shot)|CEval (5-shot)|DROP (3-shot)|HumanEval (0-shot)|BBH (3-shot)|GSM8K (8-shot)|
|--|--|--|--|--|--|--|--|
|Mamba-2.8B|4.6E9|25.58|24.74|15.72|7.32|29.37|3.49|
|ShearedLLaMA-2.7B|0.8E9|26.97|22.88|19.98|4.88|30.48|3.56|
|BTLM-3B|11.3E9|27.20|26.00|17.84|10.98|30.87|4.55|
|StableLM-3B|72.0E9|44.75|31.05|22.35|15.85|32.59|10.99|
|Qwen-1.8B|23.8E9|44.05|54.75|12.97|14.02|30.80|22.97|
|Phi-2-2.8B|159.9E9|56.74|34.03|30.74|46.95|44.13|55.42|
|LLaMA-2-7B|84.0E9|46.00|34.40|31.57|12.80|32.02|14.10|
||
|MiniMA-3B|4.0E9|28.51|28.23|22.50|10.98|31.61|8.11|
|MiniChat-3B|4.0E9|38.40|36.48|22.58|18.29|31.36|29.72|
|MiniMA-2-3B|13.4E9|40.14|44.65|23.10|14.63|31.43|8.87|
|MiniChat-2-3B|13.4E9|46.17|43.91|30.26|22.56|34.95|38.13|

**Instruction-following Benchmarks**

|Method|AlpacaEval|MT-Bench|
|--|--|--|
|GPT-4|95.28|9.18|
|Zephyr-7B-Beta|90.60|7.34|
|Phi-2-DPO|81.37|-|
|StableLM Zephyr 3B|76.00|6.64|
|Vicuna-7B|76.84|6.17|
|LLaMA-2-Chat-7B|71.37|6.27|
||
|MiniChat-3B|48.82|-|
|MiniChat-2-3B|77.30|6.23|

The following is an example code snippet to use MiniChat-2-3B:

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer

# `conversation.py` ships with the MiniMA repository:
# https://github.com/GeneZC/MiniMA
from conversation import get_default_conv_template

# MiniChat
tokenizer = AutoTokenizer.from_pretrained("GeneZC/MiniChat-2-3B", use_fast=False)
# GPU.
model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-2-3B", use_cache=True, device_map="auto", torch_dtype=torch.float16).eval()
# CPU (float32 is safer than float16 on CPU).
# model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-2-3B", use_cache=True, device_map="cpu", torch_dtype=torch.float32).eval()

conv = get_default_conv_template("minichat")

question = "Implement a program to find the common elements in two arrays without using any extra data structures."
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)  # placeholder for the assistant's reply
prompt = conv.get_prompt()
input_ids = tokenizer([prompt]).input_ids
output_ids = model.generate(
    torch.as_tensor(input_ids).to(model.device),  # works on both GPU and CPU
    do_sample=True,
    temperature=0.7,
    max_new_tokens=1024,
)
# Strip the prompt tokens from the generated sequence.
output_ids = output_ids[0][len(input_ids[0]):]
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
# output: "def common_elements(arr1, arr2):\n    if len(arr1) == 0:\n        return []\n    if len(arr2) == 0:\n        return arr1\n\n    common_elements = []\n    for element in arr1:\n        if element in arr2:\n            common_elements.append(element)\n\n    return common_elements"
# A multiturn conversation can be realized by continuously appending questions to `conv`; see the sketch below.
```
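To continue the exchange, the reply is written back into the conversation before the next user turn. A minimal sketch, assuming the fastchat-style `Conversation` object returned by `conversation.py` stores messages as `[role, text]` pairs (the follow-up question is illustrative):

```python
# Multiturn sketch; `conv`, `tokenizer`, and `model` carry over from the
# snippet above. Assumes messages are stored as [role, text] pairs, as in
# fastchat-style conversation templates.
conv.messages[-1][-1] = output  # fill the assistant placeholder with the reply
conv.append_message(conv.roles[0], "Now write unit tests for it.")  # illustrative follow-up
conv.append_message(conv.roles[1], None)  # placeholder for the next reply

prompt = conv.get_prompt()
input_ids = tokenizer([prompt]).input_ids
output_ids = model.generate(
    torch.as_tensor(input_ids).to(model.device),
    do_sample=True,
    temperature=0.7,
    max_new_tokens=1024,
)
output = tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True).strip()
```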

## Bibtex

```bibtex
@article{zhang2023law,
    title={Towards the Law of Capacity Gap in Distilling Language Models},
    author={Zhang, Chen and Song, Dawei and Ye, Zheyu and Gao, Yan},
    year={2023},
    url={https://arxiv.org/abs/2311.07052}
}
```