---
base_model: GeneZC/MiniChat-2-3B
inference: false
language:
- en
- zh
library_name: transformers
license: apache-2.0
model_creator: GeneZC
model_name: MiniChat-2-3B
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
widget:
- text: "<s> [|User|] Hi \U0001F44B </s>[|Assistant|]"
---
# GeneZC/MiniChat-2-3B-GGUF
Quantized GGUF model files for [MiniChat-2-3B](https://huggingface.co/GeneZC/MiniChat-2-3B) from [GeneZC](https://huggingface.co/GeneZC).
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [minichat-2-3b.fp16.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.fp16.gguf) | fp16 | 6.04 GB |
| [minichat-2-3b.q2_k.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q2_k.gguf) | q2_k | 1.30 GB |
| [minichat-2-3b.q3_k_m.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q3_k_m.gguf) | q3_k_m | 1.51 GB |
| [minichat-2-3b.q4_k_m.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q4_k_m.gguf) | q4_k_m | 1.85 GB |
| [minichat-2-3b.q5_k_m.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q5_k_m.gguf) | q5_k_m | 2.15 GB |
| [minichat-2-3b.q6_k.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q6_k.gguf) | q6_k | 2.48 GB |
| [minichat-2-3b.q8_0.gguf](https://huggingface.co/afrideva/MiniChat-2-3B-GGUF/resolve/main/minichat-2-3b.q8_0.gguf) | q8_0 | 3.21 GB |
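These files work with any llama.cpp-compatible runtime. Below is a minimal usage sketch with the `llama-cpp-python` bindings; the choice of the q4_k_m file, the `n_ctx` value, and the sampling settings are illustrative assumptions, and the prompt string follows the MiniChat template shown in this card's widget metadata.

```python
# Minimal sketch, assuming `pip install llama-cpp-python` and that
# minichat-2-3b.q4_k_m.gguf has been downloaded into the working directory.
from llama_cpp import Llama

llm = Llama(
    model_path="minichat-2-3b.q4_k_m.gguf",  # any quant from the table above
    n_ctx=4096,                              # assumed context window
)

# MiniChat prompt template (see the widget example in the card metadata).
prompt = "<s> [|User|] Write a haiku about autumn. </s>[|Assistant|]"

result = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,
    stop=["</s>", "[|User|]"],  # stop before the model opens a new turn
)
print(result["choices"][0]["text"].strip())
```

As a rule of thumb, the lower-bit quants trade quality for memory: q2_k is the smallest, q4_k_m and q5_k_m are common middle grounds, and q8_0 stays closest to fp16.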
## Original Model Card:
## MiniChat-2-3B
[arXiv](https://arxiv.org/abs/2311.07052) | [GitHub](https://github.com/GeneZC/MiniMA) | [HuggingFace-MiniMA](https://huggingface.co/GeneZC/MiniMA-3B) | [HuggingFace-MiniChat](https://huggingface.co/GeneZC/MiniChat-3B) | [ModelScope-MiniMA](https://modelscope.cn/models/GeneZC/MiniMA-3B) | [ModelScope-MiniChat](https://modelscope.cn/models/GeneZC/MiniChat-3B) | [HuggingFace-MiniChat-1.5](https://huggingface.co/GeneZC/MiniChat-1.5-3B) | [HuggingFace-MiniMA-2](https://huggingface.co/GeneZC/MiniMA-2-3B) | [HuggingFace-MiniChat-2](https://huggingface.co/GeneZC/MiniChat-2-3B)
**Updates from MiniChat-3B**:
- better base model MiniMA-2-3B;
- better data mixture;
- use of [NEFTune](https://arxiv.org/abs/2310.05914) (a sketch of the idea follows this list);
- use of [DPO](https://arxiv.org/abs/2305.18290).
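As background for the list above, NEFTune perturbs the embedding outputs with uniform noise during finetuning, scaled by `alpha / sqrt(L * d)` for sequence length `L` and embedding dimension `d`. The PyTorch sketch below only illustrates that idea; it is not the MiniChat training code, and the `alpha` value is an assumed example.

```python
import torch

def neftune_hook(module, inputs, output, alpha=5.0):
    # NEFTune (https://arxiv.org/abs/2310.05914): during training, add
    # Uniform(-1, 1) noise to embedding outputs, scaled by alpha / sqrt(L * d).
    if module.training:
        scale = alpha / (output.size(1) * output.size(2)) ** 0.5
        return output + torch.zeros_like(output).uniform_(-scale, scale)
    return output

# Usage sketch: attach to the input embeddings before finetuning,
# then remove the hook for inference.
# handle = model.get_input_embeddings().register_forward_hook(neftune_hook)
# ... finetune ...
# handle.remove()
```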
**Note:** MiniChat-2-3B is derived from LLaMA-2 and must comply with the LLaMA-2 LICENSE.

A language model continued from MiniMA-2-3B and finetuned on both instruction and preference data. It surpasses Vicuna-7B and approaches LLaMA-2-Chat-7B on MT-Bench.
<img src="./teaser_b.jpg" alt="teaser_b" width="687" />
**Standard Benchmarks**
|Method|TFLOPs|MMLU (5-shot)|CEval (5-shot)|DROP (3-shot)|HumanEval (0-shot)|BBH (3-shot)|GSM8K (8-shot)|
|--|--|--|--|--|--|--|--|
|Mamba-2.8B|4.6E9|25.58|24.74|15.72|7.32|29.37|3.49|
|ShearedLLaMA-2.7B|0.8E9|26.97|22.88|19.98|4.88|30.48|3.56|
|BTLM-3B|11.3E9|27.20|26.00|17.84|10.98|30.87|4.55|
|StableLM-3B|72.0E9|44.75|31.05|22.35|15.85|32.59|10.99|
|Qwen-1.8B|23.8E9|44.05|54.75|12.97|14.02|30.80|22.97|
|Phi-2-2.8B|159.9E9|56.74|34.03|30.74|46.95|44.13|55.42|
|LLaMA-2-7B|84.0E9|46.00|34.40|31.57|12.80|32.02|14.10|
||
|MiniMA-3B|4.0E9|28.51|28.23|22.50|10.98|31.61|8.11|
|MiniChat-3B|4.0E9|38.40|36.48|22.58|18.29|31.36|29.72|
|MiniMA-2-3B|13.4E9|40.14|44.65|23.10|14.63|31.43|8.87|
|MiniChat-2-3B|13.4E9|46.17|43.91|30.26|22.56|34.95|38.13|
**Instruction-following Benchmarks**
|Method|AlpacaEval|MT-Bench|
|--|--|--|
|GPT-4|95.28|9.18|
|Zephyr-7B-Beta|90.60|7.34|
|Phi-2-DPO|81.37|-|
|StableLM Zephyr 3B|76.00|6.64|
|Vicuna-7B|76.84|6.17|
|LLaMA-2-Chat-7B|71.37|6.27|
||
|MiniChat-3B|48.82|-|
|MiniChat-2-3B|77.30|6.23|
The following is an example code snippet to use MiniChat-2-3B:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from conversation import get_default_conv_template  # conversation.py ships with the MiniMA GitHub repo
# MiniChat
tokenizer = AutoTokenizer.from_pretrained("GeneZC/MiniChat-2-3B", use_fast=False)
# GPU.
model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-2-3B", use_cache=True, device_map="auto", torch_dtype=torch.float16).eval()
# CPU.
# model = AutoModelForCausalLM.from_pretrained("GeneZC/MiniChat-2-3B", use_cache=True, device_map="cpu", torch_dtype=torch.float16).eval()
conv = get_default_conv_template("minichat")
question = "Implement a program to find the common elements in two arrays without using any extra data structures."
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
input_ids = tokenizer([prompt]).input_ids
output_ids = model.generate(
    torch.as_tensor(input_ids).cuda(),  # drop `.cuda()` when running on CPU
    do_sample=True,
    temperature=0.7,
    max_new_tokens=1024,
)
output_ids = output_ids[0][len(input_ids[0]):]
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
# output: "def common_elements(arr1, arr2):\n if len(arr1) == 0:\n return []\n if len(arr2) == 0:\n return arr1\n\n common_elements = []\n for element in arr1:\n if element in arr2:\n common_elements.append(element)\n\n return common_elements"
# Multi-turn conversation can be realized by continually appending messages to `conv`.
```
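If the `conversation.py` helper from the [MiniMA repository](https://github.com/GeneZC/MiniMA) is not at hand, the prompt can be assembled manually. The sketch below reproduces the single-turn template from this card's widget metadata; the multi-turn spacing is an assumption and should be verified against `conversation.py`.

```python
# Hypothetical helper, not part of the MiniMA codebase: builds a MiniChat
# prompt from (user, assistant) turns. Single-turn output matches the widget
# example: "<s> [|User|] Hi 👋 </s>[|Assistant|]".
def build_minichat_prompt(turns):
    prompt = "<s> "
    for user_msg, assistant_msg in turns:
        prompt += f"[|User|] {user_msg} </s>[|Assistant|]"
        if assistant_msg is not None:
            prompt += f" {assistant_msg} </s>"  # assumed multi-turn spacing
    return prompt

prompt = build_minichat_prompt([("Hi 👋", None)])
```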
## Bibtex
```bibtex
@article{zhang2023law,
  title={Towards the Law of Capacity Gap in Distilling Language Models},
  author={Zhang, Chen and Song, Dawei and Ye, Zheyu and Gao, Yan},
  year={2023},
  url={https://arxiv.org/abs/2311.07052}
}
```