	---
	inference: true
	---

	### NOTE:
	PR [#1405](https://github.com/ggerganov/llama.cpp/pull/1405) introduced breaking changes: none of the old model files work with the latest build of llama.cpp.

	Pre-PR #1405 files have been marked as old but remain accessible for those who need them.

	Additionally, `q4_3` and `q4_2` have been completely axed in favor of their 5-bit counterparts (q5_1 and q5_0, respectively).

	The new files run inference up to 10% faster with no reduction in quality.


	### Links
	- [7B version of this model](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1)
	- [Set up with gpt4all-chat (one-click setup, available in in-app download menu)](https://gpt4all.io/index.html)
	- [Set up with llama.cpp](https://github.com/ggerganov/llama.cpp)
	- [Set up with oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md)

	### Info
	- Main files are based on the v1.1 release
	  - See changelog below
	  - Use prompt template: ```HUMAN: <prompt> ASSISTANT: <response>```
	- Uncensored files are based on the v0 release
	  - Use prompt template: ```### User: <prompt> ### Assistant: <response>```
	- PR #896 was used for q4_0. Everything else is latest as of upload time.
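As a rough illustration of the two prompt templates listed above, the following Python sketch fills in the `<prompt>` slot and leaves the `<response>` slot open for the model to complete. The names `TEMPLATES` and `build_prompt` are hypothetical helpers introduced here and are not part of llama.cpp or any other tool linked above.

```python
# Prompt templates copied from the Info list above.
# TEMPLATES and build_prompt are hypothetical names used only for illustration.
TEMPLATES = {
    "v1.1": "HUMAN: {prompt} ASSISTANT:",
    "v0": "### User: {prompt} ### Assistant:",
}


def build_prompt(user_message: str, version: str = "v1.1") -> str:
    """Fill the template for the given release, leaving the response slot open."""
    if version not in TEMPLATES:
        raise ValueError(f"unknown version: {version!r}")
    return TEMPLATES[version].format(prompt=user_message.strip())


if __name__ == "__main__":
    print(build_prompt("What is a llama?"))
    print(build_prompt("What is a llama?", version="v0"))
```

The resulting string can be passed as the prompt text to whichever front end you use. Whether a trailing space or newline after the assistant marker helps is something to check against your own setup.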

	### Quantization
	Several quantization methods are supported. They differ in the resulting model disk size and inference speed.

	| Model | F16 | Q4_0 | Q4_1 | Q4_2 | Q4_3 | Q5_0 | Q5_1 | Q8_0 |
	| -- | -- | -- | -- | -- | -- | -- | -- | -- |
	| 7B (ppl) | 5.9565 | 6.2103 | 6.1286 | 6.1698 | 6.0617 | 6.0139 | 5.9934 | 5.9571 |
	| 7B (size) | 13.0G | 4.0G | 4.8G | 4.0G | 4.8G | 4.4G | 4.8G | 7.1G |
	| 7B (ms/tok @ 4 threads) | 128 | 56 | 61 | 84 | 91 | 91 | 95 | 75 |
	| 7B (ms/tok @ 8 threads) | 128 | 47 | 55 | 48 | 53 | 53 | 59 | 75 |
	| 7B (bpw) | 16.0 | 5.0 | 6.0 | 5.0 | 6.0 | 5.5 | 6.0 | 9.0 |
	| 13B (ppl) | 5.2455 | 5.3748 | 5.3471 | 5.3433 | 5.3234 | 5.2768 | 5.2582 | 5.2458 |
	| 13B (size) | 25.0G | 7.6G | 9.1G | 7.6G | 9.1G | 8.4G | 9.1G | 14G |
	| 13B (ms/tok @ 4 threads) | 239 | 104 | 113 | 160 | 175 | 176 | 185 | 141 |
	| 13B (ms/tok @ 8 threads) | 240 | 85 | 99 | 97 | 114 | 108 | 117 | 147 |
	| 13B (bpw) | 16.0 | 5.0 | 6.0 | 5.0 | 6.0 | 5.5 | 6.0 | 9.0 |

	q5_1 and q5_0 are the latest and most performant implementations. The former is slightly more accurate at the cost of a bit of speed. Most users should use one of the two.
	If you run into compatibility issues, you might want to try the older q4_x files.
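The bpw (bits per weight) row in the table above gives a quick way to sanity-check file sizes: size is roughly the parameter count times bpw, divided by 8 bits per byte. The sketch below assumes approximate parameter counts of 6.74e9 for 7B and 13.0e9 for 13B, which are not stated in this README, and ignores file metadata and any tensors stored at a different precision, so it only approximates the sizes in the table.

```python
GIB = 1024 ** 3  # bytes per GiB

# Approximate LLaMA parameter counts. These are assumptions for illustration,
# not values taken from the table above.
PARAMS = {"7B": 6.74e9, "13B": 13.0e9}

# Bits per weight, copied from the bpw rows of the table.
BPW = {
    "F16": 16.0, "Q4_0": 5.0, "Q4_1": 6.0, "Q4_2": 5.0,
    "Q4_3": 6.0, "Q5_0": 5.5, "Q5_1": 6.0, "Q8_0": 9.0,
}


def estimate_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Estimate model file size in GiB from parameter count and bpw."""
    return n_params * bits_per_weight / 8 / GIB


if __name__ == "__main__":
    for model, n_params in PARAMS.items():
        for quant, bpw in BPW.items():
            size = estimate_size_gib(n_params, bpw)
            print(f"{model} {quant}: ~{size:.1f} GiB")
```

Under these assumptions the estimates land close to the quantized sizes listed in the table, which suggests the size differences between methods come almost entirely from bpw.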

	---

	# Vicuna Model Card

	## Model details

	Model type:
	Vicuna is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
	It is an auto-regressive language model, based on the transformer architecture.

	Model date:
	Vicuna was trained between March 2023 and April 2023.

	Organizations developing the model:
	The Vicuna team with members from UC Berkeley, CMU, Stanford, and UC San Diego.

	Paper or resources for more information:
	https://vicuna.lmsys.org/

	License:
	Apache License 2.0

	Where to send questions or comments about the model:
	https://github.com/lm-sys/FastChat/issues

	## Intended use
	Primary intended uses:
	The primary use of Vicuna is research on large language models and chatbots.

	Primary intended users:
	The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.

	## Training dataset
	70K conversations collected from ShareGPT.com.
	(48K for the uncensored variant, after removing 22K conversations' worth of garbage; see https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered)

	## Evaluation dataset
	A preliminary evaluation of the model quality is conducted by creating a set of 80 diverse questions and utilizing GPT-4 to judge the model outputs. See https://vicuna.lmsys.org/ for more details.

	## Major updates of weights v1.1
	- Refactor the tokenization and separator. In Vicuna v1.1, the separator has been changed from `"###"` to the EOS token `"</s>"`. This change makes it easier to determine the generation stop criteria and enables better compatibility with other libraries.
	- Fix the supervised fine-tuning loss computation for better model quality.
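To illustrate why the separator change simplifies stop handling, here is a minimal Python sketch that trims generated text at the first separator it finds. It assumes the separator appears as literal text in the decoded output; in practice an inference engine may stop on the EOS token itself before any text is decoded. The function name `truncate_at_stop` is hypothetical.

```python
def truncate_at_stop(text: str, stop_markers=("</s>",)) -> str:
    """Cut text at the earliest stop marker; return it unchanged if none occurs.

    The default marker is the v1.1 separator. Pass ("###",) for v0 output.
    """
    cut = len(text)
    for marker in stop_markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


if __name__ == "__main__":
    # v1.1 style: the reply ends at the EOS token text.
    print(truncate_at_stop("Llamas are camelids.</s>HUMAN: more"))
    # v0 style: the reply ends where the next "###" turn begins.
    print(truncate_at_stop("Llamas are camelids. ### User: more", ("###",)))
```

With v0, `"###"` can also occur inside ordinary text such as Markdown headings, so cutting on it risks truncating a valid reply. A dedicated EOS token avoids that ambiguity.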