TheBloke
/

CausalLM-7B-GGUF

@@ -47,6 +47,7 @@ tags:
 - llama2
 - qwen
 ---
 <!-- header start -->
 <!-- 200823 -->
@@ -132,10 +133,12 @@ These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwa
 They are also compatible with many third party UIs and libraries - please see the list at the top of this README.
 ## Explanation of quantisation methods
 <details>
   <summary>Click to see details</summary>
 The new methods available are:
 * GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weight. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw)
 * GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This end up using 3.4375 bpw.
 * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
@@ -151,13 +154,13 @@ Refer to the Provided Files table below to see what files use which methods, and
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
-| [causallm_7b.Q2_K.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q2_K.gguf) | Q2_K | 2 | 3.39 GB| 5.89 GB | smallest, significant quality loss - not recommended for most purposes |
 | [causallm_7b.Q3_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_S.gguf) | Q3_K_S | 3 | 3.57 GB| 6.07 GB | very small, high quality loss |
 | [causallm_7b.Q3_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_M.gguf) | Q3_K_M | 3 | 3.92 GB| 6.42 GB | very small, high quality loss |
-| [causallm_7b.Q3_K_L.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_L.gguf) | Q3_K_L | 3 | 4.21 GB| 6.71 GB | small, substantial quality loss |
 | [causallm_7b.Q4_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_0.gguf) | Q4_0 | 4 | 4.51 GB| 7.01 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
 | [causallm_7b.Q4_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_S.gguf) | Q4_K_S | 4 | 4.54 GB| 7.04 GB | small, greater quality loss |
-| [causallm_7b.Q4_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_M.gguf) | Q4_K_M | 4 | 4.76 GB| 7.26 GB | medium, balanced quality - recommended |
 | [causallm_7b.Q5_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_0.gguf) | Q5_0 | 5 | 5.40 GB| 7.90 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
 | [causallm_7b.Q5_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_S.gguf) | Q5_K_S | 5 | 5.40 GB| 7.90 GB | large, low quality loss - recommended |
 | [causallm_7b.Q5_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_M.gguf) | Q5_K_M | 5 | 5.53 GB| 8.03 GB | large, very low quality loss - recommended |
@@ -176,9 +179,10 @@ Refer to the Provided Files table below to see what files use which methods, and
 **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
 The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
-- LM Studio
-- LoLLMS Web UI
-- Faraday.dev
 ### In `text-generation-webui`
@@ -328,11 +332,21 @@ And thank you again to a16z for their generous grant.
 ![](https://huggingface.co/JosephusCheung/tmp/resolve/main/7.72b.png)
 ## Read Me:
 Also see [14B Version](https://huggingface.co/CausalLM/14B)
-This model was trained based on the model weights of Qwen and LLaMA2. The training process utilized a model structure that was identical to LLaMA2, using the same attention calculation method as the original MHA LLaMA2 models, and no additional scaling applied to the Relative Positional Encoding (RoPE).
 We manually curated a SFT dataset of 1.3B tokens for training, utilizing open source datasets from Hugging Face. For most of these sentences, we performed manual or synthetic rewrites and generated alternate language versions using larger language models. Additionally, we conducted augmented text training using carefully selected entries from Wikipedia, as well as featured entries from Fandom and filtered entries from Moegirlpedia. In order to strike a balance between efficiency and quality, 100% of the data used for training was synthetic data, no direct use of text from the internet or original texts from publicly available datasets was employed for fine-tuning.
@@ -357,7 +371,7 @@ other ACC: 70.04
 social ACC: 72.41
-**AVERAGE ACC:63.82**
 ## CEval (Val):
 STEM acc: 61.67
@@ -370,10 +384,62 @@ Other acc: 68.35
 Hard acc:48.03
-**AVERAGE acc:70.27**
 ## GSM8K
-**Zero-shot ACC 0.5921152388172858**
 <!-- original-model-card end -->

 - llama2
 - qwen
 ---
+<!-- markdownlint-disable MD041 -->
 <!-- header start -->
 <!-- 200823 -->
 They are also compatible with many third party UIs and libraries - please see the list at the top of this README.
 ## Explanation of quantisation methods
 <details>
   <summary>Click to see details</summary>
 The new methods available are:
 * GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weight. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw)
 * GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This end up using 3.4375 bpw.
 * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
+| [causallm_7b.Q2_K.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q2_K.gguf) | Q2_K | 2 | 3.40 GB| 5.90 GB | smallest, significant quality loss - not recommended for most purposes |
 | [causallm_7b.Q3_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_S.gguf) | Q3_K_S | 3 | 3.57 GB| 6.07 GB | very small, high quality loss |
 | [causallm_7b.Q3_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_M.gguf) | Q3_K_M | 3 | 3.92 GB| 6.42 GB | very small, high quality loss |
+| [causallm_7b.Q3_K_L.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_L.gguf) | Q3_K_L | 3 | 4.22 GB| 6.72 GB | small, substantial quality loss |
 | [causallm_7b.Q4_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_0.gguf) | Q4_0 | 4 | 4.51 GB| 7.01 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
 | [causallm_7b.Q4_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_S.gguf) | Q4_K_S | 4 | 4.54 GB| 7.04 GB | small, greater quality loss |
+| [causallm_7b.Q4_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_M.gguf) | Q4_K_M | 4 | 4.77 GB| 7.27 GB | medium, balanced quality - recommended |
 | [causallm_7b.Q5_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_0.gguf) | Q5_0 | 5 | 5.40 GB| 7.90 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
 | [causallm_7b.Q5_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_S.gguf) | Q5_K_S | 5 | 5.40 GB| 7.90 GB | large, low quality loss - recommended |
 | [causallm_7b.Q5_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_M.gguf) | Q5_K_M | 5 | 5.53 GB| 8.03 GB | large, very low quality loss - recommended |
 **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
 The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
+* LM Studio
+* LoLLMS Web UI
+* Faraday.dev
 ### In `text-generation-webui`
 ![](https://huggingface.co/JosephusCheung/tmp/resolve/main/7.72b.png)
+*Image drawn by GPT-4 DALL·E 3* TL;DR: Perhaps this 7B model, better than all existing models <= 33B, in most quantitative evaluations...
+# Please Stop Using WRONG unofficial quant models unless you know what you're doing
+GPTQ quants require a good dataset for calibration, and the default C4 dataset is not capable.
+**llama.cpp GGUF models**
+GPT2Tokenizer fixed by [Kerfuffle](https://github.com/KerfuffleV2) on [https://github.com/ggerganov/llama.cpp/pull/3743](https://github.com/ggerganov/llama.cpp/pull/3743), new models to be reuploaded.
 ## Read Me:
 Also see [14B Version](https://huggingface.co/CausalLM/14B)
+This model was trained based on the model weights of Qwen (and LLaMA2 was used, yes, for calculating some initial weights), you may also need to comply with the commercial use restrictions of these two models depending on the situation. The training process utilized a model structure that was identical to LLaMA2, using the same attention calculation method as the original MHA LLaMA2 models, and no additional scaling applied to the Relative Positional Encoding (RoPE).
 We manually curated a SFT dataset of 1.3B tokens for training, utilizing open source datasets from Hugging Face. For most of these sentences, we performed manual or synthetic rewrites and generated alternate language versions using larger language models. Additionally, we conducted augmented text training using carefully selected entries from Wikipedia, as well as featured entries from Fandom and filtered entries from Moegirlpedia. In order to strike a balance between efficiency and quality, 100% of the data used for training was synthetic data, no direct use of text from the internet or original texts from publicly available datasets was employed for fine-tuning.
 social ACC: 72.41
+**AVERAGE ACC:63.82** (Outperforms / Equal to the best Mistral-7B Chat-style fine-tunes, and ALL other models under 33B.)
 ## CEval (Val):
 STEM acc: 61.67
 Hard acc:48.03
+**AVERAGE acc:70.27** (Outperforms ALL 7B models currently.)
+## GSM8K
+**Zero-shot ACC 0.5921152388172858** (Outperforms WizardMath-7B and Qwen-7B)
+**llama.cpp GGUF models**
+GPT2Tokenizer 支持由 [Kerfuffle](https://github.com/KerfuffleV2) 修复于 [https://github.com/ggerganov/llama.cpp/pull/3743](https://github.com/ggerganov/llama.cpp/pull/3743)，新模型稍后上传。
+## 请读我：
+另请参阅[14B版本](https://huggingface.co/CausalLM/14B)
+该模型是基于Qwen的权重（并使用了LLaMA2权重，是的，用于计算一些权重初始化），您根据情况可能还需要遵守这两个模型的商业使用限制。训练过程中使用了与LLaMA2相同的模型结构，使用原始MHA LLaMA2模型的相同注意力计算方法，对相对位置编码（RoPE）没有进行额外的缩放。
+我们手动筛选了一个包含13亿个标记的SFT数据集进行训练，利用了Hugging Face的开源数据集。对于大多数句子，我们进行了手动或合成改写，并使用更大的语言模型生成了其他语言版本。此外，我们还使用了精心挑选的来自维基百科的条目、来自Fandom的精选条目以及来自萌娘百科的过滤条目进行增强文本训练。为了在效率和质量之间取得平衡，训练所使用的100%数据都是合成数据，没有直接使用来自互联网或公开可用数据集的原始文本进行微调。
+7B版本的模型是14B模型的精简版本，专门设计用于推测抽样。因此，在直接使用模型时，需要谨慎行事，因为它可能会产生幻觉或不可靠的输出。
+请注意，模型是在未经过滤的互联网数据上进行训练的。由于我们无法审核所有数据，可能会出现大量不良内容、色情、暴力和冒犯性语言，我们无法删除这些内容。因此，您仍然需要对模型的安全性进行自己的检查，并对输出中的关键词进行过滤。由于计算资源的限制，我们目前无法为模型的伦理和安全实施RLHF，也无法对拒绝回答某些问题的SFT样本进行训练以进行限制性微调。
+额外奖励：模型在LLaVA1.5中引入的提示格式上进行了一些微调，与图像注意力计算无关。因此，将ViT投影模块与冻结的LM对齐，并根据视觉指令实施快速实现有效的多模态能力。
+## 提示格式：
+[chatml](https://github.com/openai/openai-python/blob/main/chatml.md)
+**系统提示不能为空！**
+## MMLU：
+STEM准确率：56.83
+人文学科准确率：58.79
+其他准确率：70.04
+社会学准确率：72.41
+**平均准确率：63.82** （优于/平于最好的 Mistral-7B 聊天格式的微调，和其余的33B及以下模型。）
+## CEval（验证集）：
+STEM准确率：61.67
+社会科学准确率：81.94
+人文学科准确率：77.19
+其他准确率：68.35
+困难准确率：48.03
+**平均准确率：70.27** （优于当前所有7B模型。）
 ## GSM8K
+**零样本准确率0.5921152388172858** （优于WizardMath-7B和Qwen-7B）
 <!-- original-model-card end -->