TheBloke commited on
Commit
633b64f
1 Parent(s): 08a0822

Upload README.md

Browse files
Files changed (1) hide show
  1. README.md +76 -10
README.md CHANGED
@@ -47,6 +47,7 @@ tags:
47
  - llama2
48
  - qwen
49
  ---
 
50
 
51
  <!-- header start -->
52
  <!-- 200823 -->
@@ -132,10 +133,12 @@ These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwa
132
  They are also compatible with many third party UIs and libraries - please see the list at the top of this README.
133
 
134
  ## Explanation of quantisation methods
 
135
  <details>
136
  <summary>Click to see details</summary>
137
 
138
  The new methods available are:
 
139
  * GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weight. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw)
140
  * GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This end up using 3.4375 bpw.
141
  * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
@@ -151,13 +154,13 @@ Refer to the Provided Files table below to see what files use which methods, and
151
 
152
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
153
  | ---- | ---- | ---- | ---- | ---- | ----- |
154
- | [causallm_7b.Q2_K.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q2_K.gguf) | Q2_K | 2 | 3.39 GB| 5.89 GB | smallest, significant quality loss - not recommended for most purposes |
155
  | [causallm_7b.Q3_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_S.gguf) | Q3_K_S | 3 | 3.57 GB| 6.07 GB | very small, high quality loss |
156
  | [causallm_7b.Q3_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_M.gguf) | Q3_K_M | 3 | 3.92 GB| 6.42 GB | very small, high quality loss |
157
- | [causallm_7b.Q3_K_L.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_L.gguf) | Q3_K_L | 3 | 4.21 GB| 6.71 GB | small, substantial quality loss |
158
  | [causallm_7b.Q4_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_0.gguf) | Q4_0 | 4 | 4.51 GB| 7.01 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
159
  | [causallm_7b.Q4_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_S.gguf) | Q4_K_S | 4 | 4.54 GB| 7.04 GB | small, greater quality loss |
160
- | [causallm_7b.Q4_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_M.gguf) | Q4_K_M | 4 | 4.76 GB| 7.26 GB | medium, balanced quality - recommended |
161
  | [causallm_7b.Q5_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_0.gguf) | Q5_0 | 5 | 5.40 GB| 7.90 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
162
  | [causallm_7b.Q5_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_S.gguf) | Q5_K_S | 5 | 5.40 GB| 7.90 GB | large, low quality loss - recommended |
163
  | [causallm_7b.Q5_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_M.gguf) | Q5_K_M | 5 | 5.53 GB| 8.03 GB | large, very low quality loss - recommended |
@@ -176,9 +179,10 @@ Refer to the Provided Files table below to see what files use which methods, and
176
  **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
177
 
178
  The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
179
- - LM Studio
180
- - LoLLMS Web UI
181
- - Faraday.dev
 
182
 
183
  ### In `text-generation-webui`
184
 
@@ -328,11 +332,21 @@ And thank you again to a16z for their generous grant.
328
 
329
  ![](https://huggingface.co/JosephusCheung/tmp/resolve/main/7.72b.png)
330
 
 
 
 
 
 
 
 
 
 
 
331
  ## Read Me:
332
 
333
  Also see [14B Version](https://huggingface.co/CausalLM/14B)
334
 
335
- This model was trained based on the model weights of Qwen and LLaMA2. The training process utilized a model structure that was identical to LLaMA2, using the same attention calculation method as the original MHA LLaMA2 models, and no additional scaling applied to the Relative Positional Encoding (RoPE).
336
 
337
  We manually curated a SFT dataset of 1.3B tokens for training, utilizing open source datasets from Hugging Face. For most of these sentences, we performed manual or synthetic rewrites and generated alternate language versions using larger language models. Additionally, we conducted augmented text training using carefully selected entries from Wikipedia, as well as featured entries from Fandom and filtered entries from Moegirlpedia. In order to strike a balance between efficiency and quality, 100% of the data used for training was synthetic data, no direct use of text from the internet or original texts from publicly available datasets was employed for fine-tuning.
338
 
@@ -357,7 +371,7 @@ other ACC: 70.04
357
 
358
  social ACC: 72.41
359
 
360
- **AVERAGE ACC:63.82**
361
 
362
  ## CEval (Val):
363
  STEM acc: 61.67
@@ -370,10 +384,62 @@ Other acc: 68.35
370
 
371
  Hard acc:48.03
372
 
373
- **AVERAGE acc:70.27**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
374
 
375
  ## GSM8K
376
 
377
- **Zero-shot ACC 0.5921152388172858**
378
 
379
  <!-- original-model-card end -->
 
47
  - llama2
48
  - qwen
49
  ---
50
+ <!-- markdownlint-disable MD041 -->
51
 
52
  <!-- header start -->
53
  <!-- 200823 -->
 
133
  They are also compatible with many third party UIs and libraries - please see the list at the top of this README.
134
 
135
  ## Explanation of quantisation methods
136
+
137
  <details>
138
  <summary>Click to see details</summary>
139
 
140
  The new methods available are:
141
+
142
  * GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weight. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw)
143
  * GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This end up using 3.4375 bpw.
144
  * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
 
154
 
155
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
156
  | ---- | ---- | ---- | ---- | ---- | ----- |
157
+ | [causallm_7b.Q2_K.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q2_K.gguf) | Q2_K | 2 | 3.40 GB| 5.90 GB | smallest, significant quality loss - not recommended for most purposes |
158
  | [causallm_7b.Q3_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_S.gguf) | Q3_K_S | 3 | 3.57 GB| 6.07 GB | very small, high quality loss |
159
  | [causallm_7b.Q3_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_M.gguf) | Q3_K_M | 3 | 3.92 GB| 6.42 GB | very small, high quality loss |
160
+ | [causallm_7b.Q3_K_L.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q3_K_L.gguf) | Q3_K_L | 3 | 4.22 GB| 6.72 GB | small, substantial quality loss |
161
  | [causallm_7b.Q4_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_0.gguf) | Q4_0 | 4 | 4.51 GB| 7.01 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
162
  | [causallm_7b.Q4_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_S.gguf) | Q4_K_S | 4 | 4.54 GB| 7.04 GB | small, greater quality loss |
163
+ | [causallm_7b.Q4_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q4_K_M.gguf) | Q4_K_M | 4 | 4.77 GB| 7.27 GB | medium, balanced quality - recommended |
164
  | [causallm_7b.Q5_0.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_0.gguf) | Q5_0 | 5 | 5.40 GB| 7.90 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
165
  | [causallm_7b.Q5_K_S.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_S.gguf) | Q5_K_S | 5 | 5.40 GB| 7.90 GB | large, low quality loss - recommended |
166
  | [causallm_7b.Q5_K_M.gguf](https://huggingface.co/TheBloke/CausalLM-7B-GGUF/blob/main/causallm_7b.Q5_K_M.gguf) | Q5_K_M | 5 | 5.53 GB| 8.03 GB | large, very low quality loss - recommended |
 
179
  **Note for manual downloaders:** You almost never want to clone the entire repo! Multiple different quantisation formats are provided, and most users only want to pick and download a single file.
180
 
181
  The following clients/libraries will automatically download models for you, providing a list of available models to choose from:
182
+
183
+ * LM Studio
184
+ * LoLLMS Web UI
185
+ * Faraday.dev
186
 
187
  ### In `text-generation-webui`
188
 
 
332
 
333
  ![](https://huggingface.co/JosephusCheung/tmp/resolve/main/7.72b.png)
334
 
335
+ *Image drawn by GPT-4 DALL·E 3* TL;DR: Perhaps this 7B model, better than all existing models <= 33B, in most quantitative evaluations...
336
+
337
+ # Please Stop Using WRONG unofficial quant models unless you know what you're doing
338
+
339
+ GPTQ quants require a good dataset for calibration, and the default C4 dataset is not capable.
340
+
341
+ **llama.cpp GGUF models**
342
+ GPT2Tokenizer fixed by [Kerfuffle](https://github.com/KerfuffleV2) on [https://github.com/ggerganov/llama.cpp/pull/3743](https://github.com/ggerganov/llama.cpp/pull/3743), new models to be reuploaded.
343
+
344
+
345
  ## Read Me:
346
 
347
  Also see [14B Version](https://huggingface.co/CausalLM/14B)
348
 
349
+ This model was trained based on the model weights of Qwen (and LLaMA2 was used, yes, for calculating some initial weights), you may also need to comply with the commercial use restrictions of these two models depending on the situation. The training process utilized a model structure that was identical to LLaMA2, using the same attention calculation method as the original MHA LLaMA2 models, and no additional scaling applied to the Relative Positional Encoding (RoPE).
350
 
351
  We manually curated a SFT dataset of 1.3B tokens for training, utilizing open source datasets from Hugging Face. For most of these sentences, we performed manual or synthetic rewrites and generated alternate language versions using larger language models. Additionally, we conducted augmented text training using carefully selected entries from Wikipedia, as well as featured entries from Fandom and filtered entries from Moegirlpedia. In order to strike a balance between efficiency and quality, 100% of the data used for training was synthetic data, no direct use of text from the internet or original texts from publicly available datasets was employed for fine-tuning.
352
 
 
371
 
372
  social ACC: 72.41
373
 
374
+ **AVERAGE ACC:63.82** (Outperforms / Equal to the best Mistral-7B Chat-style fine-tunes, and ALL other models under 33B.)
375
 
376
  ## CEval (Val):
377
  STEM acc: 61.67
 
384
 
385
  Hard acc:48.03
386
 
387
+ **AVERAGE acc:70.27** (Outperforms ALL 7B models currently.)
388
+
389
+ ## GSM8K
390
+
391
+ **Zero-shot ACC 0.5921152388172858** (Outperforms WizardMath-7B and Qwen-7B)
392
+
393
+
394
+ **llama.cpp GGUF models**
395
+ GPT2Tokenizer 支持由 [Kerfuffle](https://github.com/KerfuffleV2) 修复于 [https://github.com/ggerganov/llama.cpp/pull/3743](https://github.com/ggerganov/llama.cpp/pull/3743),新模型稍后上传。
396
+
397
+ ## 请读我:
398
+
399
+ 另请参阅[14B版本](https://huggingface.co/CausalLM/14B)
400
+
401
+ 该模型是基于Qwen的权重(并使用了LLaMA2权重,是的,用于计算一些权重初始化),您根据情况可能还需要遵守这两个模型的商业使用限制。训练过程中使用了与LLaMA2相同的模型结构,使用原始MHA LLaMA2模型的相同注意力计算方法,对相对位置编码(RoPE)没有进行额外的缩放。
402
+
403
+ 我们手动筛选了一个包含13亿个标记的SFT数据集进行训练,利用了Hugging Face的开源数据集。对于大多数句子,我们进行了手动或合成改写,并使用更大的语言模型生成了其他语言版本。此外,我们还使用了精心挑选的来自维基百科的条目、来自Fandom的精选条目以及来自萌娘百科的过滤条目进行增强文本训练。为了在效率和质量之间取得平衡,训练所使用的100%数据都是合成数据,没有直接使用来自互联网或公开可用数据集的原始文本进行微调。
404
+
405
+ 7B版本的模型是14B模型的精简版本,专门设计用于推测抽样。因此,在直接使用模型时,需要谨慎行事,因为它可能会产生幻觉或不可靠的输出。
406
+
407
+ 请注意,模型是在未经过滤的互联网数据上进行训练的。由于我们无法审核所有数据,可能会出现大量不良内容、色情、暴力和冒犯性语言,我们无法删除这些内容。因此,您仍然需要对模型的安全性进行自己的检查,并对输出中的关键词进行过滤。由于计算资源的限制,我们目前无法为模型的伦理和安全实施RLHF,也无法对拒绝回答某些问题的SFT样本进行训练以进行限制性微调。
408
+
409
+ 额外奖励:模型在LLaVA1.5中引入的提示格式上进行了一些微调,与图像注意力计算无关。因此,将ViT投影模块与冻结的LM对齐,并根据视觉指令实施快速实现有效的多模态能力。
410
+
411
+ ## 提示格式:
412
+ [chatml](https://github.com/openai/openai-python/blob/main/chatml.md)
413
+
414
+ **系统提示不能为空!**
415
+
416
+
417
+ ## MMLU:
418
+ STEM准确率:56.83
419
+
420
+ 人文学科准确率:58.79
421
+
422
+ 其他准确率:70.04
423
+
424
+ 社会学准确率:72.41
425
+
426
+ **平均准确率:63.82** (优于/平于最好的 Mistral-7B 聊天格式的微调,和其余的33B及以下模型。)
427
+
428
+ ## CEval(验证集):
429
+ STEM准确率:61.67
430
+
431
+ 社会科学准确率:81.94
432
+
433
+ 人文学科准确率:77.19
434
+
435
+ 其他准确率:68.35
436
+
437
+ 困难准确率:48.03
438
+
439
+ **平均准确率:70.27** (优于当前所有7B模型。)
440
 
441
  ## GSM8K
442
 
443
+ **零样本准确率0.5921152388172858** (优于WizardMath-7B和Qwen-7B)
444
 
445
  <!-- original-model-card end -->