wac81 committed on
Commit c207ed1
1 Parent(s): 1efefa5

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +35 -17
README.md CHANGED
 
## 介绍 (Introduction)

retnet-1.3B-toy is an open-source model, built mainly to explore model miniaturization and to test how well a model trained on a small amount of data can perform.

1. Developed according to the RetNet paper ([https://arxiv.org/pdf/2307.08621.pdf](https://arxiv.org/pdf/2307.08621.pdf)) as a transformer-based text-generation model. The algorithm implementation in this repository follows [https://github.com/syncdoth/RetNet.git](https://github.com/syncdoth/RetNet.git).
2. The goal of this repository is to provide a basic RetNet training repository; it is recommended for learning and research, not for commercial use.
3. The model is trained only on wiki text plus a small amount of sharegpt/belle/multi-turn instruction data, with Chinese and English data in an estimated 7:3 ratio.
4. This release includes both the pretrained model and the SFT fine-tuned model.
5. The tokenizer is the first version of the Baichuan LLM tokenizer, with a vocabulary of 64,000 tokens.
6. Known issues:
   - Answers may repeat sentences; adjusting top-k mitigates this.
   - Answers may be cut off; increasing max_new_token alleviates this.
   - Answer accuracy is mediocre due to the model's limited knowledge.
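The top-k remedy above can be made concrete. Below is a minimal, dependency-free sketch of top-k sampling (not this repository's actual decoding code): only the k highest-scoring tokens stay in the sampling pool, which breaks the loops that greedy or near-greedy decoding tends to fall into.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample one token id from `logits`, keeping only the k highest-scoring
    candidates in the pool; widening or narrowing k changes how repetitive
    the sampled text tends to be."""
    # Indices of the k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the kept candidates (subtract max for stability).
    m = max(logits[i] for i in kept)
    weights = [math.exp(logits[i] - m) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

In `generate.py`-style scripts this corresponds to tuning the top-k generation parameter rather than reimplementing the sampler.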
## 软件依赖 (Dependencies)

```
pip install torch transformers
```
## 模型&代码仓库 (Model&Code Repo)

1. Pretrained model (pretrain model): [https://huggingface.co/wac81/toy_retnet_1.3b_pretrain](https://huggingface.co/wac81/toy_retnet_1.3b_pretrain)
2. SFT fine-tuned model (sft model): [https://huggingface.co/wac81/toy_retnet_1.3b](https://huggingface.co/wac81/toy_retnet_1.3b)
3. Code repo: [https://github.com/wac81/toy_retnet_1.3b](https://github.com/wac81/toy_retnet_1.3b)
## 最小需求 (Minimum Requirements)

The model can be fully loaded on an 8GB GPU; after 8-bit or 4-bit quantization, it can in theory be loaded on a 4GB GPU.
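A back-of-the-envelope check on those figures (a sketch, not a measurement: the 1.3e9 parameter count is inferred from the model name, and activations, recurrent state, and framework overhead are ignored):

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory needed just to store the weights."""
    return n_params * bits_per_param / 8 / 1024**3

N = 1.3e9  # parameter count implied by "1.3B"
for bits, label in [(16, "fp16"), (8, "8bit"), (4, "4bit")]:
    print(f"{label}: ~{weight_memory_gb(N, bits):.2f} GB")
```

Roughly 2.4 GB of fp16 weights leaves room on an 8GB card for everything else, and roughly 0.6 GB at 4-bit is why a 4GB card becomes plausible.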
## 代码调用 (Code Usage)

After downloading the sft model into the checkpoints/checkpoint-21000 directory, you can call the retnet-1.3B-toy model to generate dialogue with the following command:

```
python generate.py
```
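The max_new_token remedy from the known issues acts on the generation loop that a `generate.py`-style script runs. A minimal sketch, where the hypothetical `step_fn` stands in for the real model's next-token prediction:

```python
def generate_ids(step_fn, prompt_ids, max_new_tokens, eos_id):
    """Append tokens one at a time until EOS appears or the max_new_tokens
    budget runs out. A budget that is too small is exactly why answers come
    back truncated mid-sentence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = step_fn(ids)  # the model's next-token choice
        ids.append(next_id)
        if next_id == eos_id:
            break  # the model finished its answer on its own
    return ids
```

Raising the budget only helps up to the point where the model emits EOS by itself; beyond that, generation stops early regardless.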