Upload folder using huggingface_hub
README.md

## Introduction

retnet-1.3B-toy is an open-source model. It was built mainly to explore model miniaturization and to test how well a model can be trained on a small amount of data.

1. It was developed following the RetNet paper ([https://arxiv.org/pdf/2307.08621.pdf](https://arxiv.org/pdf/2307.08621.pdf)) and is based on a transformer text-generation model. The algorithm implementation in this repository follows the repository ([https://github.com/syncdoth/RetNet.git](https://github.com/syncdoth/RetNet.git)).
2. The goal of this repository is to provide a basic RetNet training repository. It is intended for learning and research, not for commercial use.
3. The model was trained only on wiki text plus a small amount of ShareGPT/BELLE/multi-turn instruction data. The data contains both Chinese and English, at an estimated Chinese-to-English ratio of 7:3.
4. This release includes both the pretrained model and the SFT fine-tuned model.
5. The model uses the first-version tokenizer of the Baichuan model, which has a vocabulary of 64,000 tokens.
6. Known issues:
   - Answers may repeat sentences; adjusting `topk` can mitigate this.
   - Answers may be cut off; increasing `max_new_token` can alleviate this.
   - Answer accuracy is only average because the model's knowledge base is limited.

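The two decoding knobs named in the known issues can be illustrated with a toy sampling loop. This is a minimal pure-Python sketch, not the repository's `generate.py`: top-k restricts each step to the `top_k` most likely tokens, and `max_new_tokens` caps how many tokens are generated, so a cap that is too small stops the answer before the end-of-sequence token. `toy_model`, `decode` and the 4-token vocabulary are hypothetical. In Hugging Face transformers the corresponding `generate()` arguments are `top_k` and `max_new_tokens`; whether `generate.py` exposes them under those exact names is an assumption.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def top_k_probs(logits: Sequence[float], top_k: int) -> List[Tuple[int, float]]:
    """Keep the top_k highest logits and renormalise them with a softmax.

    Returns (token_id, probability) pairs; every other token gets probability 0.
    """
    top_k = max(1, min(top_k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - m) for i in top]
    total = sum(weights)
    return [(i, w / total) for i, w in zip(top, weights)]


def sample_top_k(logits: Sequence[float], top_k: int, rng: random.Random) -> int:
    """Draw one token id from the renormalised top-k distribution."""
    pairs = top_k_probs(logits, top_k)
    return rng.choices([i for i, _ in pairs], weights=[p for _, p in pairs], k=1)[0]


def decode(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt: List[int],
    max_new_tokens: int,
    top_k: int,
    eos_id: int,
    seed: int = 0,
) -> List[int]:
    """Toy autoregressive loop that stops at eos_id or after max_new_tokens tokens."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = sample_top_k(next_logits(tokens), top_k, rng)
        tokens.append(token)
        if token == eos_id:
            break
    return tokens[len(prompt):]  # only the newly generated tokens


def toy_model(tokens: List[int]) -> List[float]:
    """Hypothetical 4-token model: token 3 is EOS and only becomes likely once 5 tokens exist."""
    return [2.0, 1.5, 1.0, 5.0 if len(tokens) >= 5 else -5.0]


print(decode(toy_model, [0], max_new_tokens=10, top_k=1, eos_id=3))  # [0, 0, 0, 0, 3]
print(decode(toy_model, [0], max_new_tokens=3, top_k=1, eos_id=3))   # [0, 0, 0] (cut off before EOS)
```

With `top_k=1` the loop is greedy and keeps emitting the same token; a larger `top_k` lets lower-ranked tokens through, which is the kind of variation the known-issues note relies on. The second call shows how a small `max_new_tokens` truncates the output.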
## Dependencies

```
pip install torch transformers
```

## Model & Code Repositories

1. Pretrained base model: ([https://huggingface.co/wac81/toy_retnet_1.3b_pretrain](https://huggingface.co/wac81/toy_retnet_1.3b_pretrain))
2. SFT fine-tuned model: ([https://huggingface.co/wac81/toy_retnet_1.3b](https://huggingface.co/wac81/toy_retnet_1.3b))
3. Code repository: ([https://github.com/wac81/toy_retnet_1.3b](https://github.com/wac81/toy_retnet_1.3b))

## Minimum Requirements

The model can be fully loaded on a GPU with 8 GB of memory. After 8-bit or 4-bit quantization, it should in theory fit on a GPU with 4 GB.
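
As a rough cross-check of these figures, the sketch below computes how much memory the weights alone occupy at different precisions. It assumes about 1.3 × 10^9 parameters, inferred from the model name, and counts only the weights: activations, the recurrent state, the CUDA context and quantization metadata (scales, zero points) all add to the real footprint. `weight_memory_gb` is a hypothetical helper, not part of the repository.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in decimal GB (1e9 bytes).

    Activations, runtime state and framework overhead are NOT included,
    so actual GPU usage will be higher.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Assumption: ~1.3e9 parameters, taken from the model name "retnet-1.3B-toy".
N_PARAMS = 1.3e9
for precision in ("fp32", "fp16", "int8", "int4"):
    print(f"{precision}: {weight_memory_gb(N_PARAMS, precision):.2f} GB")
# fp32: 5.20 GB
# fp16: 2.60 GB
# int8: 1.30 GB
# int4: 0.65 GB
```

By this estimate even fp32 weights fit within 8 GB, and 8-bit or 4-bit weights leave headroom for runtime overhead on a 4 GB card, which is consistent with the statement above.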

## Code Usage

After downloading the SFT model and placing it in the `checkpoints/checkpoint-21000` directory, you can run the retnet-1.3B-toy model to generate dialogue with:

```
python generate.py
```
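
The download step can also be scripted before running `generate.py`. This is a minimal sketch, assuming the SFT repo id and the `checkpoints/checkpoint-21000` path given in this README, and assuming `generate.py` resolves that path relative to the repository root. `ensure_checkpoint` is a hypothetical helper; `snapshot_download` comes from the `huggingface_hub` package, which is normally installed alongside transformers.

```python
from pathlib import Path

# Hugging Face repo id of the SFT model and the local directory, both taken from this README.
SFT_REPO_ID = "wac81/toy_retnet_1.3b"
CHECKPOINT_DIR = Path("checkpoints/checkpoint-21000")


def ensure_checkpoint(local_dir: Path = CHECKPOINT_DIR, repo_id: str = SFT_REPO_ID) -> Path:
    """Return local_dir, downloading the model snapshot into it if it is missing or empty."""
    local_dir = Path(local_dir)
    if local_dir.is_dir() and any(local_dir.iterdir()):
        return local_dir  # something is already there; skip the download
    # Imported lazily so the check above works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `ensure_checkpoint()` from the repository root fetches the files once and is a no-op afterwards. It only checks that the directory is non-empty, not that the download is complete.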