quincyqiang committed
Commit d663db7 · 1 parent: 734f349
Update README.md
README.md
CHANGED
@@ -56,7 +56,7 @@ tags:
 └── step4_merge_tokenizers.py  Merge with the original LLaMA tokenizer to produce an HF-format tokenizer
 
 ```
-
+The original Llama 2 vocabulary has **32000** entries; after merging with the 40k Chinese tokenizer model the vocabulary has **68419** entries, and after adding the pad token for SFT it has **68420**.
 
 
 
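The vocabulary sizes in the added README line can be sanity-checked with a little arithmetic. The sketch below is illustrative only: `merge_vocab` and the toy vocabularies are hypothetical stand-ins, not the logic of `step4_merge_tokenizers.py`, and it assumes a merge appends only the pieces missing from the base vocabulary.

```python
def merge_vocab(base, extra):
    """Append pieces from `extra` that are not already in `base`, keeping order."""
    seen = set(base)
    merged = list(base)
    for piece in extra:
        if piece not in seen:
            merged.append(piece)
            seen.add(piece)
    return merged


# Toy stand-ins for the real vocabularies (hypothetical pieces).
base_vocab = ["<s>", "</s>", "<unk>", "a", "b"]
chinese_vocab = ["<s>", "</s>", "<unk>", "a", "中", "文"]

merged = merge_vocab(base_vocab, chinese_vocab)  # overlapping pieces are not duplicated
with_pad = merge_vocab(merged, ["<pad>"])        # SFT step: one extra pad token

# Sizes quoted in the README line.
LLAMA2_VOCAB = 32000
MERGED_VOCAB = 68419
SFT_VOCAB = 68420

added_by_merge = MERGED_VOCAB - LLAMA2_VOCAB  # pieces new to the Llama 2 vocabulary
added_for_sft = SFT_VOCAB - MERGED_VOCAB      # the pad token

print(len(merged), len(with_pad), added_by_merge, added_for_sft)
```

Under that assumption the merge contributes 36419 pieces that were not already in the Llama 2 vocabulary, and the SFT step contributes exactly one more.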