stardust-coder committed on
Commit
dc53f8a
·
verified ·
1 Parent(s): 027edf7

Update README.md

  1. README.md +40 -0
README.md CHANGED
@@ -1,3 +1,43 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
+
5
+ # TinyLlama + Japanese
6
+
7
+ A TinyLlama 1.1B model continually pretrained on a small amount of Japanese text.
8
+
9
+
10
+ ### Base Model
11
+
12
+ [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)
13
+
14
+ ### Tokenizer
15
+
16
+ [elyza/ELYZA-japanese-Llama-2-7b](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-7b)
17
+
18
+
19
+ ### Training Dataset
20
+
21
+ Around 9B tokens in total.
22
+
23
+ - izumi-lab/wikipedia-ja-20230720
24
+ - if001/oscar_2023_filtered
25
+
26
+ ### Validation Dataset
27
+
28
+ - izumi-lab/wikinews-ja-20230728
29
+ - izumi-lab/wikinews-en-20230728
30
+ - if001/aozorabunko-clean-sin
31
+
32
+
33
+ ### Evaluation
34
+
35
+ We did not perform any evaluation.
36
+
37
+
38
+ ### Acknowledgement
39
+
40
+ We thank everyone who prepared these valuable datasets, and the developers of [lit-gpt](https://github.com/Lightning-AI/lit-gpt).
41
+
42
+
43
+
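
The README names a tokenizer repository but does not show how to load the model. The following is a minimal sketch, not the author's documented usage. It assumes the checkpoint is published in a format that the Hugging Face `transformers` library can load with `AutoModelForCausalLM`, that PyTorch is installed, and that the tokenizer should be taken from the ELYZA repository listed above. The model repository ID is not given in this commit, so it is a required argument here. The helper names `load` and `generate` are hypothetical.

```python
import sys

# Tokenizer repository named in the README's tokenizer section.
TOKENIZER_ID = "elyza/ELYZA-japanese-Llama-2-7b"


def load(model_id: str, tokenizer_id: str = TOKENIZER_ID):
    """Load tokenizer and model. `model_id` must be supplied by the caller,
    because this commit does not state the model's repository ID."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate(model_id: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Continue `prompt` with the model's default generation settings."""
    tokenizer, model = load(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")  # requires PyTorch
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python example.py <model-repo-id-or-local-path>
    print(generate(sys.argv[1], "日本の首都は"))
```

Because no evaluation was performed, the quality of generated text is unknown; treat any output as a smoke test only.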