RichardErkhov committed on
Commit
e0f30ff
1 Parent(s): 4502c53

uploaded readme

Files changed (1): README.md added (+83 lines)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


japanese-gpt2-xsmall - bnb 4bits
- Model creator: https://huggingface.co/rinna/
- Original model: https://huggingface.co/rinna/japanese-gpt2-xsmall/
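
This upload packages the original weights in bitsandbytes 4-bit format. A minimal loading sketch follows; the repository id is a placeholder for this quantized upload (substitute the actual id of this repo), and the `BitsAndBytesConfig` settings are generic defaults rather than values stated anywhere in this card.

~~~~
# Minimal sketch of loading a bnb 4-bit quantized checkpoint with
# transformers + bitsandbytes. The repository id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantized_repo = "RichardErkhov/japanese-gpt2-xsmall-4bits"  # hypothetical id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # keep weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(quantized_repo, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    quantized_repo,
    quantization_config=bnb_config,
    device_map="auto",                     # bitsandbytes requires a CUDA device
)
~~~~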



Original model description:
---
language: ja
thumbnail: https://github.com/rinnakk/japanese-gpt2/blob/master/rinna.png
tags:
- gpt2
- text-generation
- lm
- nlp
license: mit
datasets:
- cc100
- wikipedia
widget:
- text: "生命、宇宙、そして万物についての究極の疑問の答えは"
---

# japanese-gpt2-xsmall

![rinna-icon](./rinna.png)

This repository provides an extra-small-sized Japanese GPT-2 model. The model was trained with code from the GitHub repository [rinnakk/japanese-pretrained-models](https://github.com/rinnakk/japanese-pretrained-models) by [rinna Co., Ltd.](https://corp.rinna.co.jp/)

# How to use the model

~~~~
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-xsmall", use_fast=False)
tokenizer.do_lower_case = True  # work around a bug in tokenizer config loading

model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-xsmall")
~~~~

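Once the tokenizer and model are loaded as above, text can be generated with `model.generate`. This is a minimal sketch: the prompt is taken from the widget example in the model card, and the sampling parameters are illustrative rather than recommended settings.

~~~~
# Minimal generation sketch using the tokenizer/model loaded above.
# Sampling parameters are illustrative, not tuned recommendations.
import torch

prompt = "生命、宇宙、そして万物についての究極の疑問の答えは"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,   # length of the generated continuation
        do_sample=True,      # sample instead of greedy decoding
        top_p=0.95,
        temperature=0.9,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
~~~~
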
# Model architecture
A 6-layer, 512-hidden-size transformer-based language model.

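For reference, these dimensions correspond roughly to the `GPT2Config` sketch below. The head count and vocabulary size are assumptions for illustration only; the authoritative values are in the hosted `config.json`.

~~~~
from transformers import GPT2Config, GPT2LMHeadModel

# Sketch of the stated architecture: 6 layers, hidden size 512.
# n_head and vocab_size are placeholders, not values from this card.
config = GPT2Config(
    n_layer=6,          # stated: 6 transformer layers
    n_embd=512,         # stated: hidden size 512
    n_head=8,           # assumption: hidden size must divide evenly by heads
    vocab_size=32000,   # assumption: typical sentencepiece vocabulary size
)

ref_model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in ref_model.parameters()):,} parameters")  # rough size
~~~~
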
# Training
The model was trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz) and [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) to optimize a traditional language modelling objective on 8 × V100 GPUs for around 4 days. It reaches around 28 perplexity on a chosen validation set from CC-100.

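Perplexity here is the exponential of the mean next-token cross-entropy loss. The sketch below shows how such a figure could be computed on a held-out text; it is not the original evaluation script, and the example snippet is a placeholder.

~~~~
# Rough perplexity computation for a held-out text; illustrative only,
# not the evaluation script behind the reported ~28 figure.
import math
import torch

def perplexity(model, tokenizer, text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # of its next-token predictions; perplexity is its exponential.
        loss = model(input_ids, labels=input_ids).loss
    return math.exp(loss.item())

# Placeholder validation snippet:
# print(perplexity(model, tokenizer, "日本語の検証用テキストの例です。"))
~~~~
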
# Tokenization
The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer; the vocabulary was trained on Japanese Wikipedia with the official sentencepiece training script.

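For illustration, a sentencepiece vocabulary of this kind can be trained as sketched below. The corpus path, vocabulary size, model type, and character coverage are assumptions, not the exact settings used for this tokenizer.

~~~~
# Illustrative sentencepiece training call; all parameters shown are
# placeholders, not the settings used for rinna's tokenizer.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="ja_wiki_corpus.txt",   # hypothetical plain-text Wikipedia dump
    model_prefix="ja_sp",         # writes ja_sp.model and ja_sp.vocab
    vocab_size=32000,             # assumed vocabulary size
    model_type="unigram",         # sentencepiece's default algorithm
    character_coverage=0.9995,    # common choice for Japanese text
)
~~~~
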
# How to cite
```bibtex
@misc{rinna-japanese-gpt2-xsmall,
    title = {rinna/japanese-gpt2-xsmall},
    author = {Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-gpt2-xsmall}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```

# License
[The MIT license](https://opensource.org/licenses/MIT)