tianyuz committed
Commit: 1fae383
Parent: 8e2be84

Update README.md

Files changed (1): README.md (+16, -3)
README.md CHANGED
@@ -17,9 +17,9 @@ datasets:
 
 ![rinna-icon](./rinna.png)
 
-This repository provides a medium-sized Japanese GPT-2 model trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz). The model is provided by [rinna](https://corp.rinna.co.jp/).
+This repository provides a medium-sized Japanese GPT-2 model. The model is provided by [rinna](https://corp.rinna.co.jp/).
 
-# Use the model
+# How to use the model
 
 *NOTE:* Use `T5Tokenizer` to instantiate the tokenizer.
 
@@ -27,6 +27,19 @@ This repository provides a medium-sized Japanese GPT-2 model trained on [Japanes
 from transformers import T5Tokenizer, AutoModelForCausalLM
 
 tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
+tokenizer.do_lower_case = True  # due to a bug in tokenizer config loading
 
 model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
-~~~~
+~~~~
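For reference, the snippet above only loads the tokenizer and model; a minimal generation example built on top of it might look as follows (the prompt text and decoding parameters are illustrative, not part of the commit):

~~~~
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True  # workaround noted in this commit
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")

# Encode a Japanese prompt and sample a continuation.
inputs = tokenizer("生命、宇宙、そして万物についての究極の疑問の答えは", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        inputs["input_ids"],
        max_length=50,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
~~~~

Sampling parameters such as `top_k` and `top_p` trade diversity against fluency; greedy decoding also works but tends to repeat.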
+
+# Model architecture
+A 24-layer, 1024-hidden-size transformer-based language model.
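Those dimensions can be checked against the published config (a quick sketch; `n_layer` and `n_embd` are the standard GPT-2 config attribute names in transformers):

~~~~
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rinna/japanese-gpt2-medium")
# GPT-2 configs expose depth and width as n_layer and n_embd.
print(config.n_layer)  # expected: 24
print(config.n_embd)   # expected: 1024
~~~~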
+
+# Training
+The model was trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz) to optimize a traditional language-modelling objective, on 8 V100 GPUs for around 30 days. It reaches a perplexity of around 18 on a validation set drawn from the same data.
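Perplexity here is the exponential of the mean token-level cross-entropy under the language-modelling objective; a minimal sketch of measuring it on held-out text (the sample sentence is a stand-in, not the actual validation data):

~~~~
import math
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()

text = "吾輩は猫である。名前はまだ無い。"  # stand-in for a held-out validation sample
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy
    # over next-token predictions.
    loss = model(input_ids, labels=input_ids).loss
print(math.exp(loss.item()))  # perplexity = exp(mean NLL)
~~~~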
+
+# Tokenization
+The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer; the vocabulary is adopted directly from the pre-trained tokenizer in the [linked repository](https://github.com/google/sentencepiece).
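A quick way to inspect the sentencepiece tokenization (the exact subword pieces depend on the released vocabulary):

~~~~
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True  # workaround from this commit

# Sentencepiece splits raw text into subword pieces; '▁' marks a word start.
print(tokenizer.tokenize("こんにちは、世界。"))
print(tokenizer.vocab_size)
~~~~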
+
+# License
+[The MIT license](https://opensource.org/licenses/MIT)