RichardErkhov/rinna_-_japanese-gpt2-xsmall-8bits

8-bit precision


RichardErkhov committed on Oct 11, 2024

Commit 8faf523 (1 parent: ca8cd00): "uploaded readme"

Files changed (1): README.md (added, +83 -0)

Quantization made by Richard Erkhov.

[GitHub](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

japanese-gpt2-xsmall - bnb 8bits
- Model creator: https://huggingface.co/rinna/
- Original model: https://huggingface.co/rinna/japanese-gpt2-xsmall/
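"bnb 8bits" refers to weights stored in 8-bit precision with the bitsandbytes library. As a rough illustration of the idea, the sketch below quantizes a single weight row to int8 with one symmetric (absmax) scale and maps it back. It is a simplified sketch, not bitsandbytes' implementation: the library's LLM.int8() scheme additionally uses vector-wise scales and keeps outlier feature dimensions in 16-bit precision. The function names `absmax_quantize` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def absmax_quantize(row: List[float]) -> Tuple[List[int], float]:
    """Quantize one weight row to int8 codes with a single symmetric (absmax) scale."""
    absmax = max(abs(x) for x in row)
    # Map [-absmax, absmax] onto [-127, 127]; use scale 1.0 for an all-zero row.
    scale = absmax / 127.0 if absmax > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in row]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the row scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    codes, scale = absmax_quantize(weights)
    print(codes, scale)
    print(dequantize(codes, scale))
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is why a single large outlier in a row, by inflating `scale`, hurts the precision of every other weight in that row.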
Original model description:
---
language: ja
thumbnail: https://github.com/rinnakk/japanese-gpt2/blob/master/rinna.png
tags:
- gpt2
- text-generation
- lm
- nlp
license: mit
datasets:
- cc100
- wikipedia
widget:
- text: "生命、宇宙、そして万物についての究極の疑問の答えは"
---
# japanese-gpt2-xsmall
![rinna-icon](./rinna.png)

This repository provides an extra-small Japanese GPT-2 model. The model was trained using code from the GitHub repository [rinnakk/japanese-pretrained-models](https://github.com/rinnakk/japanese-pretrained-models) by [rinna Co., Ltd.](https://corp.rinna.co.jp/).
# How to use the model
~~~~python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-xsmall", use_fast=False)
tokenizer.do_lower_case = True  # work around a bug in how the tokenizer config is loaded
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-xsmall")
~~~~
# Model architecture
A 6-layer Transformer-based language model with a hidden size of 512.
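To give a sense of scale, a GPT-2-style decoder's parameter count can be estimated from its layer count and hidden size. This sketch assumes the standard GPT-2 layout (4x MLP expansion, learned positional embeddings, LM head tied to the token embeddings); the vocabulary size of 32,000 and context length of 1,024 used below are assumptions for illustration, not values stated in this card, and `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    """Approximate parameter count of a GPT-2-style decoder with a tied LM head."""
    # Token embeddings plus learned positional embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Per block: QKV projection (3d^2 + 3d), attention output (d^2 + d),
    # MLP up (4d^2 + 4d) and down (4d^2 + d), two LayerNorms (4d).
    per_layer = 12 * d_model ** 2 + 13 * d_model
    # Final LayerNorm (scale and bias).
    final_norm = 2 * d_model
    return embeddings + n_layer * per_layer + final_norm


if __name__ == "__main__":
    # 6 layers and hidden size 512 come from the card; vocab_size and n_ctx are assumed.
    print(gpt2_param_count(n_layer=6, d_model=512, vocab_size=32000, n_ctx=1024))  # 35823616
```

Under these assumptions the estimate is 35,823,616 parameters (roughly 36M), and the token embedding matrix alone accounts for almost half of them.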
# Training
The model was trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz) and [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) with a standard causal (next-token prediction) language modelling objective, using 8 V100 GPUs for around 4 days. It reaches a perplexity of around 28 on a validation set selected from CC-100.
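Perplexity is the exponential of the average per-token negative log-likelihood (natural log) on held-out text, so a perplexity of about 28 corresponds to a mean loss of ln 28 ≈ 3.33 nats per token. Because it is measured per token, it depends on the tokenizer and is only directly comparable between models that share one. A minimal sketch, with the helper name `perplexity` hypothetical:

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from the natural-log probabilities assigned to each target token."""
    if len(token_log_probs) == 0:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that always gives the correct next token probability 1/28
    # has a perplexity of 28 (up to floating-point rounding).
    print(perplexity([math.log(1 / 28)] * 10))
    # Mean loss in nats that corresponds to a perplexity of 28.
    print(math.log(28))
```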
# Tokenization
The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer. Its vocabulary was trained on Japanese Wikipedia using the official sentencepiece training script.
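SentencePiece's default model type is a unigram language model; the card does not say which type was used here, so that is an assumption. With a unigram model, a string is segmented into the sequence of vocabulary pieces with the highest total log-probability, which can be found by Viterbi dynamic programming. The sketch below uses a tiny hypothetical vocabulary with made-up scores, falls back to single characters with a fixed penalty (`unk_score`, also hypothetical), and ignores SentencePiece's whitespace marker (▁) and text normalization.

```python
import math
from typing import Dict, List


def viterbi_segment(text: str, log_probs: Dict[str, float], unk_score: float = -20.0) -> List[str]:
    """Split text into the highest-scoring sequence of pieces under a unigram model."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece ending at i
    best[0] = 0.0
    max_len = max(len(p) for p in log_probs)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in log_probs:
                score = best[start] + log_probs[piece]
            elif end - start == 1:
                # Unknown single character: allow it with a fixed penalty.
                score = best[start] + unk_score
            else:
                continue
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the string to recover the pieces.
    pieces: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


if __name__ == "__main__":
    toy_vocab = {"生命": -3.0, "生": -5.0, "命": -5.0, "宇宙": -3.5, "宇": -6.0, "宙": -6.0, "、": -1.0}
    print(viterbi_segment("生命、宇宙", toy_vocab))
```

With this toy vocabulary, the whole piece "生命" (score -3.0) beats "生" + "命" (score -10.0), so multi-character pieces win whenever their log-probability exceeds that of their parts combined.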
# How to cite
```bibtex
@misc{rinna-japanese-gpt2-xsmall,
    title = {rinna/japanese-gpt2-xsmall},
    author = {Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-gpt2-xsmall}
}
@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
# License
[The MIT license](https://opensource.org/licenses/MIT)