kotoba-tech/kotomamba-2.8B-CL-v1.0

+---
+language:
+- en
+- ja
+library_name: transformers
+pipeline_tag: text-generation
+license: apache-2.0
+model_type: mamba
+---
+# Kotomamba
+Kotomamba is a family of language models built on the Mamba state space model (SSM) architecture.
+Two versions are available:
+1. Bilingual Pre-training (Japanese and English):
+   The first variant is pre-trained on about 200B tokens of Japanese and English text.
+2. Continual Pre-training (Mainly Japanese):
+   The second variant is continually pre-trained on mainly Japanese data.
+## Kotomamba Model Index
+|Model|kotomamba-hf|
+|---|---|
+|kotomamba-2.8B-v1.0| [Link](https://huggingface.co/kotoba-tech/kotomamba-2.8B-v1.0) |
+|kotomamba-2.8B-CL-v1.0| [Link](https://huggingface.co/kotoba-tech/kotomamba-2.8B-CL-v1.0) |
+![logo](./logo.webp)
+This repository provides large language models developed by [Kotoba Technologies](https://www.kotoba.tech/), the [TohokuNLP group](https://www.nlp.ecei.tohoku.ac.jp/) at Tohoku University, and the [Okazaki Lab](https://www.nlp.c.titech.ac.jp/index.en.html) and [Yokota Lab](https://www.rio.gsic.titech.ac.jp/en/index.html) at Tokyo Institute of Technology.
+Read our [blog post](https://zenn.dev/kotoba_tech/articles/f15b2495d44c4f) or our technical paper (preprint coming soon) for more details!
+## Model Details
+* **Model type**: Please refer to the [Mamba technical paper](https://arxiv.org/abs/2312.00752) for details on the model architecture.
+* **Language(s)**: Japanese, English
+* **Library**: [kotomamba](https://github.com/kotoba-tech/kotomamba)
+* **Tokenizer**: kotomamba-2.8B uses [llm-jp-tokenizer 100K](https://github.com/llm-jp/llm-jp-tokenizer) and kotomamba-2.8B-CL uses [GPT-NeoX Tokenizer](https://huggingface.co/EleutherAI/gpt-neox-20b).
+* **Contact**:
+## Base Model Performance
+### Japanese version
+|Model|Size|JCommonsenseQA (4-shot)|JEMHopQA (4-shot)|NIILC (4-shot)|JSQuAD (4-shot)|
+|---|---|---|---|---|---|
+| [state-spaces/mamba-2.8b-slimpj](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)  |  2.8B |0.1796|0.2825|0.0998|0.3301|
+| kotomamba-2.8B  |  2.8B |0.185|0.4532|0.3871|0.4685|
+| kotomamba-2.8B-CL | 2.8B  |0.185|0.3758|0.2393|0.5929|
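To make the rows above easier to compare, the following is a minimal sketch that computes each kotomamba model's per-task difference from the `state-spaces/mamba-2.8b-slimpj` baseline. The scores are copied from the table. The helper names (`deltas_vs_baseline`, `mean`) and the unweighted four-task mean are illustrative assumptions, not metrics reported by the authors.

```python
# Scores copied from the table above (all 4-shot).
TASKS = ["JCommonsenseQA", "JEMHopQA", "NIILC", "JSQuAD"]
SCORES = {
    "mamba-2.8b-slimpj": [0.1796, 0.2825, 0.0998, 0.3301],
    "kotomamba-2.8B": [0.185, 0.4532, 0.3871, 0.4685],
    "kotomamba-2.8B-CL": [0.185, 0.3758, 0.2393, 0.5929],
}
BASELINE = "mamba-2.8b-slimpj"


def deltas_vs_baseline(scores, baseline):
    """Return {model: [score - baseline score, per task]} for non-baseline models."""
    base = scores[baseline]
    return {
        model: [round(s - b, 4) for s, b in zip(values, base)]
        for model, values in scores.items()
        if model != baseline
    }


def mean(values):
    """Unweighted average; an illustrative summary, not an official metric."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for model, diffs in deltas_vs_baseline(SCORES, BASELINE).items():
        row = ", ".join(f"{task}: {d:+.4f}" for task, d in zip(TASKS, diffs))
        print(f"{model} (mean {mean(SCORES[model]):.4f}) -> {row}")
```

Read this way, kotomamba-2.8B scores higher than kotomamba-2.8B-CL on JEMHopQA and NIILC, kotomamba-2.8B-CL scores higher on JSQuAD, and the two are tied on JCommonsenseQA.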
+## Usage
+First, install the additional dependencies listed in [requirements.txt](./requirements.txt):
+```sh
+pip install -r requirements.txt
+```
+### Use the base model
+Clone the repository with `git clone https://github.com/kotoba-tech/kotomamba` and follow the installation section of its README.
+**WARNING**: Hugging Face Transformers `AutoModelForCausalLM` **does not support** the Mamba model, so please use `kotomamba/benchmarks/benchmark_generation_mamba_simple.py` instead.
+An inference sample script is available at `scripts/abci/inference/inference_sample.sh`.
+## Training Datasets
+### Pre-Training & Continual Pre-Training
+The following datasets were used for training.
+- [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
+- Swallow Corpus
+- [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B)
+## Risks and Limitations
+The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.
+## Acknowledgements
+We thank Albert Gu and Tri Dao for releasing the original mamba model and implementation on GitHub.
+Our project is supported by the [ABCI Grand Challenge](https://abci.ai/en/link/grandchallenge.html) of the National Institute of Advanced Industrial Science and Technology.
+## License
+Apache License Version 2.0, January 2004
+## Authors
+Here are the team members:
+- From [Kotoba Technologies](https://www.kotoba.tech/)
+  - [Noriyuki Kojima](https://twitter.com/noriyuki_kojima)
+  - [Jungo Kasai](https://twitter.com/jungokasai)
+  - [Hiroto Kurita](https://twitter.com/hiroto_kurita)
+  - [Kazuki Fujii](https://twitter.com/okoge_kaz)
+- From [TohokuNLP group at Tohoku University](https://www.nlp.ecei.tohoku.ac.jp/)
+  - [Keisuke Sakaguchi](https://twitter.com/KeisukeS_)
+- From Tokyo Institute of Technology
+  - From [Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
+    - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
+    - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
+    - [Hiroki Iida](https://meshidenn.github.io/)
+    - [Mengsay Loem](https://loem-ms.github.io/)
+    - [Shota Hirai](https://huggingface.co/Kotemo428)
+    - [Kakeru Hattori](https://aya-se.vercel.app/)
+    - [Masanari Ohi](https://twitter.com/stjohn2007)
+  - From [YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
+    - [Rio Yokota](https://twitter.com/rioyokota)
+    - [Taishi Nakamura](https://twitter.com/Setuna7777_2)