---
license: apache-2.0
datasets:
- sudy-super/JetCopper-10B
language:
- ja
- en
tags:
- japanese
- causal-lm
inference: false
---

*Logo designed by [Rotejin](https://x.com/rotejin).*

# Contrail-200m-64k

## Description

Contrail is a Mistral model pre-trained on the 10B tokens of [JetCopper-10B](https://huggingface.co/datasets/sudy-super/JetCopper-10B).
It reached a final validation perplexity of 27.88.

## Model Details

- **Architecture**: Mistral (LLaMA-compatible)
- **Model size**: 200M
- **Trained tokens**: 10B tokens
- **Context length**: 65536
- **Languages**: Japanese, English
- **License**: Apache-2.0

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

model_name = "sudy-super/Contrail-200m-64k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" (requires the accelerate package) places the weights on the
# GPU when one is available, so no separate model.to("cuda") call is needed.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)
# Print decoded text as it is generated, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "AIによって私達の暮らしは、"  # Japanese: "Thanks to AI, our lives ..."

with torch.no_grad():
    token_ids = tokenizer.encode(prompt, return_tensors="pt")

    output_ids = model.generate(
        input_ids=token_ids.to(model.device),
        min_new_tokens=10,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        streamer=streamer,
    )
```

## Author

[Rakuto Suda](https://huggingface.co/sudy-super)

## Citations

```
@article{jiang2023mistral,
  title={Mistral 7B},
  author={Albert Q. Jiang and Alexandre Sablayrolles and Arthur Mensch and Chris Bamford and Devendra Singh Chaplot and Diego de las Casas and Florian Bressand and Gianna Lengyel and Guillaume Lample and Lucile Saulnier and L{\'e}lio Renard Lavaud and Marie-Anne Lachaux and Pierre Stock and Teven Le Scao and Thibaut Lavril and Thomas Wang and Timoth{\'e}e Lacroix and William El Sayed},
  journal={arXiv preprint arXiv:2310.06825},
  year={2023}
}
```
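
## Note on perplexity

The Description quotes a validation perplexity of 27.88 without saying how it was computed. The sketch below assumes the common definition, the exponential of the mean per-token cross-entropy in nats, and is only an illustration of that relationship. The `perplexity` helper and its `token_nlls` argument are hypothetical names, not part of this model's training code.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) for a list of per-token losses in nats."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# Assumption: a mean validation loss of ln(27.88), roughly 3.33 nats per token,
# is what a perplexity of 27.88 corresponds to under this definition.
example_loss = math.log(27.88)
print(round(perplexity([example_loss] * 4), 2))
```

Under this definition, a lower mean loss on held-out text translates directly into a lower perplexity, so the two numbers carry the same information.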