---
language:
- ja
- en
license: mit
---

# Sarashina2.2-1B

This repository provides Sarashina2.2-1B, a large language model trained by [SB Intuitions](https://www.sbintuitions.co.jp/).

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the model in bfloat16 and place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-1b", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.2-1b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)

# Sample three continuations of a Japanese prompt ("Good morning, today's weather is").
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

for t in text:
    print(t)
```

## Model Description

We constructed the Sarashina2.2-1B model, which consists of about 1 billion parameters (excluding embeddings and the LM head from the parameter count), using a three-phase training process.
First, we trained the model on 10 trillion tokens, including Japanese, English, and code data extracted from web corpora.
Next, we trained the model on synthetic data to improve its performance on math and coding tasks.
Finally, we trained the model on a small amount of data to enhance its performance on various application tasks.
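
The parenthetical above can be checked directly by summing every parameter tensor except the embedding and LM-head matrices. The snippet below is a minimal sketch, assuming the Hugging Face Transformers parameter names `embed_tokens` and `lm_head` (common for decoder-only models of this style); inspect `model.named_parameters()` to confirm the names for this checkpoint.

```python
from transformers import AutoModelForCausalLM

# Count parameters while skipping the token embeddings and the LM head.
# The name filters below are assumptions; verify them against this checkpoint.
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.2-1b")
excluded = ("embed_tokens", "lm_head")
n_params = sum(
    p.numel()
    for name, p in model.named_parameters()
    if not any(key in name for key in excluded)
)
print(f"{n_params / 1e9:.2f}B parameters excluding embeddings and LM head")
```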

The following table shows the model's performance on Japanese tasks.
For reference, we also present the performance of our previous LLMs.
As shown in the table, our Sarashina2.2-3B outperforms Sarashina2-7B on Japanese QA tasks such as NIILC and JMMLU.
In addition, Sarashina2.2-3B outperforms Sarashina2-70B on Japanese math and coding tasks such as MGSM-ja and JHumanEval.

### Evaluation on Japanese tasks

| Model | NIILC | JMMLU | MGSM-ja | JHumanEval |
|------------------|------------|------------|-----------|------------|
| [Sarashina2-7B](https://huggingface.co/sbintuitions/sarashina2-7b) | 62.2 | 42.5 | 7.2 | 12.8 |
| [Sarashina2-70B](https://huggingface.co/sbintuitions/sarashina2-70b) | **66.1** | **62.7** | 56.4 | 22.0 |
| **[Sarashina2.2-0.5B](https://huggingface.co/sbintuitions/sarashina2.2-0.5b)** | 34.6 | 28.8 | 21.2 | 15.2 |
| **[Sarashina2.2-1B](https://huggingface.co/sbintuitions/sarashina2.2-1b)** | 47.2 | 38.4 | 38.8 | 21.3 |
| **[Sarashina2.2-3B](https://huggingface.co/sbintuitions/sarashina2.2-3b)** | 62.2 | 52.7 | **63.6** | **39.6** |

## Ethical Considerations and Limitations

This repository contains the pre-trained model, which has not yet been tuned to follow instructions.
Therefore, this model may generate meaningless sequences, inaccurate instances, or biased/objectionable outputs.
We have also published post-trained Sarashina2.2 models: [Sarashina2.2-0.5B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-0.5b-instruct-v0.1), [Sarashina2.2-1B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-1b-instruct-v0.1), and [Sarashina2.2-3B-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-3b-instruct-v0.1).
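
For instruction-following use, the instruct variants above are the intended starting point. The snippet below is a minimal sketch of how such a checkpoint is commonly queried through the Transformers chat-template API; it assumes the instruct models ship a chat template and uses generic sampling settings, so refer to the corresponding model cards for the officially recommended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed usage sketch: load an instruct variant and query it via its chat template.
model_name = "sbintuitions/sarashina2.2-1b-instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# "Good morning, please tell me today's weather."
messages = [{"role": "user", "content": "おはようございます、今日の天気を教えてください。"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```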

## License

[MIT License](https://huggingface.co/sbintuitions/sarashina2.2-1b/blob/main/LICENSE)