---
license: apache-2.0
language:
- ja
base_model:
- llm-jp/llm-jp-3-3.7b
---
# Tengentoppa-llm-jp-base-3.7B

This is a modified version of the [llm-jp-3-3.7b](https://huggingface.co/llm-jp/llm-jp-3-3.7b) model with additional special tokens for structured conversations. The base model was developed by the [Research and Development Center for Large Language Models](https://llmc.nii.ac.jp/) at the [National Institute of Informatics](https://www.nii.ac.jp/en/).

![image/jpg](tengentoppa2.jpg)

## Model Details

- **Base Model**: llm-jp-3-3.7b
- **Model Type**: Transformer-based Language Model
- **Parameters**: 3.7B
- **Context Length**: 4096
- **Languages**: Japanese and English

### Additional Special Tokens

This model includes the following special tokens for structured conversations:

```
<|SYSTEM|>, </|SYSTEM|> - System message delimiters
<|USER|>, </|USER|> - User input delimiters
<|HINT|>, </|HINT|> - Hint message delimiters
<|REASONING|>, </|REASONING|> - Reasoning section delimiters
<|ASSISTANT|>, </|ASSISTANT|> - Assistant response delimiters
```
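
Beyond this delimiter list, no conversation template is pinned down in this card. The helper below is a hypothetical sketch (not part of the model's API) that assembles a prompt in the same order as the Usage example further down, with an optional hint section.

```python
from typing import Optional

# Hypothetical helper, not part of this repository: concatenates the special tokens
# in the order used by the Usage example below (system, then user, then optional hint).
def build_prompt(system: str, user: str, hint: Optional[str] = None) -> str:
    parts = [
        f"<|SYSTEM|>{system}</|SYSTEM|>",
        f"<|USER|>{user}</|USER|>",
    ]
    if hint is not None:
        parts.append(f"<|HINT|>{hint}</|HINT|>")
    return "\n".join(parts)

print(build_prompt("You are a helpful assistant.", "自然言語処理とは何か"))
```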

## Required Libraries and Their Versions

- torch>=2.3.0
- transformers>=4.40.1
- tokenizers>=0.19.1
- accelerate>=0.29.3
- flash-attn>=2.5.8

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DeL-TaiseiOzaki/Tengentoppa-llm-jp-base-3.7B")
model = AutoModelForCausalLM.from_pretrained(
    "DeL-TaiseiOzaki/Tengentoppa-llm-jp-base-3.7B",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Example using special tokens
text = "<|SYSTEM|>You are a helpful assistant.</|SYSTEM|>\n<|USER|>自然言語処理とは何か</|USER|>"
tokenized_input = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
        repetition_penalty=1.05,
    )[0]

print(tokenizer.decode(output))
```
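
The call above decodes the full sequence, prompt included. Continuing from the same variables, a minimal optional convenience is to print only the newly generated portion:

```python
# Continuation of the example above: slice off the prompt tokens before decoding.
# skip_special_tokens=True also strips the <|...|> delimiters; set it to False
# if you want to inspect the raw structured output.
prompt_length = tokenized_input.shape[1]
print(tokenizer.decode(output[prompt_length:], skip_special_tokens=True))
```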

## Base Model Information

### Model Architecture

|Params|Layers|Hidden size|Heads|Context length|Embedding parameters|Non-embedding parameters|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|3.7b|28|3072|24|4096|611,844,096|3,171,068,928|
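
As a quick arithmetic check, the two parameter columns in the table sum to about 3.78B:

```python
# Sum of the embedding and non-embedding parameter counts from the table above.
embedding_params = 611_844_096
non_embedding_params = 3_171_068_928
print(f"{embedding_params + non_embedding_params:,}")  # 3,782,913,024, i.e. ~3.78B in total
```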

### Tokenizer

The tokenizer is based on the original llm-jp-3-3.7b tokenizer, which uses a [huggingface/tokenizers](https://github.com/huggingface/tokenizers) Unigram byte-fallback model. The vocabulary is based on [`llm-jp-tokenizer v3.0`](https://github.com/llm-jp/llm-jp-tokenizer/releases/tag/v3.0b2), with the additional special tokens listed above appended to it.
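
To verify which tokens were added on top of the base vocabulary in a given download, the standard `transformers` tokenizer API can be used. Depending on how the tokens were registered, they may show up under `additional_special_tokens` or only in the added-tokens map, so treat the checks below as a hedged sketch rather than guaranteed output.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DeL-TaiseiOzaki/Tengentoppa-llm-jp-base-3.7B")

# Extra special tokens registered on top of the base vocabulary
# (may be empty if the tags were added as plain added tokens instead).
print(tokenizer.additional_special_tokens)

# Each structured-conversation tag should map to a single token id.
print(tokenizer.convert_tokens_to_ids(["<|SYSTEM|>", "</|SYSTEM|>", "<|ASSISTANT|>"]))

# Total vocabulary size including the added tokens.
print(len(tokenizer))
```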

## License

This model inherits the license from the base model:
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Attribution

This model is based on llm-jp-3-3.7b. Please cite the original model and its creators when using this modified version.

## Modifications

The only modifications made to the original model are:
1. Addition of special tokens for structured conversations
2. Resizing of the token embeddings to accommodate the new special tokens (see the sketch below)

All other aspects of the model, including its training data, architecture, and capabilities, remain the same as the original llm-jp-3-3.7b model.
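
The modification script itself is not published here. The snippet below is a minimal sketch of how the two changes above are typically made with standard `transformers` calls, starting from the base checkpoint; the output directory name is illustrative.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Minimal sketch (not the authors' actual script): register the structured-conversation
# tokens on the base llm-jp-3-3.7b tokenizer and resize the embedding matrix to match.
special_tokens = [
    "<|SYSTEM|>", "</|SYSTEM|>",
    "<|USER|>", "</|USER|>",
    "<|HINT|>", "</|HINT|>",
    "<|REASONING|>", "</|REASONING|>",
    "<|ASSISTANT|>", "</|ASSISTANT|>",
]

tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-3.7b")
model = AutoModelForCausalLM.from_pretrained("llm-jp/llm-jp-3-3.7b")

tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})
model.resize_token_embeddings(len(tokenizer))

# Illustrative output path.
tokenizer.save_pretrained("Tengentoppa-llm-jp-base-3.7B")
model.save_pretrained("Tengentoppa-llm-jp-base-3.7B")
```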