Commit 14cd9f3
Parent(s): cefd69e
Update README.md (#6)
- Update README.md (e3e9ebb1f55d4aff09d8dda58a9428ac5c990efa)
Co-authored-by: Chaitanya Singhal <[email protected]>
README.md
CHANGED
@@ -11,9 +11,12 @@ pipeline_tag: text-generation
|
|
11 |
|
12 |
Buddhi-128k-Chat is a general-purpose first chat model with a 128K context length window. It is meticulously fine-tuned on Mistral 7B Instruct and optimised to handle an extended context length of up to 128,000 tokens using the YaRN (Yet another RoPE extensioN) technique. This enhancement allows Buddhi to maintain a deeper understanding of context in long documents or conversations, making it particularly adept at tasks requiring extensive context retention, such as comprehensive document summarization, detailed narrative generation, and intricate question-answering.
|
13 |
|
14 |
-
## Dataset Creation
|
15 |
-
|
16 |
## Architecture
|
17 |
|
18 |
### Hardware requirements:
|
19 |
> For 128k Context Length
|
@@ -135,6 +138,19 @@ In order to leverage instruction fine-tuning, your prompt should be surrounded b
|
|
135 |
|
136 |
```
|
137 |
|
138 |
## Get in Touch
|
139 |
|
140 |
You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)
|
|
|
11 |
|
12 |
Buddhi-128k-Chat is a general-purpose first chat model with a 128K context length window. It is meticulously fine-tuned on Mistral 7B Instruct and optimised to handle an extended context length of up to 128,000 tokens using the YaRN (Yet another RoPE extensioN) technique. This enhancement allows Buddhi to maintain a deeper understanding of context in long documents or conversations, making it particularly adept at tasks requiring extensive context retention, such as comprehensive document summarization, detailed narrative generation, and intricate question-answering.
|
13 |
|
14 |
## Architecture
|
15 |
+
The Buddhi-128K-Chat model is fine-tuned on the Mistral-7B Instruct base model. We selected Mistral 7B Instruct v0.2 as the parent model because of its superior reasoning capabilities. The Mistral-7B architecture includes features such as Grouped-Query Attention and a byte-fallback BPE tokenizer. Out of the box, this model has 32,768 maximum position embeddings. To increase the context size to 128K, we needed to modify the positional embeddings, which is where YaRN comes into play.
|
16 |
+
|
17 |
+
In our approach, we used the NTK-aware technique, which proposes an alternative to plain positional interpolation. One experiment involved Dynamic-YaRN, which uses a dynamic value for the scale factor 's', because during inference the sequence length grows by 1 after every predicted token. By integrating these position embeddings with the Mistral-7B Instruct base model, we achieved the 128K model.
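The dynamic scale factor and NTK-aware base adjustment described above can be sketched roughly as follows. This is a minimal illustration and not the code used to build Buddhi: the function names are hypothetical, the RoPE base of 1,000,000 and head dimension of 128 are assumed values for Mistral 7B Instruct v0.2, and the formulas follow the publicly described dynamic scaling rule s = max(1, l'/L) and NTK-aware rule b' = b * s^(d/(d-2)). Full YaRN also applies a per-dimension interpolation ramp and an attention temperature, which are omitted here.

```python
# Sketch of dynamic, NTK-aware RoPE scaling (hypothetical helper names).

def dynamic_scale(seq_len: int, original_max: int = 32_768) -> float:
    """Scale factor s = max(1, l'/L): no scaling until the current
    sequence length l' exceeds the original context length L."""
    return max(1.0, seq_len / original_max)


def ntk_aware_base(base: float, scale: float, head_dim: int) -> float:
    """NTK-aware adjustment of the RoPE base: b' = b * s^(d / (d - 2))."""
    return base * scale ** (head_dim / (head_dim - 2))


def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """Inverse RoPE frequencies base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    ROPE_BASE = 1_000_000.0  # assumed value for the parent model
    HEAD_DIM = 128           # assumed per-head dimension

    # The scale factor is recomputed as the sequence grows during decoding.
    for seq_len in (4_096, 32_768, 65_536, 131_072):
        s = dynamic_scale(seq_len)
        new_base = ntk_aware_base(ROPE_BASE, s, HEAD_DIM)
        lowest = rope_inv_freq(new_base, HEAD_DIM)[-1]
        print(f"len={seq_len:>7} s={s:.2f} base={new_base:.3e} "
              f"lowest_freq={lowest:.3e}")
```

Under these assumptions, sequences up to 32,768 tokens leave the base unchanged, and a 131,072-token sequence yields a scale factor of 4.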
|
18 |
+
|
19 |
+
Additionally, we fine-tuned the model on our dataset to contribute one of the very few 128K chat-based models available in the open-source community, with greater reasoning capabilities than the others.
|
20 |
|
21 |
### Hardware requirements:
|
22 |
> For 128k Context Length
|
|
|
138 |
|
139 |
```
|
140 |
|
141 |
+
## Benchmarks
|
142 |
+
|
143 |
+
</div>
|
144 |
+
| Model | # Params | Average | ARC (25-shot) | HellaSwag (10-shot) | Winogrande (5-shot) | TruthfulQA (0-shot) | MMLU (5-shot) |
|
145 |
+
|-----------------------------------|----------|---------|---------------|---------------------|---------------------|---------------------|---------------|
|
146 |
+
| aiplanet/buddhi-128k-chat-7b | 7B | 64.42 | 60.84 | 84 | 77.27 | 65.72 | 60.42 |
|
147 |
+
| migtissera/Tess-XS-vl-3-yarn-128K | 7B | 62.66 | 61.09 | 82.95 | 74.43 | 50.13 | 62.15 |
|
148 |
+
| migtissera/Tess-XS-v1-3-yarn-128K | 7B | 62.49 | 61.6 | 82.96 | 74.74 | 50.2 | 62.1 |
|
149 |
+
| Eric111/Yarn-Mistral-7b-128k-DPO | 7B | 60.15 | 60.84 | 82.99 | 78.3 | 43.55 | 63.09 |
|
150 |
+
| NousResearch/Yarn-Mistral-7b-128k | 7B | 59.42 | 59.64 | 82.5 | 76.95 | 41.78 | 63.02 |
|
151 |
+
| CallComply/openchat-3.5-0106-128k | 7B | 59.38 | 64.25 | 77.31 | 77.66 | 46.5 | 57.58 |
|
152 |
+
| CallComply/zephyr-7b-beta-128k | 7B | 54.45 | 58.28 | 81 | 74.74 | 46.1 | 53.57 |
|
153 |
+
</div>
|
154 |
## Get in Touch
|
155 |
|
156 |
You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)
|