Ontocord.AI committed on
Commit 4a51bc1 · 1 Parent(s): 92d5e4d

Update README.md

Files changed (1)
  1. README.md +10 -16
README.md CHANGED
@@ -12,6 +12,16 @@ This is a merge of the following MPT-7B models:
- **e**mozilla/mpt-7b-storysummarizer
- **n**omic-ai/gpt4all-mpt

+
+ ## Model License
+
+ Apache 2.0
+
+
+ ## Purpose
+
+ This model is for experimenting with merging and routing to expert layers.
+
# Test eval on only 10% of eval set

hf-causal (pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,dtype=bfloat16,trust_remote_code=True), limit: 0.1, provide_description: False, num_fewshot: 0, batch_size: None
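The excerpt above names only two of the source checkpoints, and the new **Purpose** section does not spell out how the merge or the expert-layer routing is performed. As a purely hypothetical illustration (not the recipe actually used for given-mpt-7b), a naive parameter-average merge of the listed Hugging Face checkpoints could look like this:

```python
# Hypothetical sketch only: naive weight averaging of MPT-7B variants.
# This is NOT the documented recipe for given-mpt-7b, whose README describes
# merging and routing to expert layers rather than a plain mean.
import torch
from transformers import AutoModelForCausalLM

SOURCES = [
    "emozilla/mpt-7b-storysummarizer",
    "nomic-ai/gpt4all-mpt",
    # ...the remaining source models listed in the full README
]

# Load every source checkpoint (MPT needs trust_remote_code for its custom modules).
models = [
    AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    for name in SOURCES
]

# Average each floating-point parameter tensor across the source checkpoints.
merged = models[0]
with torch.no_grad():
    state = merged.state_dict()
    for key in state:
        if not torch.is_floating_point(state[key]):
            continue  # leave integer buffers untouched
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        state[key] = stacked.mean(dim=0).to(state[key].dtype)
    merged.load_state_dict(state)

merged.save_pretrained("given-mpt-7b-naive-average")  # hypothetical output path
```

The routing to expert layers mentioned under **Purpose** would replace the plain mean with per-layer selection or gating between the source models; the sketch above only covers the averaging baseline.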
@@ -142,19 +152,3 @@ hf-causal (pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,dtype=bfloat16,trust_remote_code=True), limit: 0.1, provide_description: False, num_fewshot: 0, batch_size: None
| | |rougeL_diff|-8.5753|± |2.8259|

-
- ## Model License
-
- Apache 2.0
-
- # Original Model Card From MPT-7B-StoryWriter-65k+
-
- MPT-7B-StoryWriter-65k+ is a model designed to read and write fictional stories with super long context lengths.
- It was built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the [books3 dataset](https://huggingface.co/datasets/the_pile_books3).
- At inference time, thanks to [ALiBi](https://arxiv.org/abs/2108.12409), MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens.
- We demonstrate generations as long as 84k tokens on a single node of 8 A100-80GB GPUs in our [blogpost](https://www.mosaicml.com/blog/mpt-7b).
- * License: Apache 2.0
-
- This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
-
-
-
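The removed model card credits [ALiBi](https://arxiv.org/abs/2108.12409) for letting MPT-7B-StoryWriter-65k+ extrapolate past its 65k-token training context. As a rough sketch of the usual loading pattern for that behavior (the sequence length below is illustrative, chosen to echo the 84k-token generation mentioned above):

```python
# Sketch of raising the MPT context window at load time so ALiBi can
# extrapolate beyond the 65k training length. The value 84 * 1024 is
# illustrative, mirroring the "84k tokens" figure in the text above.
import torch
import transformers

name = "mosaicml/mpt-7b-storywriter"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 84 * 1024  # extend the context window beyond 65k

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```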
 
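For reference, the `hf-causal (pretrained=..., ...)` line above is the configuration banner printed by the EleutherAI lm-evaluation-harness. A rough equivalent invocation through that harness's Python API might look like the sketch below; the evaluated task names are not listed in this excerpt, so `TASKS` is a placeholder rather than the authors' actual selection:

```python
# Approximate reproduction of the evaluation header above using the EleutherAI
# lm-evaluation-harness (v0.3-era API). TASKS is a placeholder: the task list
# is not shown in this README excerpt.
from lm_eval import evaluator

TASKS = []  # fill in the benchmark tasks that produced the results table

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args=(
        "pretrained=Multi-Domain-Expert-Layers/given-mpt-7b,"
        "dtype=bfloat16,trust_remote_code=True"
    ),
    tasks=TASKS,
    num_fewshot=0,
    batch_size=None,
    limit=0.1,  # matches "Test eval on only 10% of eval set"
)
print(evaluator.make_table(results))
```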