Joseph717171/Mistral-12.25B-Instruct-v0.2

Joseph717171 committed on Mar 31, 2024

Commit 2fe5bb2 · verified · 1 Parent(s): f167854

Update README.md

Files changed (1)

README.md +3 -0

@@ -13,6 +13,9 @@ license: apache-2.0
 This is Mistral-12.25B-Instruct-v0.2, a depth-upscaled version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
 This model is intended to be used as a basis for further fine-tuning, or as a drop-in upgrade from the original 7 billion parameter model.
+Paper detailing how Depth-Up Scaling works:  [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166)
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 # UpStage's conclusionary limitations of their research: