Update README.md
---
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
---

# Credit for the model card's description goes to ddh0 and mergekit

# Mistral-12.25B-Instruct-v0.2

This is Mistral-12.25B-Instruct-v0.2, a depth-upscaled version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

This model is intended to be used as a basis for further fine-tuning, or as a drop-in upgrade from the original 7-billion-parameter model.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

The following YAML configuration was used to produce this model:

```yaml
dtype: bfloat16
merge_method: passthrough
# Depth Up-Scaled (DUS) version of Mistral-7B-Instruct-v0.2,
# where m = 4 (the number of layers removed from each copy)
# and s = 56 (the number of layers the model has after DUS)
slices:
- sources:
  - layer_range: [0, 28]
    model: /Users/jsarnecki/opt/Workspace/mistralai/Mistral-7B-Instruct-v0.2
- sources:
  - layer_range: [4, 32]
    model: /Users/jsarnecki/opt/Workspace/mistralai/Mistral-7B-Instruct-v0.2
```
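The layer arithmetic behind the passthrough merge above can be sketched as follows. This is an illustrative calculation, not part of the mergekit API: the base model's 32 transformer layers are duplicated, and `m = 4` layers are trimmed at the seam of each copy, so the two `layer_range` slices (`[0, 28]` and `[4, 32]`) contribute 28 layers each.

```python
def dus_layer_count(n_layers: int, m: int) -> int:
    """Layers after Depth Up-Scaling: the first copy keeps [0, n - m),
    the second keeps [m, n), so each contributes n - m layers."""
    first = n_layers - m   # layer_range: [0, 28] -> 28 layers
    second = n_layers - m  # layer_range: [4, 32] -> 28 layers
    return first + second

# Mistral-7B-Instruct-v0.2 has 32 transformer layers; the config removes m = 4.
print(dus_layer_count(32, 4))  # 56, matching "s = 56" in the YAML comments
```

Note that layers 4 through 27 of the base model appear twice in the merged stack, which is what grows the parameter count from roughly 7B toward 12B.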