macadeliccc
committed
Update README.md
README.md CHANGED
@@ -189,7 +189,7 @@ special_tokens:
 
 </details><br>
 
-#
+# magistrate-3.2-3b-base
 
 This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on the None dataset.
 It achieves the following results on the evaluation set:
@@ -197,7 +197,7 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+This is a base model trained on US Supreme Court proceedings and US federal code and regulations. It is a proof of concept for a larger model, since fine-tuning something like a 70B can be very expensive.
 
 ## Intended uses & limitations
 
@@ -209,6 +209,8 @@ More information needed
 
 ## Training procedure
 
+Spectrum top-35% fine-tune. Methodology based on Cohere's paper "To Code, or Not To Code? Exploring Impact of Code in Pre-training".
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training: