macadeliccc
committed
Update README.md
README.md CHANGED
@@ -189,7 +189,7 @@ special_tokens:
 
 </details><br>
 
-#
+# magistrate-3.2-3b-base
 
 This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on the None dataset.
 It achieves the following results on the evaluation set:
@@ -197,7 +197,7 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-
+This is a base model trained on US Supreme Court proceedings and US federal code and regulations. It is a proof of concept for a larger model, since fine-tuning something like a 70B can be very expensive.
 
 ## Intended uses & limitations
 
@@ -209,6 +209,8 @@ More information needed
 
 ## Training procedure
 
+Spectrum top-35% fine-tune. Methodology based on Cohere's paper "To Code, or Not To Code? Exploring Impact of Code in Pre-training".
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training: