UsernameJustAnother/Nemo-12B-Marlin-v8

UsernameJustAnother committed on Aug 23

Commit df1c04b • 1 Parent(s): cf3b4f8

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -29,7 +29,7 @@ datasets:
 - **License:** apache-2.0
 - **Finetuned from model :** unsloth/Mistral-Nemo-Base-2407
-**Standard disclaimer:** This is me teaching myself the basics of fine-tuning, with notes extensively borrowed from https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9. Huge props to [nothingisreal](https://huggingface.co/nothingiisreal) for posting their process and making me think this was even possible for a little fish like me.
+**Standard disclaimer:** This is me teaching myself the basics of fine-tuning, with notes extensively borrowed from [MN-12B-Celeste-V1.9](https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9). Huge props to [nothingisreal](https://huggingface.co/nothingiisreal) for posting their process and making me think this was even possible for a little fish like me.
 The aim here is for a solid RP/storywriting model that will fit in 16GB of VRAM with a decent amount of context (> 16K).
@@ -40,7 +40,7 @@ The aim here is for a solid RP/storywriting model that will fit in 16GB of VRAM
   - 2K of Claude instruct, lightly curated & de-clauded
   - 2K of curated Falling through the Skies
   - 2K of curated/lightly de-ministrated C2 chat
-- Trained on a single 80GB A100 from runpod.io, with batch size of 8 (up from 2 on A100 40G), so far less steps involved.
+- Trained on a single 80GB A100 from runpod.io, with batch size of 8 (up from 2 on A100 40G), so far less steps involved. Took about 7.5hrs to run.
 - And remember kids, water is wet and fish are moist.
 I pulled v7 because I honestly don't think it's as good as v6, and don't want folks to get the wrong idea that it's better just because the version number is higher. Besides, nothing good ever fires on all _seven_ cylinders.
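
Note on the batch-size line in this diff: the "far less steps" remark follows from simple arithmetic, since each optimizer step consumes one batch. The Python sketch below is only an illustration. It assumes a hypothetical dataset of 10,000 examples (the real count is not stated in this diff), a single epoch, and no gradient accumulation; `steps_per_run` is a made-up helper name, not part of any training library.

```python
import math


def steps_per_run(num_examples: int, batch_size: int, epochs: int = 1) -> int:
    """Number of optimizer steps when each step consumes one full batch.

    The last batch of an epoch may be partial, hence the ceiling.
    """
    if num_examples <= 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("num_examples, batch_size and epochs must be positive")
    return math.ceil(num_examples / batch_size) * epochs


# Hypothetical dataset size, chosen only to make the arithmetic concrete.
NUM_EXAMPLES = 10_000

steps_bs2 = steps_per_run(NUM_EXAMPLES, batch_size=2)  # batch size quoted for the A100 40G
steps_bs8 = steps_per_run(NUM_EXAMPLES, batch_size=8)  # batch size quoted for the A100 80GB

print(f"batch size 2: {steps_bs2} steps")
print(f"batch size 8: {steps_bs8} steps")
print(f"reduction factor: {steps_bs2 / steps_bs8:.1f}x")
```

Under these assumptions, going from batch size 2 to 8 cuts the step count by roughly a factor of four. It says nothing about wall-clock time, which also depends on how long each larger batch takes to process.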