Pinkstack/Luau-coder-v2-3B-base-32k


Pinkstack committed on Jul 20

Commit bb3cc24 · verified · 1 Parent(s): 04acf22

Update README.md

Files changed (1)

README.md +3 -1


@@ -13,6 +13,8 @@ datasets:
 - boatbomber/roblox-info-dump
 - wikimedia/wikipedia
 pipeline_tag: text-generation
+base_model:
+- allenai/OLMo-2-0425-1B
 ---
 ![Thumbnail](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/GeIinCTOzBfsgiqwQlKUY.png)
@@ -33,7 +35,7 @@ This model was continually pre-trained in 3 stages.
 !stage 3 and onwards were with added layers. the model started with 16 layers, then we merged another 20 to make the model bigger and deeper!
 - Stage 3: Training on a mix of Pinkstack/roblox-luau-corpus-text & Roblox/luau_corpus + wikimedia/wikipedia with rope scaling set to 8, aka **32768** tokens of context. We mixed the wikimedia/wikipedia to hopefully improve the general text and knowledge of the model.
-In total, the model was continually pre-trained on up to 1.3B tokens.
+In total, the model was continually pre-trained on up to 1.3B tokens, final loss of **1.916400**.
 # print("Use cases")
 As this is a base model, there isn't much to do with it currently. But, you can fine-tune it on your own datasets to turn it into an instruct - chat type model.
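
The figures quoted in the Stage 3 description can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the rope scaling factor multiplies the base context window linearly, and it infers a base window of 4096 tokens from the stated numbers (32768 / 8), which the README does not state directly. The function names are hypothetical and not part of any library.

```python
# Illustrative arithmetic only; names and the inferred base window are assumptions.

def scaled_context(base_context: int, rope_factor: int) -> int:
    """Context length under the assumption that rope scaling is a linear multiplier."""
    return base_context * rope_factor


def total_layers(initial_layers: int, added_layers: int) -> int:
    """Layer count after merging extra layers into the original stack."""
    return initial_layers + added_layers


# Assumption: base window inferred from the README figures (32768 tokens at factor 8).
INFERRED_BASE_CONTEXT = 32768 // 8

if __name__ == "__main__":
    print("context tokens:", scaled_context(INFERRED_BASE_CONTEXT, 8))
    print("layers:", total_layers(16, 20))
```

Under these assumptions, the 16 original layers plus the 20 merged layers give a 36-layer model, and a factor of 8 reproduces the 32768-token context quoted in the README.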