Pinkstack committed on
Commit bb3cc24 · verified · 1 Parent(s): 04acf22

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -13,6 +13,8 @@ datasets:
  - boatbomber/roblox-info-dump
  - wikimedia/wikipedia
  pipeline_tag: text-generation
+ base_model:
+ - allenai/OLMo-2-0425-1B
  ---
 
  ![Thumbnail](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/GeIinCTOzBfsgiqwQlKUY.png)
@@ -33,7 +35,7 @@ This model was continually pre-trained in 3 stages.
  !stage 3 and onwards were with added layers. the model started with 16 layers, then we merged another 20 to make the model bigger and deeper!
  - Stage 3: Training on a mix of Pinkstack/roblox-luau-corpus-text & Roblox/luau_corpus + wikimedia/wikipedia with rope scaling set to 8, aka **32768** tokens of context. We mixed the wikimedia/wikipedia to hopefully improve the general text and knowledge of the model.
 
- In total, the model was continually pre-trained on up to 1.3B tokens.
+ In total, the model was continually pre-trained on up to 1.3B tokens, final loss of **1.916400**.
 
  # print("Use cases")
  As this is a base model, there isn't much to do with it currently. But, you can fine-tune it on your own datasets to turn it into an instruct - chat type model.
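
The numbers quoted in this change can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions that the diff does not state: that the base model's native context window is 4096 tokens (inferred from 32768 / 8), that the reported final loss is a mean per-token cross-entropy in nats, and that "merged another 20" means 20 layers added on top of the original 16. The constant and function names are hypothetical.

```python
import math

# Assumption: native context window of the base model, inferred from 32768 / 8.
BASE_CONTEXT_TOKENS = 4096
ROPE_SCALING_FACTOR = 8      # "rope scaling set to 8" in the Stage 3 description
FINAL_LOSS = 1.916400        # final loss added to the README in this commit


def scaled_context(base_tokens: int, factor: int) -> int:
    """Context length after multiplying the base window by the RoPE scaling factor."""
    return base_tokens * factor


def perplexity_from_loss(loss_nats: float) -> float:
    """Perplexity implied by a mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


extended_context = scaled_context(BASE_CONTEXT_TOKENS, ROPE_SCALING_FACTOR)
perplexity = perplexity_from_loss(FINAL_LOSS)
total_layers = 16 + 20       # assumption: 20 layers added to the original 16

print(f"context window: {extended_context} tokens")
print(f"implied perplexity: {perplexity:.2f}")
print(f"layers after merge: {total_layers}")
```

Under these assumptions the context window works out to the 32768 tokens quoted in the README, and the final loss corresponds to a perplexity of roughly 6.8 on the training mix.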