---
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - olmo2
license: apache-2.0
language:
  - en
datasets:
  - Pinkstack/roblox-luau-corpus-text
  - Roblox/luau_corpus
  - boatbomber/roblox-info-dump
  - wikimedia/wikipedia
pipeline_tag: text-generation
---


print("Before we start")

We are not related to Roblox in any way; any mention of Roblox is purely to help people understand what the model is about. According to the Roblox website, they use Meta's Llama 3 (we assume the 70B variant) for their AI assistant. This model, while capable, cannot come close to the performance of a 70B model.

print("Stages of pre-training")

This model was continually pre-trained in 3 stages.

  • Stage 1: Pre-training on Pinkstack/roblox-luau-corpus-text and Roblox/luau_corpus at a context length of 4096 tokens (the maximum OLMo 2 normally supports).

  • Stage 2: Pre-training on boatbomber/roblox-info-dump with RoPE scaling set to 4, expanding the model's context length to 16384 tokens.

Note: stage 3 and onwards were trained with added layers. The model started with 16 layers; we then merged in another 20 to make the model larger and deeper.

  • Stage 3: Training on a mix of Pinkstack/roblox-luau-corpus-text and Roblox/luau_corpus plus wikimedia/wikipedia, with RoPE scaling set to 8, i.e. 32768 tokens of context. We mixed in wikimedia/wikipedia to improve the model's general text and world knowledge.

In total, the model was continually pre-trained on up to 1.3B tokens. Rough code sketches of the RoPE scaling and layer expansion described above are shown below.
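The RoPE scaling used in stages 2 and 3 can be illustrated with a short, hedged sketch. The config keys below follow the common transformers rope_scaling convention (older versions use "type" instead of "rope_type"), the base checkpoint name is a placeholder, and this is not the exact training code used for this model:

```python
from transformers import AutoConfig, AutoModelForCausalLM

base = "<olmo2-base-checkpoint>"  # placeholder for the OLMo 2 base model used
config = AutoConfig.from_pretrained(base)

# Stage 2: linear RoPE scaling factor 4 -> ~4096 * 4 = 16384 tokens of context.
# Stage 3: linear RoPE scaling factor 8 -> ~4096 * 8 = 32768 tokens of context.
config.rope_scaling = {"rope_type": "linear", "factor": 4.0}
config.max_position_embeddings = 4096 * 4

model = AutoModelForCausalLM.from_pretrained(base, config=config)
```

The layer expansion before stage 3 (16 original layers plus 20 merged-in layers, for 36 in total) can be sketched in the same spirit. The actual layer-selection/merging recipe is not documented in this card, so the simple duplication scheme below is only illustrative:

```python
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("<olmo2-base-checkpoint>")  # placeholder

layers = model.model.layers  # the original decoder layers (16 in this sketch)
# Append 20 copies of existing layers so the model ends up 36 layers deep.
extra = [copy.deepcopy(layers[i % len(layers)]) for i in range(20)]
model.model.layers = nn.ModuleList(list(layers) + extra)
model.config.num_hidden_layers = len(model.model.layers)

# Re-number layer indices so attention/KV-cache bookkeeping stays consistent.
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i
```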

print("Additional information")

This repo contains the stage 3 pre-trained/base model.
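As a quick, hedged example of using it (the repo id below is a placeholder; substitute this repository's actual Hugging Face id), the base model can be loaded with transformers and asked to continue Luau code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<this-repo-id>"  # placeholder: use this repository's Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# This is a pre-trained/base model: give it a Luau snippet to continue, not a chat prompt.
prompt = "local function sumParts(parts)\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```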

Unsloth was used for training (https://unsloth.ai/).
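If you want to continue training this checkpoint with unsloth yourself, a minimal loading sketch looks like the following, assuming your unsloth version supports the OLMo 2 architecture; the repo id and sequence length are placeholders, not values confirmed by this card:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="<this-repo-id>",  # replace with this repository's Hugging Face id
    max_seq_length=32768,         # stage 3 context length
    load_in_4bit=True,            # optional: reduces memory use
)
```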