---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
---
# DMaS-LLaMa-Lite-step-5.1k
This repository provides access to DMaS-LLaMa-Lite-step-5.1k, a 1.7-billion-parameter language model based on the LLaMa architecture. The model has been trained from scratch as part of the DMaS-LLaMa-Lite project using approximately 20 billion tokens of high-quality educational content.
## Model Overview
- Architecture: LLaMa-based
- Parameters: 1.7B (36 layers, 32 attention heads, RMSNorm)
- Tokenizer: GPT-2 tokenizer
- Training Data: FineWeb-Edu subset (educational text)
- Training Steps: 5,100
- Optimizer: AdamW with linear warmup and decay (a minimal sketch follows this list)
- Hardware: Trained on 1-2 RTX A6000 GPUs with PyTorch DDP
- Dataset Source: [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
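
To make the optimizer entry above concrete, here is a minimal sketch of an AdamW setup with a linear warmup-and-decay schedule in PyTorch. The learning rate, weight decay, and warmup length are illustrative assumptions, not the values used in the actual run; refer to the training code linked below for the real configuration.

```python
# Minimal sketch of the AdamW + linear warmup/decay schedule described above.
# lr, weight_decay, and num_warmup_steps are illustrative assumptions,
# not the values used in the actual DMaS-LLaMa-Lite training run.
import torch
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained("McGill-DMaS/DMaS-LLaMa-Lite-step-5.1k")

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,      # assumed warmup length
    num_training_steps=5100,   # steps covered by this checkpoint
)

# Inside a training loop, each optimization step would roughly be:
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```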
The training process emphasizes qualitative improvements in coherence, fluency, and factual grounding, achieving competitive results despite using fewer training tokens than larger-scale models.
This checkpoint captures the model's state at 5,100 training steps. Validation loss and downstream benchmarks at this point already show notable improvements in text fluency and alignment with prompts.
## Training Code
The training script, including configurations and instructions, is open-sourced and available here:
📄 DMaS-LLaMa-Lite Training Code
## Usage
You can load the model with the Hugging Face Transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "McGill-DMaS/DMaS-LLaMa-Lite-step-5.1k"

# Load the tokenizer (GPT-2 vocabulary) and the 1.7B checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode a prompt and generate a short continuation.
inputs = tokenizer("The Pyramids of Giza in Egypt are some of the oldest man-made structures in the world.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
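
For quick experiments, the same checkpoint can also be run through the `text-generation` pipeline. The sampling arguments below (`do_sample`, `top_p`, `max_new_tokens`) are illustrative choices, not settings recommended for this model.

```python
from transformers import pipeline

# Convenience wrapper around the tokenizer/model calls shown above.
generator = pipeline("text-generation", model="McGill-DMaS/DMaS-LLaMa-Lite-step-5.1k")

# Sampling settings here are illustrative, not tuned for this checkpoint.
result = generator(
    "The Pyramids of Giza in Egypt are some of the oldest man-made structures in the world.",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
)
print(result[0]["generated_text"])
```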
## Citation
If you use this model or its training insights in your work, please cite the following paper:
```bibtex
@article{li2024effectiveness,
  title={Experience of Training a 1.7B-Parameter LLaMa Model From Scratch},
  author={Li, Miles Q and Fung, Benjamin and Huang, Shih-Chia},
  journal={arXiv preprint arXiv:2412.13335},
  year={2024}
}
```
## License
This model and code are released under the Apache License 2.0. Please check the respective repositories for detailed terms.