	---
	license: apache-2.0
	datasets:
	- HuggingFaceFW/fineweb-edu
	---

	# DMaS-LLaMa-Lite-step-5.1k

	This repository provides access to DMaS-LLaMa-Lite-step-5.1k, a 1.7-billion-parameter language model based on the LLaMa architecture. The model has been trained from scratch as part of the DMaS-LLaMa-Lite project using approximately 20 billion tokens of high-quality educational content.

	## Model Overview

	- Architecture: LLaMa-based
	- Parameters: 1.7B (36 layers, 32 attention heads, RMSNorm)
	- Tokenizer: GPT-2 tokenizer
	- Training Data: FineWeb-Edu subset (educational text)
	- Training Steps: 5,100
	- Optimizer: AdamW with linear warmup and decay
	- Hardware: Trained on 1-2 RTX A6000 GPUs with PyTorch DDP
	- Dataset Source: [FineWeb-Edu Dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
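
The "linear warmup and decay" schedule listed above can be sketched in plain Python. This is a minimal illustration, not the project's training code: the function name `lr_at_step`, the linear shape of the decay phase, and every numeric value in the usage lines are assumptions made for the example. The actual schedule and hyperparameters are in the training repository linked under Training Code.

```python
def lr_at_step(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumed shape: ramp linearly up to max_lr over warmup_steps,
    then decay linearly to min_lr by total_steps.
    """
    if step < warmup_steps:
        # Warmup: step 0 gets max_lr / warmup_steps, the last warmup step gets max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the end of the schedule: hold at the floor.
        return min_lr
    # Decay: fraction of the post-warmup phase completed so far, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr + (min_lr - max_lr) * progress


# Illustrative values only; none of these come from the model card.
schedule = [lr_at_step(s, max_lr=1.0, warmup_steps=100, total_steps=1100)
            for s in (0, 99, 100, 600, 1100)]
print(schedule)
```

In a PyTorch training loop, a function like this would typically be wrapped in `torch.optim.lr_scheduler.LambdaLR` (returning a multiplier on the base learning rate) and stepped once per optimizer update.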

	The training process emphasizes qualitative improvements in coherence, fluency, and factual grounding. The project reports competitive results despite training on fewer tokens than larger-scale models.

	This checkpoint captures the model's state after 5,100 training steps. At this stage, validation loss and downstream benchmarks already show early gains in text fluency and in how closely completions follow the prompt.

	## Training Code

	The training script, including configurations and instructions, is open-sourced and available here:
	📄 [DMaS-LLaMa-Lite Training Code](https://github.com/McGill-DMaS/DMaS-LLaMa-Lite-Training-Code)

	## Usage

	You can load the model with the Hugging Face Transformers library:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Load the tokenizer and weights for this checkpoint from the Hugging Face Hub.
	model_name = "McGill-DMaS/DMaS-LLaMa-Lite-step-5.1k"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name)

	# Tokenize a prompt and generate a continuation.
	inputs = tokenizer("The Pyramids of Giza in Egypt are some of the oldest man-made structures in the world.", return_tensors="pt")
	# max_new_tokens counts only generated tokens; max_length would also count the prompt.
	outputs = model.generate(**inputs, max_new_tokens=50)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	## Citation

	If you use this model or its training insights in your work, please cite the following [paper](https://arxiv.org/abs/2412.13335):

	```bibtex
	@article{li2024effectiveness,
	  title={Experience of Training a 1.7B-Parameter LLaMa Model From Scratch},
	  author={Li, Miles Q and Fung, Benjamin and Huang, Shih-Chia},
	  journal={arXiv preprint arXiv:2412.13335},
	  year={2024}
	}
	```

	## License

	This model and code are released under the Apache License 2.0. Please check the respective repositories for detailed terms.