Gemma-2-2b-it Pretrained for Luganda
This model is a continued pretraining of Gemma-2-2b-it on Luganda text. It was trained on Luganda Wikipedia articles to adapt it for Luganda language understanding and generation.
The model was trained with a causal language modelling objective for continued pretraining.
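For orientation, the sketch below shows one plausible way such a continued-pretraining run can be set up with Unsloth and TRL: LoRA adapters on a 4-bit base model, trained on the formatted articles. The starting checkpoint name, LoRA rank, batch size, learning rate, and the placeholder dataset are all illustrative assumptions, not the exact settings used for this model.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

# Placeholder stand-in for the Luganda Wikipedia corpus, already rendered
# with the training template shown below (one string per article).
dataset = Dataset.from_dict({
    "text": ["Ekyawandiikibwa kya Wikipedia\n### Omutwe: Uganda\n\n### Akawayiro:\n..."],
})

# Illustrative values only; not the actual hyperparameters of this model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "google/gemma-2-2b-it",  # assumed starting checkpoint
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# LoRA adapters keep 4-bit continued pretraining memory-efficient.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)

# Plain causal language modelling over the formatted articles.
# Argument names may differ slightly across trl versions.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        warmup_steps = 10,
        num_train_epochs = 1,
        learning_rate = 5e-5,
        optim = "adamw_8bit",
        logging_steps = 10,
        output_dir = "outputs",
    ),
)
trainer.train()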
The training data was processed using the following template:
Ekyawandiikibwa kya Wikipedia
### Omutwe: {title}

### Akawayiro:
{text}
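As a minimal sketch, assuming the source articles provide title and text fields, each article can be rendered into this template before training; appending the tokenizer's EOS token to separate documents is a common convention rather than a confirmed detail of this model's pipeline.

# Render one Wikipedia article into the training template.
# The column names "title" and "text" are assumptions about the source dataset.
WIKI_TEMPLATE = (
    "Ekyawandiikibwa kya Wikipedia\n"
    "### Omutwe: {title}\n"
    "\n"
    "### Akawayiro:\n"
    "{text}"
)

def format_article(example, eos_token):
    return {"text": WIKI_TEMPLATE.format(title=example["title"], text=example["text"]) + eos_token}

# With a Hugging Face datasets.Dataset of articles and the model tokenizer:
# dataset = dataset.map(lambda ex: format_article(ex, tokenizer.eos_token))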
This repository contains multiple checkpoints from the pretraining process. The model can be loaded and used as follows:
from unsloth import FastLanguageModel
import torch

# Load the model and tokenizer from the Hub
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bronsn/gemma-2-2b-it-pretrained",
    max_seq_length = 2048,
    dtype = None,        # Auto-detect (bfloat16 on supported GPUs)
    load_in_4bit = True, # 4-bit quantised loading to save memory
)

# Switch Unsloth into its faster inference mode
FastLanguageModel.for_inference(model)

# Example usage: prompt the model with the training template
text = "Ekyawandiikibwa kya Wikipedia\n### Omutwe: Uganda\n\n### Akawayiro:\n"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
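To inspect one of the intermediate checkpoints rather than the latest weights, a specific revision can be requested at load time. This assumes the checkpoints are published as separate revisions (branches) of the repository; the revision name below is hypothetical, and the argument is forwarded to the underlying Hugging Face from_pretrained call.

from unsloth import FastLanguageModel

# Load a specific pretraining checkpoint instead of the default branch.
# "checkpoint-1000" is a hypothetical revision name; replace it with one of
# the checkpoints actually listed in the repository.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bronsn/gemma-2-2b-it-pretrained",
    revision = "checkpoint-1000",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)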
If you use this model, please cite:
@misc{luganda-gemma-pretrained,
  author = {Bronsn},
  title = {Gemma-2-2b-it Pretrained for Luganda},
  year = {2025},
  publisher = {HuggingFace}
}
This model inherits its licensing terms from the base Gemma-2-2b-it model. For details, please refer to the Gemma Terms of Use.