|
--- |
|
tags: |
|
- model |
|
- checkpoints |
|
- translation |
|
- latin |
|
- english |
|
- mt5 |
|
- mistral |
|
- multilingual |
|
- NLP |
|
language: |
|
- en |
|
- la |
|
license: "cc-by-4.0" |
|
models: |
|
- mistralai/Mistral-7B-Instruct-v0.3 |
|
- google/mt5-small |
|
model_type: "mt5-small" |
|
training_epochs: "6 (initial pipeline), 30 (final pipeline with optimizations), 100 (fine-tuning on 4750 summaries)"
|
task_categories: |
|
- translation |
|
- summarization |
|
- multilingual-nlp |
|
task_ids: |
|
- en-la-translation |
|
- la-en-translation |
|
- text-generation |
|
pretty_name: "mT5-LatinSummarizerModel" |
|
storage: |
|
- git-lfs |
|
- huggingface-models |
|
size_categories: |
|
- 5GB<n<10GB |
|
--- |
|
# **mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP** |
|
|
|
[GitHub Repository](https://github.com/AxelDlv00/LatinSummarizer)

[Model on Hugging Face](https://huggingface.co/LatinNLP/LatinSummarizerModel)

[Dataset on Hugging Face](https://huggingface.co/datasets/LatinNLP/LatinSummarizerDataset)
|
|
|
## **Overview** |
|
This repository contains the **trained checkpoints and tokenizer files** for the `mT5-LatinSummarizerModel`, which was fine-tuned to improve **Latin summarization and translation**. It is designed to: |
|
- Translate between **English and Latin**. |
|
- Summarize Latin texts effectively. |
|
- Leverage extractive and abstractive summarization techniques. |
|
- Utilize **curriculum learning** for improved training. |
|
|
|
## **Installation & Usage** |
|
To download and set up the models (mT5-small and Mistral-7B-Instruct), you can directly run: |
|
```bash |
|
bash install_large_models.sh |
|
``` |
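
Once downloaded, the checkpoints can be loaded with Hugging Face Transformers. The snippet below is a minimal sketch, not the project's official inference script: it assumes the fine-tuned weights are loadable from the `LatinNLP/LatinSummarizerModel` repository (or from one of the checkpoint folders listed under *Project Structure*), and the task prefix shown is an illustrative assumption rather than the exact prompt format used during training.

```python
# Minimal inference sketch. Assumptions: the repo id / local folder holds a full
# mT5 checkpoint, and the task prefix matches the one used during fine-tuning.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "LatinNLP/LatinSummarizerModel"  # or a local folder, e.g. "final_pipeline/no_stanza"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

# Hypothetical prompt format; adjust to the prefixes actually used at training time.
text = "translate English to Latin: The senate convenes at dawn."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```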
|
|
|
## **Project Structure** |
|
``` |
|
.
├── final_pipeline              (trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small high-quality summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline            (trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md
|
``` |
|
|
|
## **Training Methodology** |
|
We fine-tuned **mT5-small** in three phases: |
|
1. **Initial Training Pipeline (6 epochs)**: Used the full dataset without optimizations. |
|
2. **Final Training Pipeline (30 light epochs)**: Used **10% of the training data per epoch** for efficiency (see the sampling sketch after this list).
|
3. **Fine-Tuning (100 epochs)**: Focused on the **4750 high-quality summaries** for final optimization. |
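
The light epochs of phase 2 amount to drawing a fresh ~10% sample of the training set at every epoch. The sketch below only illustrates that sampling logic; the toy `Dataset` stands in for the real training data from `LatinSummarizerDataset`, and the actual pipeline may sample differently.

```python
# Sketch of "light epochs": a fresh ~10% sample of the training set per epoch.
# The toy Dataset below is a stand-in for the real training data (an assumption).
from datasets import Dataset

train_ds = Dataset.from_dict({"text": [f"exemplum {i}" for i in range(1000)]})

NUM_LIGHT_EPOCHS = 30
SAMPLE_FRACTION = 0.10

for epoch in range(NUM_LIGHT_EPOCHS):
    subset = train_ds.shuffle(seed=epoch).select(range(int(SAMPLE_FRACTION * len(train_ds))))
    # ... run one training pass over `subset` here ...
    print(f"epoch {epoch}: {len(subset)} examples")
```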
|
|
|
### **Training Configurations:**
|
- **Hardware:** 16GB VRAM GPU (lab machines via SSH). |
|
- **Batch Size:** Adaptive due to GPU memory constraints. |
|
- **Gradient Accumulation:** Enabled for larger effective batch sizes. |
|
- **LoRA-based fine-tuning:** LoRA rank 8, scaling factor 32 (see the configuration sketch after this list).
|
- **Dynamic Sequence Length Adjustment:** Increased progressively. |
|
- **Learning Rate:** `5 × 10^-4` with warm-up steps.
|
- **Checkpointing:** Frequent saves to mitigate power outages. |
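
The LoRA and optimizer settings above could be expressed with the `peft` library and `Seq2SeqTrainingArguments` roughly as follows. Only the LoRA rank (8), scaling factor (32), and learning rate come from the list above; the target modules, dropout, warm-up steps, batch size, accumulation factor, and save interval are assumptions, not the exact values used.

```python
# Sketch of the LoRA + optimizer setup; values not listed above are assumptions.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # LoRA rank (from the configuration above)
    lora_alpha=32,              # scaling factor (from the configuration above)
    lora_dropout=0.05,          # assumption
    target_modules=["q", "v"],  # typical T5/mT5 attention projections; an assumption
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

training_args = Seq2SeqTrainingArguments(
    output_dir="./checkpoints",
    learning_rate=5e-4,              # from the configuration above
    warmup_steps=500,                # warm-up is used; this step count is an assumption
    gradient_accumulation_steps=4,   # accumulation enabled; the factor is an assumption
    per_device_train_batch_size=8,   # adaptive in practice; placeholder value
    save_steps=500,                  # frequent checkpointing; interval is an assumption
)
```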
|
|
|
## **Evaluation & Results** |
|
We evaluated the model using **ROUGE, BERTScore, and BLEU/chrF scores**. |
|
|
|
| Metric       | Before Fine-Tuning | After Fine-Tuning |
|--------------|--------------------|-------------------|
| ROUGE-1      | 0.1675             | 0.2541            |
| ROUGE-2      | 0.0427             | 0.0773            |
| ROUGE-L      | 0.1459             | 0.2139            |
| BERTScore-F1 | 0.6573             | 0.7140            |
|
|
|
- **chrF Score (en→la):** 33.60 (with Stanza tags) vs 18.03 BLEU (without Stanza).
|
- **Summarization Density:** Maintained at ~6%. |
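
For reference, metrics of this kind can be recomputed with the `evaluate` library. This is a generic sketch rather than the exact evaluation script behind the numbers above; `predictions` and `references` are placeholders, and the multilingual BERT backbone for BERTScore is an assumption.

```python
# Generic metric sketch with the `evaluate` library; placeholder data, not the real eval set.
import evaluate

predictions = ["senatus prima luce convenit"]   # model outputs (placeholder)
references = ["senatus prima luce convocatur"]  # gold texts (placeholder)

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
chrf = evaluate.load("chrf")

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="bert-base-multilingual-cased"))
print(chrf.compute(predictions=predictions, references=[[r] for r in references]))
```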
|
|
|
### **Observations:** |
|
- Pre-training on **extractive summaries** was crucial. |
|
- The model still exhibits some **excessive extraction**, indicating room for further improvement.
|
|
|
## **License** |
|
This model is released under **CC-BY-4.0**. |
|
|
|
## **Citation** |
|
```bibtex |
|
@misc{LatinSummarizerModel, |
|
author = {Axel Delaval and Elsa Lubek},
|
title = {Latin-English Summarization Model (mT5)}, |
|
year = {2025}, |
|
url = {https://huggingface.co/LatinNLP/LatinSummarizerModel} |
|
} |
|
``` |