---
license: mit
datasets:
- mlabonne/FineTome-100k
- efederici/capybara-claude-15k-ita
language:
- it
- en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- trl
- phi3
- spectrum
---
<img src="./assets/phi_35_mini_ita.png" width="450"></img>
# Phi-3.5-mini-ITA
Fine-tuned version of [Microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) optimized for better performance in Italian.
🔹 Small yet powerful model with 3.82 billion parameters
🔹 Supports 128k context length
- [💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)
- [GGUF quants](https://huggingface.co/QuantFactory/Phi-3.5-mini-ITA-GGUF)
**Do you want to understand how the model was trained?**
Check out the [full walkthrough article](https://huggingface.co/blog/anakin87/spectrum) and the accompanying [💻 notebook](./notebooks/training.ipynb).
## Evaluation
*Open ITA LLM Leaderboard*
| Model | Parameters | Average | MMLU_IT | ARC_IT | HELLASWAG_IT |
| ------------------------------------- | ---------- | ------- | ------- | ------ | ------------ |
| **anakin87/Phi-3.5-mini-ITA** | **3.82 B** |**57.67** | 59.93 | 51.5 | 61.57 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B | 56.97 | 58.43 | 48.42 | 64.07 |
| microsoft/Phi-3.5-mini-instruct | 3.82 B | 56.82 | 60.03 | 49.19 | 61.25 |
[Details](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard)
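The `Average` column above is consistent with a plain arithmetic mean of the three task scores. The leaderboard's exact aggregation is not stated here, so the sketch below is only a sanity check on the table's figures, not a description of how the leaderboard computes them. The `scores` dictionary and the `average` helper are illustrative names.

```python
# Task scores copied from the table above: (MMLU_IT, ARC_IT, HELLASWAG_IT)
scores = {
    "anakin87/Phi-3.5-mini-ITA": (59.93, 51.5, 61.57),
    "meta-llama/Meta-Llama-3.1-8B-Instruct": (58.43, 48.42, 64.07),
    "microsoft/Phi-3.5-mini-instruct": (60.03, 49.19, 61.25),
}


def average(values):
    """Arithmetic mean rounded to two decimals (assumed aggregation)."""
    return round(sum(values) / len(values), 2)


for model_name, task_scores in scores.items():
    print(f"{model_name}: {average(task_scores):.2f}")
```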
*Pinocchio ITA Leaderboard*
| Model | Parameters | Average |
| ------------------------------------- | ---------- | ------- |
| **anakin87/Phi-3.5-mini-ITA** | **3.82 B** | **57.95** |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B | 56.93 |
[Details](https://huggingface.co/spaces/mii-llm/pinocchio_ita_leaderboard)
## Model in action
### Demo
[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)
### Text generation with Transformers
The model is small, so it runs smoothly on Colab. It is also fine to load the model using quantization.
With `transformers==4.44.2`, `trust_remote_code=True` is needed to incorporate a minor bug fix in `Phi3ForCausalLM`.
Read [this discussion](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/discussions/9) for more details.
⚡ *The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the `attn_implementation` parameter in the code snippet below.*
```python
# pip install transformers accelerate
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

model_id = "anakin87/Phi-3.5-mini-ITA"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
# Pass the chat-formatted messages so the pipeline applies the model's chat template
messages = [{"role": "user", "content": user_input}]
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
# With chat input, generated_text holds the whole conversation; the last message is the reply
print(outputs[0]["generated_text"][-1]["content"])
```
Example output:
```
Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.
Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."
Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."
In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.
```
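The remark that the model can be loaded with quantization could look like the sketch below. It assumes 4-bit NF4 quantization through `bitsandbytes` and a CUDA GPU, which is one option among several and not something this card prescribes. `QUANT_KWARGS` and `load_quantized_model` are illustrative names.

```python
# pip install transformers accelerate bitsandbytes
MODEL_ID = "anakin87/Phi-3.5-mini-ITA"

# Assumed settings: 4-bit NF4 weights with bfloat16 compute (not prescribed by this card)
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",
}


def load_quantized_model(model_id: str = MODEL_ID):
    """Load the model with 4-bit weights; needs a CUDA GPU and bitsandbytes."""
    # Imported lazily so the settings above can be inspected without the heavy dependencies
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        quantization_config=BitsAndBytesConfig(**QUANT_KWARGS),
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer
```

The returned `model` and `tokenizer` can be passed to `pipeline("text-generation", ...)` in the same way as in the Transformers snippet.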
### Build AI applications
You can use the model to create a variety of AI applications.
I recommend using the [Haystack LLM framework](https://haystack.deepset.ai/) for orchestration.
(spoiler: I work on it and it is open-source)
This model is compatible with [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator) and [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) components.
You can also deploy the model with a TGI container and then use it with [`HuggingFaceAPIGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and the related Chat Generator.
Some examples you can take inspiration from:
- [RAG with local open models](https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2)
- [Summarization from a Website](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/hackernews-custom-component-rag.ipynb)
- [Multilingual RAG](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/multilingual_rag_podcast.ipynb)
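As a minimal sketch of the `HuggingFaceLocalChatGenerator` route mentioned above, assuming Haystack 2.x is installed as `haystack-ai`: the `ask` helper and the `GENERATION_KWARGS` value are illustrative and not taken from this card.

```python
# pip install haystack-ai transformers accelerate
MODEL_ID = "anakin87/Phi-3.5-mini-ITA"

# Assumed value, mirroring max_new_tokens in the Transformers example
GENERATION_KWARGS = {"max_new_tokens": 500}


def ask(question: str, model_id: str = MODEL_ID) -> str:
    """Send a single user message to the model through Haystack and return the reply text."""
    # Imported lazily: these pull in Haystack and Transformers
    from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
    from haystack.dataclasses import ChatMessage

    generator = HuggingFaceLocalChatGenerator(
        model=model_id,
        generation_kwargs=GENERATION_KWARGS,
    )
    generator.warm_up()  # downloads and loads the model weights
    result = generator.run(messages=[ChatMessage.from_user(question)])
    # On Haystack releases before 2.9 the reply text is exposed as `.content`
    return result["replies"][0].text
```

Calling `ask("Qual è la capitale d'Italia?")` downloads the model on first use, so it is best run on a machine with a GPU.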
## 🔧 Training details
This model was fine-tuned using HF TRL.
It underwent 2 epochs of instruction fine-tuning on the [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Capybara-Claude-15k-ita](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita) datasets. Thanks to the authors for providing these datasets.
I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623).
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
Training required about 14 hours on a single A6000 GPU.
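The freeze-by-SNR idea can be sketched as follows. This is a simplified illustration and not the Spectrum implementation or the training code used for this model: it assumes the per-module SNR scores have already been computed, the scores shown are made-up numbers, and both helper functions are hypothetical. The model only needs to expose `named_parameters()`, as PyTorch modules do.

```python
from typing import Dict, Iterable, Set


def select_high_snr_modules(snr_by_module: Dict[str, float], top_fraction: float) -> Set[str]:
    """Return the names of the top `top_fraction` of modules ranked by SNR."""
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    keep = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:keep])


def freeze_low_snr_parameters(model, trainable_modules: Iterable[str]) -> int:
    """Enable gradients only for parameters inside the selected modules.

    Returns the number of parameter tensors left trainable.
    """
    prefixes = tuple(name + "." for name in trainable_modules)
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)
        trainable += int(param.requires_grad)
    return trainable


# Made-up SNR scores, for illustration only
example_snr = {
    "model.layers.0.mlp.down_proj": 0.42,
    "model.layers.1.mlp.down_proj": 1.37,
    "model.layers.2.mlp.down_proj": 0.95,
    "model.layers.3.mlp.down_proj": 0.18,
}
selected = select_high_snr_modules(example_snr, top_fraction=0.5)
print(sorted(selected))
```

With a real model, `freeze_low_snr_parameters(model, selected)` would then be called before handing the model to the trainer.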
**For complete training details**, check out the [full walkthrough article](https://huggingface.co/blog/anakin87/spectrum) and the accompanying [💻 notebook](./notebooks/training.ipynb).