---
license: mit
datasets:
- mlabonne/FineTome-100k
- efederici/capybara-claude-15k-ita
language:
- it
- en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- trl
- phi3
- spectrum
---

<img src="./assets/phi_35_mini_ita.png" width="450"/>
# Phi-3.5-mini-ITA

Fine-tuned version of [Microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) optimized for better performance in Italian.

🔹 Small yet powerful model with 3.82 billion parameters
🔹 Supports 128k context length

- [💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)
- [GGUF quants](https://huggingface.co/QuantFactory/Phi-3.5-mini-ITA-GGUF)
  
๐Ÿ‹๏ธโ€โ™‚๏ธ **Do you want to understand how the model was trained?**
Check out the [๐Ÿ“– full walkthrough article](https://huggingface.co/blog/anakin87/spectrum) and the accompanying [๐Ÿ’ป notebook](./notebooks/training.ipynb)

## ๐Ÿ† Evaluation

*Open ITA LLM Leaderboard*

| Model                                 | Parameters | Average   | MMLU_IT | ARC_IT | HELLASWAG_IT |
| ------------------------------------- | ---------- | --------- | ------- | ------ | ------------ |
| **anakin87/Phi-3.5-mini-ITA**         | **3.82 B** | **57.67** | 59.93   | 51.5   | 61.57        |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B     | 56.97     | 58.43   | 48.42  | 64.07        |
| microsoft/Phi-3.5-mini-instruct       | 3.82 B     | 56.82     | 60.03   | 49.19  | 61.25        |

[Details](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard)

*Pinocchio ITA Leaderboard*

| Model                                 | Parameters | Average   |
| ------------------------------------- | ---------- | --------- |
| **anakin87/Phi-3.5-mini-ITA**         | **3.82 B** | **57.95** |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B     | 56.93     |

[Details](https://huggingface.co/spaces/mii-llm/pinocchio_ita_leaderboard)


## 🎮 Model in action
### Demo
[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)

### Text generation with Transformers
The model is small, so it runs smoothly on Colab. You can also load it with quantization to reduce memory usage, as sketched below.

With `transformers==4.44.2`, `trust_remote_code=True` is needed to incorporate a minor bug fix in `Phi3ForCausalLM`.
Read [this discussion](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/discussions/9) for more details.
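
For example, here is a minimal sketch of 4-bit loading with bitsandbytes. The `bitsandbytes` dependency and the exact quantization settings are my additions for illustration, not part of the original recipe:

```python
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit NF4 configuration; adjust for your hardware
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "anakin87/Phi-3.5-mini-ITA",
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```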

⚡ *The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the `attn_implementation` parameter in the code snippet below.*

```python
# pip install transformers accelerate
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

model_id="anakin87/Phi-3.5-mini-ITA"

model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]

# Pass the chat-formatted messages (not the raw string) so the chat template is applied
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
print(outputs[0]["generated_text"][-1]["content"])
```

Example output:
```
Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.

Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."

Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."

In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.
```

### Build AI applications
You can use the model to create a variety of AI applications.

I recommend using the [🏗️ Haystack LLM framework](https://haystack.deepset.ai/) for orchestration.
(spoiler: I work on it and it is open-source 😄)

This model is compatible with [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator) and [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) components.
You can also deploy the model with a TGI container and then use it with [`HuggingFaceAPIGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and the related Chat Generator.
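
As a rough sketch (the generation parameters and the example message are illustrative, and the `ChatMessage` API may differ across Haystack versions), generating a reply locally could look like this:

```python
# pip install haystack-ai accelerate
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

# Load the model locally through the Haystack chat generator
generator = HuggingFaceLocalChatGenerator(
    model="anakin87/Phi-3.5-mini-ITA",
    generation_kwargs={"max_new_tokens": 500},
)
generator.warm_up()

result = generator.run([ChatMessage.from_user("Qual è la capitale d'Italia?")])
print(result["replies"][0].content)
```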

Some examples you can draw inspiration from:
- [RAG with local open models](https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2)
- [Summarization from a Website](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/hackernews-custom-component-rag.ipynb)
- [Multilingual RAG](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/multilingual_rag_podcast.ipynb)


## 🔧 Training details
This model was fine-tuned using HF TRL.
It underwent 2 epochs of instruction fine-tuning on the [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Capybara-Claude-15k-ita](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita) datasets. 🙏 Thanks to the authors for providing these datasets.
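
The full training script lives in the linked notebook; as a heavily condensed sketch (the toy dataset and hyperparameters below are illustrative only), the TRL setup follows the usual `SFTTrainer` pattern:

```python
# pip install trl
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Toy example formatted with the Phi-3.5 chat template; the real run used
# FineTome-100k and capybara-claude-15k-ita
train_dataset = Dataset.from_list([
    {"text": "<|user|>\nCiao!<|end|>\n<|assistant|>\nCiao! Come posso aiutarti?<|end|>"}
])

# Illustrative hyperparameters; see the notebook for the real configuration
config = SFTConfig(
    output_dir="phi-3.5-mini-ita",
    num_train_epochs=2,
    dataset_text_field="text",
)

# `model` and `tokenizer` are the objects loaded in the earlier snippet
trainer = SFTTrainer(model=model, args=config, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```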

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623).
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
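
To give a concrete picture, here is a minimal sketch of the freezing step. The parameter name patterns below are purely illustrative; Spectrum's SNR analysis produces the real list:

```python
import re

# Purely illustrative name patterns: Spectrum's SNR analysis selects the real layers
unfrozen_patterns = [
    r"model\.layers\.\d+\.self_attn\.o_proj",
    r"model\.layers\.\d+\.mlp\.down_proj",
]

# `model` is the loaded Phi-3.5-mini model; train only the matching parameters
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable / total:.1%}")
```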

Training required about 14 hours on a single A6000 GPU.

**For complete training details**, check out the [๐Ÿ“– full walkthrough article](https://huggingface.co/blog/anakin87/spectrum) and the accompanying [๐Ÿ’ป notebook](./notebooks/training.ipynb).