license: mit
datasets:
  - mlabonne/FineTome-100k
  - efederici/capybara-claude-15k-ita
language:
  - it
  - en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
  - trl
  - phi3
  - spectrum

Phi-3.5-mini-ITA

A fine-tuned version of microsoft/Phi-3.5-mini-instruct, optimized for better performance in Italian.

🔹 Small yet powerful model with 3.82 billion parameters
🔹 Supports 128k context length

🏋️‍♂️ Do you want to understand how the model was trained? Check out the 📖 full walkthrough article and the accompanying 💻 notebook.

🏆 Evaluation

Open ITA LLM Leaderboard

| Model | Parameters | Average | MMLU_IT | ARC_IT | HELLASWAG_IT |
|---|---|---|---|---|---|
| anakin87/Phi-3.5-mini-ITA | 3.82 B | 57.67 | 59.93 | 51.5 | 61.57 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B | 56.97 | 58.43 | 48.42 | 64.07 |
| microsoft/Phi-3.5-mini-instruct | 3.82 B | 56.82 | 60.03 | 49.19 | 61.25 |

Details
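The Average column appears to be the unweighted mean of the three task scores. That is an inference from the numbers in the table, not something stated by the leaderboard; a small sketch to check it (the helper name `mean_score` is only illustrative):

```python
# Task scores copied from the table above: (MMLU_IT, ARC_IT, HELLASWAG_IT)
scores = {
    "anakin87/Phi-3.5-mini-ITA": (59.93, 51.5, 61.57),
    "meta-llama/Meta-Llama-3.1-8B-Instruct": (58.43, 48.42, 64.07),
    "microsoft/Phi-3.5-mini-instruct": (60.03, 49.19, 61.25),
}

# Averages as reported in the table
reported_average = {
    "anakin87/Phi-3.5-mini-ITA": 57.67,
    "meta-llama/Meta-Llama-3.1-8B-Instruct": 56.97,
    "microsoft/Phi-3.5-mini-instruct": 56.82,
}


def mean_score(task_scores):
    """Unweighted mean of the task scores, rounded to two decimals."""
    return round(sum(task_scores) / len(task_scores), 2)


for model, task_scores in scores.items():
    print(f"{model}: computed={mean_score(task_scores):.2f} reported={reported_average[model]:.2f}")
```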

Pinocchio ITA Leaderboard

| Model | Parameters | Average |
|---|---|---|
| anakin87/Phi-3.5-mini-ITA | 3.82 B | 57.95 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B | 56.93 |

Details

🎮 Model in action

Demo

💬🇮🇹 Chat with the model on Hugging Face Spaces

Text generation with Transformers

The model is small, so it runs smoothly on Colab. It is also fine to load the model using quantization.

With transformers==4.44.2, trust_remote_code=True is needed to incorporate a minor bug fix in Phi3ForCausalLM. Read this discussion for more details.

⚡ The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the attn_implementation parameter in the code snippet below.

# pip install transformers accelerate
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

model_id = "anakin87/Phi-3.5-mini-ITA"

# Load the weights in bfloat16 and let Accelerate place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
# Pass the chat messages (not the raw string) so the pipeline applies the model's chat template
messages = [{"role": "user", "content": user_input}]
# A near-zero temperature makes sampling almost deterministic
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
# With chat input, "generated_text" holds the whole conversation; the last message is the model's reply
print(outputs[0]["generated_text"][-1]["content"])

Example output:

Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.

Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."

Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."

In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.
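As noted earlier, the model can also be loaded with quantization. The following is a minimal sketch, assuming 4-bit NF4 quantization through bitsandbytes on a CUDA GPU. The quantization scheme and the helper name `load_4bit` are assumptions made for illustration; this card does not say which setup was tested.

```python
# pip install transformers accelerate bitsandbytes
MODEL_ID = "anakin87/Phi-3.5-mini-ITA"


def load_4bit(model_id: str = MODEL_ID):
    """Load the model with 4-bit NF4 weights (an assumed quantization setup)."""
    # Heavy imports live inside the function so that merely defining it stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store the weights in 4 bits
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # run the matmuls in bfloat16
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        quantization_config=quant_config,
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer
```

Calling `model, tokenizer = load_4bit()` returns objects that can be passed to `pipeline("text-generation", ...)` in the same way as the unquantized model.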

Build AI applications

You can use the model to create a variety of AI applications.

I recommend using the 🏗️ Haystack LLM framework for orchestration. (spoiler: I work on it and it is open-source 😄)

This model is compatible with HuggingFaceLocalGenerator and HuggingFaceLocalChatGenerator components. You can also deploy the model with a TGI container and then use it with HuggingFaceAPIGenerator and the related Chat Generator.
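A minimal sketch of the local chat component, assuming Haystack 2.x import paths and a recent release in which replies expose `.text` (older 2.x releases use `.content` instead). The helper names `build_chat_generator` and `ask`, and the generation settings, are illustrative choices rather than part of Haystack.

```python
# pip install haystack-ai transformers accelerate
MODEL_ID = "anakin87/Phi-3.5-mini-ITA"
GENERATION_KWARGS = {"max_new_tokens": 500}  # illustrative value, mirrors the Transformers example


def build_chat_generator(model_id: str = MODEL_ID):
    """Create and warm up a HuggingFaceLocalChatGenerator for the model."""
    # Heavy imports live inside the function so that merely defining it stays cheap.
    import torch
    from haystack.components.generators.chat import HuggingFaceLocalChatGenerator

    generator = HuggingFaceLocalChatGenerator(
        model=model_id,
        huggingface_pipeline_kwargs={
            "device_map": "auto",
            "torch_dtype": torch.bfloat16,
            "trust_remote_code": True,
        },
        generation_kwargs=GENERATION_KWARGS,
    )
    generator.warm_up()  # downloads and loads the model
    return generator


def ask(generator, question: str) -> str:
    """Send a single user message and return the text of the first reply."""
    from haystack.dataclasses import ChatMessage

    result = generator.run(messages=[ChatMessage.from_user(question)])
    return result["replies"][0].text  # `.content` on older Haystack 2.x releases
```

For example, `ask(build_chat_generator(), "Qual è la capitale d'Italia?")` would return the model's answer as a string.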

Some examples you can draw inspiration from:

🔧 Training details

This model was fine-tuned using HF TRL. It underwent 2 epochs of instruction fine-tuning on the FineTome-100k and Capybara-Claude-15k-ita datasets. 🙏 Thanks to the authors for providing these datasets.

I adopted a relatively new technique for parameter-efficient learning: Spectrum. The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
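The selection-and-freeze idea can be sketched as follows. This is a simplified illustration, not the Spectrum implementation: the SNR values and module names below are made up, the helpers `select_high_snr` and `freeze_except` are hypothetical, and the real tool derives SNR from the weight matrices themselves. Dummy parameter objects stand in for a model here; with PyTorch, `model.named_parameters()` could be passed instead.

```python
from types import SimpleNamespace


def select_high_snr(snr_by_module, top_fraction):
    """Return the names of the top `top_fraction` of modules, ranked by SNR."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:keep])


def freeze_except(named_parameters, trainable_modules):
    """Enable gradients only for parameters of the selected modules.

    Returns the number of parameters left trainable.
    """
    trainable = 0
    for name, param in named_parameters:
        param.requires_grad = any(name.startswith(m + ".") for m in trainable_modules)
        trainable += int(param.requires_grad)
    return trainable


# Hypothetical SNR scores for four modules
snr = {
    "model.layers.0.mlp.down_proj": 0.9,
    "model.layers.1.mlp.down_proj": 2.4,
    "model.layers.2.mlp.down_proj": 1.7,
    "model.layers.3.mlp.down_proj": 0.3,
}
selected = select_high_snr(snr, top_fraction=0.5)

# Dummy stand-ins for (name, parameter) pairs
params = [(f"{name}.weight", SimpleNamespace(requires_grad=True)) for name in snr]
n_trainable = freeze_except(params, selected)
print(sorted(selected), n_trainable)
```

With these made-up scores, the two highest-SNR modules stay trainable and the other two are frozen.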

Training required about 14 hours on a single A6000 GPU.

For complete training details, check out the 📖 full walkthrough article and the accompanying 💻 notebook.