How can I fine-tune using LoRA? Is there sample code?

#27
by sinchir0 - opened

How can I fine-tune using LoRA? Is there sample code?

Try the Unsloth notebook (just search for "Unsloth Qwen notebook"). Stella is technically a Qwen model, right?

Sentence Transformers models probably shouldn't be finetuned like decoders via Unsloth.

I think you're best off looking at the ST documentation: https://sbert.net/docs/sentence_transformer/training_overview.html
And then adding a LoRA adapter with add_adapter: https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.add_adapter

E.g. from the v3.3.0 update: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.3.0

from peft import LoraConfig, TaskType
from sentence_transformers import SentenceTransformer, SentenceTransformerModelCardData

# 1. Load a model to finetune, with (optional) model card data
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    model_card_data=SentenceTransformerModelCardData(
        language="en",
        license="apache-2.0",
        model_name="all-MiniLM-L6-v2 adapter finetuned on GooAQ pairs",
    ),
)

# 2. Create a LoRA adapter for the model & add it
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model.add_adapter(peft_config)

# Proceed as usual... See https://sbert.net/docs/sentence_transformer/training_overview.html
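
To get a feel for what r=8 and lora_alpha=32 mean in the config above, here is a rough back-of-the-envelope sketch in plain Python. It assumes a single square 384 x 384 weight matrix (384 is the hidden size of all-MiniLM-L6-v2) and the default LoRA scaling of lora_alpha / r. The helper names are hypothetical, and which layers actually receive adapters depends on PEFT's target module selection, so treat the numbers as illustrative only.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors: A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Default LoRA scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / r


# Assumption: one square projection matrix with hidden size 384.
hidden_size = 384
full_params = hidden_size * hidden_size
adapter_params = lora_trainable_params(hidden_size, hidden_size, r=8)

print(f"full matrix:    {full_params} parameters")     # 147456
print(f"LoRA adapter:   {adapter_params} parameters")  # 6144
print(f"update scaling: {lora_scaling(32, 8)}")        # 4.0
```

In this sketch the adapter trains roughly 4% of the parameters of the matrix it wraps, which is the main reason LoRA is cheaper than full finetuning.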

Something to consider is that this model is trained with query prompts: https://huggingface.co/dunzhang/stella_en_1.5B_v5/blob/main/config_sentence_transformers.json#L8-L9

You can train with these by using the new prompts argument in the SentenceTransformerTrainingArguments: https://sbert.net/docs/package_reference/sentence_transformer/training_args.html
If your training data has, for example, two columns ("query" and "answer"), you can set prompts to a dictionary mapping column names to prompts, like:

args = SentenceTransformerTrainingArguments(
    ...,
    prompts={
        "query": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ",
    },
    ...,
)
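
Conceptually, a prompts dictionary keyed by column name means that the prompt string is prepended to every text in that column, while other columns are left untouched. The plain-Python sketch below only illustrates that effect on a toy batch. apply_prompts is a hypothetical helper, not part of Sentence Transformers, and the library's actual handling may differ in its details.

```python
from typing import Dict, List


def apply_prompts(batch: Dict[str, List[str]], prompts: Dict[str, str]) -> Dict[str, List[str]]:
    """Prepend each column's prompt (if one is configured) to every text in that column."""
    return {
        column: [prompts.get(column, "") + text for text in texts]
        for column, texts in batch.items()
    }


prompts = {
    "query": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ",
}

# Toy batch with the two columns from the example above.
batch = {
    "query": ["what is lora?"],
    "answer": ["LoRA is a parameter-efficient finetuning method."],
}

prompted = apply_prompts(batch, prompts)
print(prompted["query"][0])   # prompt followed by the original query
print(prompted["answer"][0])  # unchanged, since no prompt is set for "answer"
```

Only the "query" column gets a prompt here, which mirrors the asymmetric setup in the model's config, where queries are prompted and documents are not.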

You can also check out the v3.3.0 update for more details on training with prompts: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.3.0

There are a lot of example training datasets here if you'd like to use them to get started: https://huggingface.co/collections/sentence-transformers/embedding-model-datasets-6644d7a3673a511914aa7552

  • Tom Aarsen

Thank you for the detailed explanation! I understand it very well now.
