FineLlama-3.2-3B-Instruct-ead QLoRA Adapters
This repository contains the QLoRA (Quantized Low-Rank Adaptation) adapters for the FineLlama-3.2-3B-Instruct-ead model. These adapters are designed to be used with the base meta-llama/Llama-3.2-3B-Instruct model to enable efficient fine-tuning for generating EAD (Encoded Archival Description) XML format for archival records.
Overview
The QLoRA adapters were trained using Parameter Efficient Fine-Tuning (PEFT) with LoRA (Low-Rank Adaptation) on the Geraldine/Ead-Instruct-38k dataset. This approach allows for memory-efficient fine-tuning while maintaining high performance for the task of generating EAD/XML-compliant archival descriptions.
Key Features
- Efficient Fine-Tuning: Uses 4-bit quantization and LoRA to reduce memory usage.
- Compatibility: Designed to work with the base `meta-llama/Llama-3.2-3B-Instruct` model.
- Specialization: Optimized for generating EAD/XML archival metadata.
Adapter Details
Training Configuration
- Quantization: 4-bit quantization using `bitsandbytes`
  - Quantization Type: `nf4`
  - Double Quantization: Enabled
  - Compute Dtype: `bfloat16`
- LoRA Configuration:
  - Rank (`r`): 256
  - Alpha (`alpha`): 128
  - Dropout: 0.05
  - Target Modules: All linear layers
- Training Parameters:
  - Epochs: 3
  - Batch Size: 3
  - Gradient Accumulation Steps: 2
  - Learning Rate: 2e-4
  - Warmup Ratio: 0.03
  - Max Sequence Length: 4096
  - Scheduler: Constant
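The hyperparameters above correspond roughly to the following `transformers`/`peft` configuration objects. This is a minimal sketch rather than the exact training script; in particular, the explicit module list is an assumption about which Llama linear layers are covered by "all linear layers".

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA: r=256, alpha=128, dropout=0.05, applied to the linear projection layers
lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Assumed expansion of "all linear layers" for the Llama architecture
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```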
Training Infrastructure
- Libraries: `transformers`, `peft`, `trl`
- Mixed Precision: FP16/BF16 (based on hardware support)
- Optimizer: Fused AdamW
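As a rough illustration, the training parameters listed above map onto `transformers.TrainingArguments` along these lines. The output directory is a placeholder, and the precision flag should match your hardware; these arguments would then be passed to `trl`'s `SFTTrainer` together with the quantized base model, the LoRA configuration, and the Geraldine/Ead-Instruct-38k dataset (the exact `SFTTrainer` signature varies across `trl` versions).

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters; "qlora-ead-output" is a placeholder directory
training_args = TrainingArguments(
    output_dir="qlora-ead-output",
    num_train_epochs=3,
    per_device_train_batch_size=3,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",  # fused AdamW
    bf16=True,                  # or fp16=True on GPUs without bfloat16 support
)
```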
Usage
To use the QLoRA adapters, load the base model and apply the adapters with the `peft` library.
Installation
```bash
pip install transformers torch bitsandbytes peft accelerate
```

(`accelerate` is needed for `device_map="auto"` in the loading snippet below.)
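In the typical setup, 4-bit loading through `bitsandbytes` requires a CUDA-capable GPU; a quick sanity check before loading the model:

```python
import torch

# bitsandbytes 4-bit kernels normally need a CUDA device
assert torch.cuda.is_available(), "A CUDA GPU is required for 4-bit loading with bitsandbytes"
print(torch.cuda.get_device_name(0))
```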
Loading the Model with Adapters
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the base model
base_model_name = "meta-llama/Llama-3.2-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    torch_dtype="auto",
    device_map="auto"
)

# Load the QLoRA adapters
adapter_model_name = "Geraldine/FineLlama-3.2-3B-Instruct-ead-Adapters"
model = PeftModel.from_pretrained(model, adapter_model_name)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
```
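Optionally, if you prefer a standalone checkpoint rather than base model plus adapters, the LoRA weights can be merged into an unquantized copy of the base model. This is a sketch under the assumption that you have enough memory to hold the 3B model in bfloat16; the output path is illustrative.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model without quantization so the LoRA weights can be folded into it
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "Geraldine/FineLlama-3.2-3B-Instruct-ead-Adapters")
merged = merged.merge_and_unload()  # merge the adapters into the base weights
merged.save_pretrained("FineLlama-3.2-3B-Instruct-ead-merged")  # illustrative path
```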
Example Usage
```python
messages = [
    {"role": "system", "content": "You are an expert in EAD/XML generation for archival records metadata."},
    {"role": "user", "content": "Generate a minimal and compliant <eadheader> template with all required EAD/XML tags"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4096, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
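Note that `outputs` also contains the prompt tokens; to print only the newly generated EAD/XML, decode the slice after the prompt:

```python
# Decode only the tokens generated after the prompt
generated_ids = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(generated_ids, skip_special_tokens=True))
```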
Limitations
- The adapters are specifically trained for EAD/XML generation and may not generalize well to other tasks.
- Performance depends on the quality and specificity of the input prompts.
- The maximum sequence length is limited to 4096 tokens (a simple prompt-length check is sketched below).
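Because of the 4096-token training context, it can be worth checking prompt length before generation; a minimal sketch using the tokenizer loaded above:

```python
# apply_chat_template with the default tokenize=True returns a plain list of token ids
prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
if len(prompt_ids) > 4096:
    print(f"Prompt is {len(prompt_ids)} tokens, above the 4096-token training context.")
```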
Citation
If you use these adapters in your work, please cite the base model and this repository:
```bibtex
@misc{ead-llama-adapters,
  author = {Géraldine Geoffroy},
  title = {FineLlama-3.2-3B-Instruct-ead QLoRA Adapters},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Repository},
  howpublished = {\url{https://huggingface.co/Geraldine/qlora-FineLlama-3.2-3B-Instruct-ead}}
}
```
License
The adapters are subject to the same license as the base `meta-llama/Llama-3.2-3B-Instruct` model. Please refer to Meta's Llama 3.2 license for usage terms and conditions.