---
library_name: transformers
license: llama3.2
---

# FineLlama-3.2-3B-Instruct-ead QLoRA Adapters

This repository contains the QLoRA (Quantized Low-Rank Adaptation) adapters for the **FineLlama-3.2-3B-Instruct-ead** model. Apply them to the base [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model to generate EAD (Encoded Archival Description) XML for archival records.

## Overview

The QLoRA adapters were trained using **Parameter-Efficient Fine-Tuning (PEFT)** with LoRA (Low-Rank Adaptation) on the [Geraldine/Ead-Instruct-38k](https://huggingface.co/datasets/Geraldine/Ead-Instruct-38k) dataset. This approach keeps fine-tuning memory-efficient: the base weights are loaded in 4-bit precision and only the low-rank adapter weights are trained, while the model is specialized for generating EAD/XML-compliant archival descriptions.

### Key Features

- **Efficient Fine-Tuning**: Uses 4-bit quantization and LoRA to reduce memory usage.
- **Compatibility**: Designed to work with the base `meta-llama/Llama-3.2-3B-Instruct` model.
- **Specialization**: Optimized for generating EAD/XML archival metadata.

---

## Adapter Details

### Training Configuration

- **Quantization**: 4-bit quantization using `bitsandbytes`.
  - Quantization Type: `nf4`
  - Double Quantization: Enabled
  - Compute Dtype: `bfloat16`
- **LoRA Configuration**:
  - Rank (`r`): 256
  - Alpha (`lora_alpha`): 128
  - Dropout: 0.05
  - Target Modules: All linear layers
- **Training Parameters**:
  - Epochs: 3
  - Batch Size: 3
  - Gradient Accumulation Steps: 2
  - Learning Rate: 2e-4
  - Warmup Ratio: 0.03
  - Max Sequence Length: 4096
  - Scheduler: Constant

### Training Infrastructure

- Libraries: `transformers`, `peft`, `trl`
- Mixed Precision: `FP16/BF16` (based on hardware support)
- Optimizer: `fused adamw`

---

## Usage

To use the QLoRA adapters, load the base model and apply the adapters with the `peft` library.

### Installation

```bash
pip install transformers torch bitsandbytes peft accelerate
```

`accelerate` is required because the loading code below uses `device_map="auto"`.

### Loading the Model with Adapters

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Configure 4-bit quantization (same settings as used for training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the base model in 4-bit precision
base_model_name = "meta-llama/Llama-3.2-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    torch_dtype="auto",
    device_map="auto"
)

# Load the QLoRA adapters on top of the quantized base model
adapter_model_name = "Geraldine/FineLlama-3.2-3B-Instruct-ead-Adapters"
model = PeftModel.from_pretrained(model, adapter_model_name)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
```

### Example Usage

```python
messages = [
    {"role": "system", "content": "You are an expert in EAD/XML generation for archival records metadata."},
    {"role": "user", "content": "Generate a minimal and compliant template with all required EAD/XML tags"},
]

# Build the prompt with the model's chat template and move it to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

# Generate and decode; the decoded text includes the prompt followed by the response
outputs = model.generate(inputs, max_new_tokens=4096, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Limitations

- The adapters are trained specifically for EAD/XML generation and may not generalize well to other tasks.
- Performance depends on the quality and specificity of the input prompts.
- The adapters were trained with a maximum sequence length of 4096 tokens.

---

## Citation

If you use these adapters in your work, please cite the base model and this repository:

```bibtex
@misc{ead-llama-adapters,
  author = {Géraldine Geoffroy},
  title = {FineLlama-3.2-3B-Instruct-ead QLoRA Adapters},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Repository},
  howpublished = {\url{https://huggingface.co/Geraldine/qlora-FineLlama-3.2-3B-Instruct-ead}}
}
```

---

## License

The adapters are subject to the same license as the base [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model. Please refer to Meta's Llama 3.2 license for usage terms and conditions.
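
## Note on Derived Training Quantities

Two quantities follow from the training configuration listed under Adapter Details but are not stated there: the effective batch size per optimizer step and the LoRA scaling factor. The sketch below works them out under three assumptions that this README does not confirm: the batch size of 3 is per device, training ran on a single device, and the standard LoRA scaling `lora_alpha / r` applies (rank-stabilized LoRA was not enabled). The function names are illustrative only and belong to no library.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update (assumes no rank-stabilized LoRA)."""
    return lora_alpha / r


# Values taken from the Training Configuration section.
# Assumption: batch size 3 is per device and a single device was used.
print(effective_batch_size(per_device_batch=3, grad_accum_steps=2))  # 6
print(lora_scaling(lora_alpha=128, r=256))  # 0.5
```

Under these assumptions, each optimizer step sees 6 sequences, and the low-rank update is scaled by 0.5 before being added to the frozen base weights.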