---
library_name: transformers
tags: []
---

## Model Description

Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. It was fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, which keeps memory requirements low enough for deployment on resource-constrained hardware.

### Model Architecture

- Base Model: Llama-3.2-1B
- Parameters: Approximately 1 billion
- Quantization: 4-bit, using the bitsandbytes library
- Fine-tuning Method: PEFT with LoRA

## Training Data

The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset, a subset of the larger Databricks Dolly 15k dataset that focuses on general question-answering tasks.

### Training Procedure

Fine-tuning Configuration:

- LoRA Rank (r): 8
- LoRA Alpha: 16
- LoRA Dropout: 0.5
- Number of Epochs: 30
- Batch Size: 2 (per device)
- Learning Rate: 2e-5
- Evaluation Strategy: Evaluated at each epoch
- Optimizer: AdamW
- Mixed Precision: FP16
- Hardware Used: Not specified

Libraries:

- transformers
- datasets
- peft
- bitsandbytes
- trl
- evaluate

## Intended Use

The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.

## Limitations and Biases

- Knowledge Cutoff: The model's knowledge is limited to the data it was trained on. It may lack information on events or developments that occurred after the dataset was created.
- Accuracy: The model may produce incorrect or nonsensical responses. Always verify critical information against reliable sources.
- Biases: The model may inherit biases present in the training data. Evaluate its outputs critically, especially in sensitive contexts.
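As a rough illustration of what the LoRA settings listed under Training Procedure imply, the sketch below computes the scaling factor applied to the low-rank update (alpha / r) and the number of trainable parameters a single adapter adds. The rank and alpha come from this card; the layer shape of 2048 x 2048 is a hypothetical value chosen only for the arithmetic, and the card does not state which modules were adapted.

```python
# LoRA hyperparameters taken from the Training Procedure section of this card.
LORA_RANK = 8
LORA_ALPHA = 16


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling applied to the low-rank update: delta_W = (alpha / rank) * B @ A."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer shape, used only for illustration (an assumption,
# not a statement about which layers were adapted in this model).
D_IN, D_OUT = 2048, 2048

scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
adapter_params = lora_param_count(D_IN, D_OUT, LORA_RANK)
full_params = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"adapter parameters: {adapter_params}")
print(f"fraction of the full matrix: {adapter_params / full_params:.4%}")
```

With these settings, each adapted matrix of that shape would train well under 1% of the parameters that full fine-tuning of the same matrix would update.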
## Acknowledgements

- Base Model: Meta AI's Llama-3.2-1B
- Dataset: Databricks Dolly 15k Subset for General QA
- Libraries Used:
  - Transformers
  - PEFT
  - TRL
  - BitsAndBytes

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"

# Read the adapter config to find the base model it was trained on.
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',
    return_dict=True
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# The tokenizer has no dedicated pad token, so reuse the end-of-sequence token.
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter weights to the base model.
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
peft_loaded_model.eval()
```

## Running Inference

```python
from transformers import GenerationConfig


def create_chat_template(question, context=""):
    # Build the prompt; leave context empty when no related reviews exist.
    text = (
        "[Instruction] You are a question-answering agent which answers the question based on the related reviews. "
        "If related reviews are not provided, you can generate the answer based on the question.\n"
        f"[Question] {question}\n"
        f"[Related Reviews] {context}\n"
        "[Answer] "
    )
    return text


def generate_response(question, context=""):
    text = create_chat_template(question, context)
    # Move the tokenized inputs to the device the model was loaded on.
    inputs = tokenizer(
        [text], return_tensors='pt', padding=True, truncation=True
    ).to(peft_loaded_model.device)
    # penalty_alpha=0.6 from the original config is omitted: it only
    # applies to contrastive search, which is not used when do_sample=True.
    generation_config = GenerationConfig(
        max_length=256,  # counts prompt tokens plus generated tokens
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )
    # Generate with the adapter-wrapped model, not the bare base model.
    response = peft_loaded_model.generate(**inputs, generation_config=generation_config)
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output


# Example usage
question = "Explain the process of photosynthesis."
response = generate_response(question)
print(response)
```
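
Because the decoded output contains the prompt as well as the generated continuation, it can help to strip everything up to the `[Answer]` marker. The helper below is a minimal sketch of that post-processing step and is not part of the original card: `extract_answer`, `ANSWER_MARKER`, and `SECTION_TAGS` are hypothetical names, and cutting the text at the next section tag assumes the model may keep generating further `[Question]` blocks.

```python
ANSWER_MARKER = "[Answer]"
# Tags from the prompt template; if the model emits one of them after the
# answer, the text from that point on is treated as run-on and discarded.
SECTION_TAGS = ("[Instruction]", "[Question]", "[Related Reviews]", "[Answer]")


def extract_answer(generated_text: str) -> str:
    """Return the text following the first "[Answer]" marker.

    If the marker is missing, the whole text is returned stripped.
    """
    _, marker, tail = generated_text.partition(ANSWER_MARKER)
    if not marker:
        return generated_text.strip()
    for tag in SECTION_TAGS:
        # Keep only the part before any further section tag.
        tail = tail.split(tag, 1)[0]
    return tail.strip()


# Example with a hand-written string standing in for a decoded model output.
decoded = (
    "[Instruction] Answer the question.\n"
    "[Question] What is 2 + 2?\n"
    "[Related Reviews] \n"
    "[Answer] 4\n"
    "[Question] What is 3 + 3?"
)
print(extract_answer(decoded))
```

In practice this would be applied to the return value of `generate_response`, for example `extract_answer(generate_response(question))`.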