---
library_name: transformers
datasets:
- bergr7f/databricks-dolly-15k-subset-general_qa
language:
- en
base_model:
- meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
---

## Model Description

Llama-3.2-1B-finetuned-generalQA-peft-4bit is a fine-tuned version of the Llama-3.2-1B model, specialized for general question-answering tasks. The model has been fine-tuned using Low-Rank Adaptation (LoRA) with 4-bit quantization, making it efficient for deployment on resource-constrained hardware.

### Model Architecture

- Base Model: Llama-3.2-1B
- Parameters: approximately 1 billion
- Quantization: 4-bit using the bitsandbytes library (see the sketch below)
- Fine-tuning Method: PEFT with LoRA
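
The exact quantization settings are not listed in this card; a typical bitsandbytes 4-bit configuration for this kind of setup looks like the sketch below. The NF4 quant type, nested quantization, and fp16 compute dtype are assumptions rather than values taken from this card:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumed 4-bit settings; the card only states "4-bit using the bitsandbytes library"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store the base weights in 4-bit
    bnb_4bit_quant_type="nf4",             # assumption: NF4 quantization
    bnb_4bit_use_double_quant=True,        # assumption: nested quantization for extra memory savings
    bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute, matching the fp16 training setup
)
```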

## Training Data

The model was fine-tuned on the Databricks Dolly 15k Subset for General QA dataset, a subset of the larger Databricks Dolly 15k dataset that focuses on general question-answering tasks.

### Training Procedure

Fine-tuning configuration (a code sketch of this setup follows the list):

- LoRA Rank (r): 8
- LoRA Alpha: 16
- LoRA Dropout: 0.5
- Number of Epochs: 30
- Batch Size: 2 (per device)
- Learning Rate: 2e-5
- Evaluation Strategy: evaluated at the end of each epoch
- Optimizer: AdamW
- Mixed Precision: FP16
- Hardware Used: single RTX 4070 (8 GB)
- Libraries: transformers, datasets, peft, bitsandbytes, trl, evaluate
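
The original training script is not included in this card. The sketch below shows how the hyperparameters above could be wired together with peft and trl; the dataset column names, train/validation split, LoRA target modules, and maximum sequence length are assumptions, and some keyword names differ between library releases:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

base_model_id = "meta-llama/Llama-3.2-1B"

# Load the base model in 4-bit, as described under Model Architecture
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_compute_dtype=torch.float16),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the list above; target_modules is an assumption (not stated in the card)
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.5,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)

# Trainer settings from the list above; output_dir is a placeholder
training_args = TrainingArguments(
    output_dir="llama-3.2-1b-generalqa-lora",
    num_train_epochs=30,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    eval_strategy="epoch",   # "evaluation_strategy" in older transformers releases
    fp16=True,
    optim="adamw_torch",
)

# Assumption: the subset keeps the original Dolly columns (instruction / context / response)
def to_text(example):
    return {"text": f"[Question] {example['instruction']}\n"
                    f"[Related Reviews] {example['context']}\n"
                    f"[Answer] {example['response']}"}

dataset = load_dataset("bergr7f/databricks-dolly-15k-subset-general_qa")["train"]
splits = dataset.train_test_split(test_size=0.1, seed=42).map(to_text)

# Keyword names follow the classic SFTTrainer signature; newer trl moves most of these into SFTConfig
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,  # assumption
)
trainer.train()
```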

## Intended Use

The model is intended for generating informative answers to general questions. It can be integrated into applications such as chatbots, virtual assistants, educational tools, and information retrieval systems.

## Limitations and Biases

- Knowledge Cutoff: The model's knowledge is limited to the data it was trained on; it may not have information on events or developments that occurred after the dataset was created.
- Accuracy: The model may occasionally produce incorrect or nonsensical responses. Always verify critical information from reliable sources.
- Biases: The model may inherit biases present in the training data. Users should critically evaluate its outputs, especially in sensitive contexts.

## Acknowledgements

- Base Model: [Meta AI's Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- Dataset: [Databricks Dolly 15k Subset for General QA](https://huggingface.co/datasets/bergr7f/databricks-dolly-15k-subset-general_qa)
- Libraries Used: Transformers, PEFT, TRL, BitsAndBytes

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = "Chryslerx10/Llama-3.2-1B-finetuned-generalQA-peft-4bit"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model referenced in the adapter config
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map='auto',
    return_dict=True
)

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter to the base model
peft_loaded_model = PeftModel.from_pretrained(model, peft_model_id, device_map='auto')
```
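
Because the adapter was trained on top of a 4-bit quantized base model, you can also quantize the base model at load time to reduce memory use during inference. The sketch below reuses `config` and `peft_model_id` from the snippet above; the specific 4-bit settings are assumptions rather than values stated in this card:

```python
import torch
from transformers import BitsAndBytesConfig

# Assumed 4-bit settings (the card only states that bitsandbytes 4-bit quantization was used)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map='auto',
)
peft_loaded_model = PeftModel.from_pretrained(model_4bit, peft_model_id)
```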

## Running Inference

```python
from transformers import GenerationConfig

def create_chat_template(question, context):
    text = f"""
    [Instruction] You are a question-answering agent which answers the question based on the related reviews.
    If related reviews are not provided, you can generate the answer based on the question.\n
    [Question] {question}\n
    [Related Reviews] {context}\n
    [Answer]
    """
    return text

def generate_response(question, context=""):
    text = create_chat_template(question, context)
    inputs = tokenizer([text], return_tensors='pt', padding=True, truncation=True).to(peft_loaded_model.device)

    config = GenerationConfig(
        max_length=256,
        temperature=0.5,
        top_k=5,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True,
        penalty_alpha=0.6  # only used by contrastive search; has no effect when do_sample=True
    )

    # Generate with the adapter-augmented model loaded above
    response = peft_loaded_model.generate(**inputs, generation_config=config)
    output = tokenizer.decode(response[0], skip_special_tokens=True)
    return output

# Example usage
question = "Explain the process of photosynthesis."
response = generate_response(question)
print(response)
```
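
If a single standalone checkpoint is more convenient than shipping the base model plus adapter, the LoRA weights can be merged into the base model with PEFT's `merge_and_unload()`. A minimal sketch, assuming the non-quantized loading from the How to Use section and a hypothetical output directory:

```python
# Fold the LoRA weights into the base model and save a standalone copy
# (merge into a full- or half-precision base model, not the 4-bit variant)
merged_model = peft_loaded_model.merge_and_unload()
merged_model.save_pretrained("Llama-3.2-1B-generalQA-merged")
tokenizer.save_pretrained("Llama-3.2-1B-generalQA-merged")
```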