shahidul034
/

KUETLLM_zephyr

Generated from Trainer

Model card Files Files and versions Community

KUETLLM_zephyr / README.md

shahidul034's picture

Update README.md

9bc7370 over 1 year ago

|

history blame contribute delete

2.7 kB

	---
	license: mit
	base_model: TheBloke/zephyr-7B-beta-GPTQ
	tags:
	- generated_from_trainer
	model-index:
	- name: KUETLLM_zephyr
	results: []
	---
	KUETLLM is a [zephyr7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) finetune, using a dataset with prompts and answers about Khulna University of Engineering and Technology.
	It was loaded in 8 bit quantization using [bitsandbytes](https://github.com/TimDettmers/bitsandbytes). [LORA](https://huggingface.co/docs/diffusers/main/en/training/lora) was used to finetune an adapter, which was leter merged with the base unquantized model.
	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# KUETLLM_zephyr

	This model is a fine-tuned version of [TheBloke/zephyr-7B-beta-GPTQ](https://huggingface.co/TheBloke/zephyr-7B-beta-GPTQ) on the None dataset.

	## Model description

	Below is the training configuarations for the finetuning process:
	```
	LoraConfig:
	r=16,
	lora_alpha=16,
	target_modules=["q_proj", "v_proj","k_proj","o_proj","gate_proj","up_proj","down_proj"],
	lora_dropout=0.05,
	bias="none",
	task_type="CAUSAL_LM"
	```
	```
	TrainingArguments:
	per_device_train_batch_size=12,
	gradient_accumulation_steps=1,
	optim='paged_adamw_8bit',
	learning_rate=5e-06 ,
	fp16=True,
	logging_steps=10,
	num_train_epochs = 1,
	output_dir=zephyr_lora_output,
	remove_unused_columns=False,
	```


	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 24
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 96
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 2
	- mixed_precision_training: Native AMP

	### Inference
	```
	def process_data_sample(example):
	processed_example = "<\|system\|>\nYou are a KUET authority managed chatbot, help users by answering their queries about KUET.\n<\|user\|>\n" + example + "\n<\|assistant\|>\n"
	return processed_example

	inp_str = process_data_sample("Tell me about KUET.")
	inputs = tokenizer(inp_str, return_tensors="pt")
	generation_config = GenerationConfig(
	do_sample=True,
	top_k=1,
	temperature=0.1,
	max_new_tokens=256,
	pad_token_id=tokenizer.eos_token_id
	)

	outputs = model.generate(**inputs, generation_config=generation_config)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```



	### Framework versions

	- Transformers 4.36.0.dev0
	- Pytorch 2.1.1+cu121
	- Datasets 2.15.0
	- Tokenizers 0.15.0