frankmorales2020

Update README.md

80050e7 verified 6 months ago

5.54 kB

	---
	base_model: mistralai/Mistral-7B-Instruct-v0.3
	datasets:
	- generator
	library_name: peft
	license: apache-2.0
	tags:
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: Mistral-7B-text-to-sql-flash-attention-2-dataeval
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Mistral-7B-text-to-sql-flash-attention-2-dataeval

	This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on the generator dataset.
	It achieves the following results on the evaluation set:

	- Loss: 0.4605

	Perplexity of 10.40

	Perplexity: Perplexity is a measure of how uncertain or surprised the model is about its predictions.
	It's derived from the probabilities the model assigns to different words or tokens.

	Perplexity Article: https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
	https://medium.com/@AyushmanPranav/perplexity-calculation-in-nlp-0699fbda4594

	The perplexity of 10.40 achieved on the dataset indicates that the fine-tuned Mistral-7B model reasonably understands natural language and SQL syntax.
	However, further evaluation using task-specific metrics is necessary to assess the model's effectiveness in real-world scenarios.
	By combining quantitative metrics like perplexity with qualitative analysis of generated queries,
	we can comprehensively understand the model's strengths and weaknesses, ultimately
	leading to improved performance and more reliable text-to-SQL translation capabilities.


	Dataset : [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)

	## Model description

	Article: https://medium.com/@frankmorales_91352/fine-tuning-the-llm-mistral-7b-instruct-v0-3-249c1814ceaf

	## Training and evaluation data

	Fine Tuning and Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Mistral_7B_Instruct_v0_1_for_text_to_SQL_EVALDATA.ipynb

	Evaluation: https://github.com/frank-morales2020/MLxDL/blob/main/Evaluator_Mistral_7B_text_to_sql.ipynb

	Evaluation article with Chromadb: https://medium.com/@frankmorales_91352/a-comprehensive-evaluation-of-a-fine-tuned-text-to-sql-model-from-code-to-results-with-7ea59943b0a1

	Evaluation article with Chromadb, PostgreSQL and the “gretelai/synthetic_text_to_sql” dataset:
	https://medium.com/@frankmorales_91352/evaluating-the-performance-of-a-fine-tuned-text-to-sql-model-6b7d61dcfef5
	The article discusses evaluating this fine-tuned text-to-SQL model, a type of artificial intelligence
	that translates natural language into SQL queries.

	The model was trained on the "b-mc2/sql-create-context" dataset and
	evaluated using the "gretelai/synthetic_text_to_sql" dataset.

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 3
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 24
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant
	- lr_scheduler_warmup_ratio: 0.03
	- lr_scheduler_warmup_steps: 15
	- num_epochs: 3

	from transformers import TrainingArguments

	args = TrainingArguments(
	output_dir="Mistral-7B-text-to-sql-flash-attention-2-dataeval",

	num_train_epochs=3, # number of training epochs
	per_device_train_batch_size=3, # batch size per device during training
	gradient_accumulation_steps=8, #2 # number of steps before performing a backward/update pass
	gradient_checkpointing=True, # use gradient checkpointing to save memory
	optim="adamw_torch_fused", # use fused adamw optimizer
	logging_steps=10, # log every ten steps
	#save_strategy="epoch", # save checkpoint every epoch
	learning_rate=2e-4, # learning rate, based on QLoRA paper
	bf16=True, # use bfloat16 precision
	tf32=True, # use tf32 precision
	max_grad_norm=0.3, # max gradient norm based on QLoRA paper
	warmup_ratio=0.03, # warmup ratio based on QLoRA paper
	weight_decay=0.01,
	lr_scheduler_type="constant", # use constant learning rate scheduler
	push_to_hub=True, # push model to hub
	report_to="tensorboard", # report metrics to tensorboard
	hub_token=access_token_write, # Add this line
	load_best_model_at_end=True,
	logging_dir="/content/drive/MyDrive/model/Mistral-7B-text-to-sql-flash-attention-2-dataeval/logs",
	evaluation_strategy="steps",
	eval_steps=10,
	save_strategy="steps",
	save_steps=10,
	metric_for_best_model = "loss",
	warmup_steps=15,

	)

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:------:\|:----:\|:---------------:\|
	\| 1.8612 \| 0.4020 \| 10 \| 0.6092 \|
	\| 0.5849 \| 0.8040 \| 20 \| 0.5307 \|
	\| 0.4937 \| 1.2060 \| 30 \| 0.4887 \|
	\| 0.4454 \| 1.6080 \| 40 \| 0.4670 \|
	\| 0.425 \| 2.0101 \| 50 \| 0.4544 \|
	\| 0.3498 \| 2.4121 \| 60 \| 0.4717 \|
	\| 0.3439 \| 2.8141 \| 70 \| 0.4605 \|


	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.2
	- Pytorch 2.3.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1