frankmorales2020 committed on
Commit c125a1e
1 Parent(s): f957795

Update README.md

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -20,8 +20,21 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on the generator dataset.
 It achieves the following results on the evaluation set:
+
 - Loss: 0.4605
 
+ - Perplexity: 10.40
+
+ Perplexity references:
+ https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
+ https://medium.com/@AyushmanPranav/perplexity-calculation-in-nlp-0699fbda4594
+
+ The perplexity of 10.40 achieved on this dataset indicates that the fine-tuned Mistral-7B model has a reasonable grasp of natural language and SQL syntax.
+ However, further evaluation with task-specific metrics is necessary to assess the model's effectiveness in real-world scenarios.
+ Combining quantitative metrics such as perplexity with qualitative analysis of the generated queries gives a comprehensive picture of the model's
+ strengths and weaknesses, ultimately leading to improved performance and more reliable text-to-SQL translation (a minimal computation sketch follows this hunk).
+
 Dataset : [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)
 
 ## Model description
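The diff does not show how the 10.40 figure was produced. A standard way to measure perplexity for a causal language model is to take exp of the mean token-level cross-entropy over an evaluation set. The snippet below is a minimal sketch of that computation, not the author's script: the model id follows the card, while the example text and the evaluation loop are assumptions for illustration.

```python
# Minimal sketch (not from this commit): token-level perplexity of a causal LM
# over a handful of SQL-style prompts. perplexity = exp(mean negative log-likelihood per token).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # base model named in the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

# Illustrative example in the style of b-mc2/sql-create-context; not a real eval split.
texts = [
    "CREATE TABLE head (age INTEGER) -- How many heads of the departments are older than 56?",
]

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        # Passing labels = input_ids makes the model return the mean cross-entropy loss.
        out = model(**enc, labels=enc["input_ids"])
        n_tokens = enc["input_ids"].numel()
        total_nll += out.loss.item() * n_tokens
        total_tokens += n_tokens

perplexity = math.exp(total_nll / total_tokens)
print(f"perplexity: {perplexity:.2f}")
```

Note that the reported evaluation loss of 0.4605 and the reported perplexity of 10.40 are both taken from the card; the exact evaluation corpus used to obtain the perplexity value is not specified in this diff.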
@@ -59,7 +72,7 @@ args = TrainingArguments(
     gradient_accumulation_steps=8, #2 # number of steps before performing a backward/update pass
     gradient_checkpointing=True, # use gradient checkpointing to save memory
     optim="adamw_torch_fused", # use fused adamw optimizer
-    logging_steps=10, # log every 10 steps
+    logging_steps=10, # log every ten steps
     #save_strategy="epoch", # save checkpoint every epoch
     learning_rate=2e-4, # learning rate, based on QLoRA paper
     bf16=True, # use bfloat16 precision
 
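The hunk above shows only a slice of the `TrainingArguments` call. For context, here is a sketch of how the surrounding call might look; only the parameters visible in the hunk come from the card, and every other value (output directory, epochs, batch size) is a placeholder assumption.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral-7b-text-to-sql",   # placeholder, not shown in the diff
    num_train_epochs=3,                    # assumption, not shown in the diff
    per_device_train_batch_size=1,         # assumption, not shown in the diff
    gradient_accumulation_steps=8,         # from the hunk above
    gradient_checkpointing=True,           # from the hunk above
    optim="adamw_torch_fused",             # from the hunk above
    logging_steps=10,                      # from the hunk above
    learning_rate=2e-4,                    # from the hunk above (QLoRA paper)
    bf16=True,                             # from the hunk above
)
```

With `logging_steps=10`, the trainer writes a log entry every 10 optimizer update steps; the commit itself only rewords the comment from "10" to "ten" without changing the setting.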