frankmorales2020 committed
Commit c125a1e • 1 Parent(s): f957795
Update README.md

README.md CHANGED
@@ -20,8 +20,21 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on the generator dataset.
 It achieves the following results on the evaluation set:
+
 - Loss: 0.4605
 
+Perplexity of 10.40
+
+Perplexity Article: https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
+https://medium.com/@AyushmanPranav/perplexity-calculation-in-nlp-0699fbda4594
+
+The perplexity of 10.40 achieved on the dataset indicates that the fine-tuned Mistral-7B model reasonably understands natural language and SQL syntax.
+However, further evaluation using task-specific metrics is necessary to assess the model's effectiveness in real-world scenarios.
+By combining quantitative metrics like perplexity with qualitative analysis of generated queries,
+we can comprehensively understand the model's strengths and weaknesses, ultimately
+leading to improved performance and more reliable text-to-SQL translation capabilities.
+
+
 Dataset : [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)
 
 ## Model description
@@ -59,7 +72,7 @@ args = TrainingArguments(
     gradient_accumulation_steps=8, #2 # number of steps before performing a backward/update pass
     gradient_checkpointing=True, # use gradient checkpointing to save memory
     optim="adamw_torch_fused", # use fused adamw optimizer
-    logging_steps=10, # log every
+    logging_steps=10, # log every ten steps
     #save_strategy="epoch", # save checkpoint every epoch
     learning_rate=2e-4, # learning rate, based on QLoRA paper
     bf16=True, # use bfloat16 precision
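As a companion to the perplexity figure added in the first hunk: perplexity is conventionally the exponential of the average per-token cross-entropy (negative log-likelihood) over an evaluation corpus, as in the Bengio et al. article linked above. The sketch below shows that calculation on the b-mc2/sql-create-context data; the checkpoint id (the base model stands in for the fine-tuned one), the 100-example sample, and the question/context/answer prompt layout are illustrative assumptions, not the exact setup behind the 10.40 figure.

```python
# Minimal sketch: perplexity = exp(mean per-token negative log-likelihood).
# Model id, sample size, and prompt layout are illustrative assumptions.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # stand-in; point this at the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

# b-mc2/sql-create-context rows carry "question", "context" (CREATE TABLE statements), and "answer" (SQL).
sample = load_dataset("b-mc2/sql-create-context", split="train[:100]")

total_nll, total_tokens = 0.0, 0
for row in sample:
    text = f"{row['question']}\n{row['context']}\n{row['answer']}"  # assumed prompt layout
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss  # mean NLL per predicted token
    n = enc["input_ids"].numel()
    total_nll += loss.item() * n
    total_tokens += n

print(f"perplexity = {math.exp(total_nll / total_tokens):.2f}")
```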
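The note about pairing perplexity with qualitative analysis of generated queries suggests a quick spot-check like the one below. The adapter repo id is a hypothetical placeholder (the fine-tuned checkpoint name does not appear in this diff), and the prompt format is assumed; the example schema and question come from the first row of b-mc2/sql-create-context.

```python
# Hedged spot-check of generated SQL. "your-username/mistral-7b-text-to-sql" is a
# hypothetical placeholder for the fine-tuned adapter; the prompt format is assumed.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "your-username/mistral-7b-text-to-sql"  # hypothetical placeholder
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

prompt = (
    "Given the schema: CREATE TABLE head (age INTEGER)\n"
    "Question: How many heads of the departments are older than 56?\n"
    "SQL:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```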
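For context on the second hunk: the arguments shown there sit inside the `args = TrainingArguments(` block named in the hunk header. A hedged reconstruction of that call follows; only the lines visible in the diff are taken from the README, while `output_dir`, the epoch count, and the batch size are illustrative assumptions.

```python
# Hedged reconstruction of the TrainingArguments block from the diff; values not shown
# in the hunk (output_dir, num_train_epochs, per_device_train_batch_size) are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral-7b-text-to-sql",   # assumed
    num_train_epochs=3,                    # assumed
    per_device_train_batch_size=3,         # assumed
    gradient_accumulation_steps=8,         # from the diff: steps before a backward/update pass
    gradient_checkpointing=True,           # from the diff: save memory
    optim="adamw_torch_fused",             # from the diff: fused AdamW optimizer
    logging_steps=10,                      # from the diff: log every ten steps
    learning_rate=2e-4,                    # from the diff: based on the QLoRA paper
    bf16=True,                             # from the diff: bfloat16 precision
)
```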