frankmorales2020 committed on
Commit c125a1e
1 Parent(s): f957795

Update README.md

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -20,8 +20,21 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on the generator dataset.
 It achieves the following results on the evaluation set:
+
 - Loss: 0.4605
 
+ - Perplexity: 10.40
+
+ Perplexity references:
+ https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
+ https://medium.com/@AyushmanPranav/perplexity-calculation-in-nlp-0699fbda4594
+
+ The perplexity of 10.40 achieved on this dataset indicates that the fine-tuned Mistral-7B model has a reasonable grasp of natural language and SQL syntax.
+ However, further evaluation with task-specific metrics is necessary to assess the model's effectiveness in real-world scenarios.
+ Combining quantitative metrics such as perplexity with qualitative analysis of the generated queries gives a comprehensive picture of the model's
+ strengths and weaknesses, ultimately leading to improved performance and more reliable text-to-SQL translation (a minimal computation sketch follows this hunk).
+
 Dataset : [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)
 
 ## Model description
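The diff does not show how the 10.40 figure was produced. A standard way to measure perplexity for a causal language model is to take exp of the mean token-level cross-entropy over an evaluation set. The snippet below is a minimal sketch of that computation, not the author's script: the model id follows the card, while the example text and the evaluation loop are assumptions for illustration.

```python
# Minimal sketch (not from this commit): token-level perplexity of a causal LM
# over a handful of SQL-style prompts. perplexity = exp(mean negative log-likelihood per token).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # base model named in the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

# Illustrative example in the style of b-mc2/sql-create-context; not a real eval split.
texts = [
    "CREATE TABLE head (age INTEGER) -- How many heads of the departments are older than 56?",
]

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        # Passing labels = input_ids makes the model return the mean cross-entropy loss.
        out = model(**enc, labels=enc["input_ids"])
        n_tokens = enc["input_ids"].numel()
        total_nll += out.loss.item() * n_tokens
        total_tokens += n_tokens

perplexity = math.exp(total_nll / total_tokens)
print(f"perplexity: {perplexity:.2f}")
```

Note that the reported evaluation loss of 0.4605 and the reported perplexity of 10.40 are both taken from the card; the exact evaluation corpus used to obtain the perplexity value is not specified in this diff.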
@@ -59,7 +72,7 @@ args = TrainingArguments(
     gradient_accumulation_steps=8, #2 # number of steps before performing a backward/update pass
     gradient_checkpointing=True, # use gradient checkpointing to save memory
     optim="adamw_torch_fused", # use fused adamw optimizer
-    logging_steps=10, # log every 10 steps
+    logging_steps=10, # log every ten steps
     #save_strategy="epoch", # save checkpoint every epoch
     learning_rate=2e-4, # learning rate, based on QLoRA paper
     bf16=True, # use bfloat16 precision
 
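The hunk above shows only a slice of the `TrainingArguments` call. For context, here is a sketch of how the surrounding call might look; only the parameters visible in the hunk come from the card, and every other value (output directory, epochs, batch size) is a placeholder assumption.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral-7b-text-to-sql",   # placeholder, not shown in the diff
    num_train_epochs=3,                    # assumption, not shown in the diff
    per_device_train_batch_size=1,         # assumption, not shown in the diff
    gradient_accumulation_steps=8,         # from the hunk above
    gradient_checkpointing=True,           # from the hunk above
    optim="adamw_torch_fused",             # from the hunk above
    logging_steps=10,                      # from the hunk above
    learning_rate=2e-4,                    # from the hunk above (QLoRA paper)
    bf16=True,                             # from the hunk above
)
```

With `logging_steps=10`, the trainer writes a log entry every 10 optimizer update steps; the commit itself only rewords the comment from "10" to "ten" without changing the setting.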