First cut at fine-tuning with masked training; working, but very slow at inference
- fine-tuned-model-8/checkpoint-118/README.md +202 -0
- fine-tuned-model-8/checkpoint-118/adapter_config.json +39 -0
- fine-tuned-model-8/checkpoint-118/adapter_model.safetensors +3 -0
- fine-tuned-model-8/checkpoint-118/optimizer.pt +3 -0
- fine-tuned-model-8/checkpoint-118/rng_state.pth +3 -0
- fine-tuned-model-8/checkpoint-118/scheduler.pt +3 -0
- fine-tuned-model-8/checkpoint-118/special_tokens_map.json +20 -0
- fine-tuned-model-8/checkpoint-118/tokenizer.json +0 -0
- fine-tuned-model-8/checkpoint-118/tokenizer_config.json +206 -0
- fine-tuned-model-8/checkpoint-118/trainer_state.json +73 -0
- fine-tuned-model-8/checkpoint-118/training_args.bin +3 -0
- fine-tuned-model-8/checkpoint-177/README.md +202 -0
- fine-tuned-model-8/checkpoint-177/adapter_config.json +39 -0
- fine-tuned-model-8/checkpoint-177/adapter_model.safetensors +3 -0
- fine-tuned-model-8/checkpoint-177/optimizer.pt +3 -0
- fine-tuned-model-8/checkpoint-177/rng_state.pth +3 -0
- fine-tuned-model-8/checkpoint-177/scaler.pt +3 -0
- fine-tuned-model-8/checkpoint-177/scheduler.pt +3 -0
- fine-tuned-model-8/checkpoint-177/special_tokens_map.json +26 -0
- fine-tuned-model-8/checkpoint-177/tokenizer.json +0 -0
- fine-tuned-model-8/checkpoint-177/tokenizer_config.json +206 -0
- fine-tuned-model-8/checkpoint-177/trainer_state.json +88 -0
- fine-tuned-model-8/checkpoint-177/training_args.bin +3 -0
- fine-tuned-model-8/config.json +48 -0
- fine-tuned-model-8/generation_config.json +7 -0
- fine-tuned-model-8/model.safetensors +3 -0
- fine-tuned-model-8/runs/Apr06_19-55-38_DESKTOP-SMJC97K/events.out.tfevents.1743994556.DESKTOP-SMJC97K.17072.0 +3 -0
- fine-tuned-model-8/runs/Apr06_20-04-45_DESKTOP-SMJC97K/events.out.tfevents.1743995090.DESKTOP-SMJC97K.17072.1 +3 -0
- fine-tuned-model-8/runs/Apr06_20-09-40_DESKTOP-SMJC97K/events.out.tfevents.1743995381.DESKTOP-SMJC97K.2292.0 +3 -0
- fine-tuned-model-8/runs/Apr06_20-14-05_DESKTOP-SMJC97K/events.out.tfevents.1743995646.DESKTOP-SMJC97K.12712.0 +3 -0
- fine-tuned-model-8/runs/Apr06_20-19-55_DESKTOP-SMJC97K/events.out.tfevents.1743995996.DESKTOP-SMJC97K.12928.0 +3 -0
- fine-tuned-model-8/runs/Apr06_22-18-58_DESKTOP-SMJC97K/events.out.tfevents.1744003142.DESKTOP-SMJC97K.8492.0 +3 -0
- fine-tuned-model-8/runs/Apr06_22-52-40_DESKTOP-SMJC97K/events.out.tfevents.1744005165.DESKTOP-SMJC97K.9844.0 +3 -0
- fine-tuned-model-8/runs/Apr06_23-15-43_DESKTOP-SMJC97K/events.out.tfevents.1744006548.DESKTOP-SMJC97K.7572.0 +3 -0
- fine-tuned-model-8/runs/Apr06_23-40-32_DESKTOP-SMJC97K/events.out.tfevents.1744008032.DESKTOP-SMJC97K.10676.0 +3 -0
- fine-tuned-model-8/special_tokens_map.json +20 -0
- fine-tuned-model-8/tokenizer.json +0 -0
- fine-tuned-model-8/tokenizer_config.json +206 -0
- finetune_model.ipynb +157 -549
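Before the per-file diff, a note on the commit title: "masked training" is not spelled out anywhere in this commit, so the following is only a sketch of the usual recipe for instruction-style SQL fine-tuning, where prompt tokens get label -100 so the loss covers only the generated SQL plus the custom `<|endofsql|>` stop token added in the tokenizer files below. The function name and structure are illustrative, not taken from finetune_model.ipynb.

```python
# Hypothetical sketch of prompt-masked ("completion-only") training targets.
# Tokens labeled -100 are ignored by PyTorch's CrossEntropyLoss, so only the
# SQL completion (plus the custom stop token) contributes to the loss.
def build_masked_example(tokenizer, prompt: str, sql: str):
    prompt_ids = tokenizer(prompt, add_special_tokens=True)["input_ids"]
    target_ids = tokenizer(sql + "<|endofsql|>", add_special_tokens=False)["input_ids"]
    return {
        "input_ids": prompt_ids + target_ids,
        "labels": [-100] * len(prompt_ids) + target_ids,
    }
```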
fine-tuned-model-8/checkpoint-118/README.md
ADDED
@@ -0,0 +1,202 @@
---
base_model: ./deepseek-coder-1.3b-instruct
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.15.1
fine-tuned-model-8/checkpoint-118/adapter_config.json
ADDED
@@ -0,0 +1,39 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "./deepseek-coder-1.3b-instruct",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_bias": false,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": ["up_proj", "k_proj", "q_proj", "down_proj", "v_proj", "o_proj", "gate_proj"],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_rslora": false
}
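For reference, a PEFT call that would serialize to an adapter_config.json like the one above; a sketch only, where `base_model` is assumed to be the already-loaded ./deepseek-coder-1.3b-instruct model and only the fields visible in the JSON are pinned.

```python
from peft import LoraConfig, get_peft_model

# Mirrors the key fields above: r=8, alpha=16, dropout 0.0, bias "none",
# with all attention and MLP projection layers targeted.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# `base_model` is an assumption; see the 8-bit loading sketch further below.
peft_model = get_peft_model(base_model, lora_config)
```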
fine-tuned-model-8/checkpoint-118/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d9a812553f57bf6d7451ee9a58ce33104b1e5136ffe2f0cd207d34b9d14298e
size 292359512
fine-tuned-model-8/checkpoint-118/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7e7e4beb78a641529e580de426ce262295f1c15b3e14f5a173d62d0e5de39870
size 60247362
fine-tuned-model-8/checkpoint-118/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40cb1869983733651486e797294b9d075e35ed911b745abc673b44d5ac187b23
size 14244
fine-tuned-model-8/checkpoint-118/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51f0a01f95d6e87f3858b9ec113cee47151943ee496cc24613a58de96882db92
size 1064
fine-tuned-model-8/checkpoint-118/special_tokens_map.json
ADDED
@@ -0,0 +1,20 @@
{
  "additional_special_tokens": [
    {"content": "<|endofsql|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
  ],
  "bos_token": {"content": "<|begin▁of▁sentence|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false},
  "eos_token": "<|endofsql|>",
  "pad_token": "<|endofsql|>"
}
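The map above swaps both eos and pad over to a new `<|endofsql|>` token. A sketch of how such a map typically comes about (the exact notebook code is not in this diff); the new token lands at id 32022, and config.json below records the matching vocab_size of 32023.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./deepseek-coder-1.3b-instruct")
# Register the custom stop token and reuse it for eos/pad, as in the
# special_tokens_map.json above.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|endofsql|>"]})
tokenizer.eos_token = "<|endofsql|>"
tokenizer.pad_token = "<|endofsql|>"
# After loading the model, its embedding matrix must grow to match:
# model.resize_token_embeddings(len(tokenizer))
```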
fine-tuned-model-8/checkpoint-118/tokenizer.json
ADDED
The diff for this file is too large to render.
fine-tuned-model-8/checkpoint-118/tokenizer_config.json
ADDED
@@ -0,0 +1,206 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "32000": {"content": "õ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32001": {"content": "÷", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32002": {"content": "Á", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32003": {"content": "ý", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32004": {"content": "À", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32005": {"content": "ÿ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32006": {"content": "ø", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32007": {"content": "ú", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32008": {"content": "þ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32009": {"content": "ü", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32010": {"content": "ù", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32011": {"content": "ö", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32012": {"content": "û", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32013": {"content": "<|begin▁of▁sentence|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": true},
    "32014": {"content": "<|end▁of▁sentence|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": true},
    "32015": {"content": "<|fim▁hole|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32016": {"content": "<|fim▁begin|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32017": {"content": "<|fim▁end|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32018": {"content": "<pad>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32019": {"content": "<|User|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32020": {"content": "<|Assistant|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
    "32021": {"content": "<|EOT|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": true},
    "32022": {"content": "<|endofsql|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}
  },
  "additional_special_tokens": ["<|endofsql|>"],
  "bos_token": "<|begin▁of▁sentence|>",
  "chat_template": "{% if not add_generation_prompt is defined %}\n{% set add_generation_prompt = false %}\n{% endif %}\n{%- set ns = namespace(found=false) -%}\n{%- for message in messages -%}\n {%- if message['role'] == 'system' -%}\n {%- set ns.found = true -%}\n {%- endif -%}\n{%- endfor -%}\n{{bos_token}}{%- if not ns.found -%}\n{{'You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer\\n'}}\n{%- endif %}\n{%- for message in messages %}\n {%- if message['role'] == 'system' %}\n{{ message['content'] }}\n {%- else %}\n {%- if message['role'] == 'user' %}\n{{'### Instruction:\\n' + message['content'] + '\\n'}}\n {%- else %}\n{{'### Response:\\n' + message['content'] + '\\n<|EOT|>\\n'}}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{% if add_generation_prompt %}\n{{'### Response:'}}\n{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endofsql|>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 16384,
  "pad_token": "<|endofsql|>",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizerFast",
  "unk_token": null,
  "use_default_system_prompt": false
}
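The chat_template above is the stock DeepSeek-Coder instruct template (it renders messages into the "### Instruction: / ### Response:" format with a default system prompt). A usage sketch, with made-up message content:

```python
messages = [
    {"role": "user", "content": "Request: biggest win for the Celtics in the 2008 season?"},
]
# Produces the bos token, the default system prompt, the instruction block,
# and a trailing "### Response:" ready for generation.
prompt_text = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
```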
fine-tuned-model-8/checkpoint-118/trainer_state.json
ADDED
@@ -0,0 +1,73 @@
{
  "best_global_step": 118,
  "best_metric": 0.31218037009239197,
  "best_model_checkpoint": "./fine-tuned-model-8\\checkpoint-118",
  "epoch": 2.0,
  "eval_steps": 500,
  "global_step": 118,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.8519701810436635, "grad_norm": 0.3398423194885254, "learning_rate": 3.655172413793104e-05, "loss": 0.7607, "step": 50},
    {"epoch": 1.0, "eval_loss": 0.41808420419692993, "eval_runtime": 179.321, "eval_samples_per_second": 0.586, "eval_steps_per_second": 0.586, "step": 59},
    {"epoch": 1.698615548455804, "grad_norm": 0.16843748092651367, "learning_rate": 3.310344827586207e-05, "loss": 0.2266, "step": 100},
    {"epoch": 2.0, "eval_loss": 0.31218037009239197, "eval_runtime": 179.2637, "eval_samples_per_second": 0.586, "eval_steps_per_second": 0.586, "step": 118}
  ],
  "logging_steps": 50,
  "max_steps": 580,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 10,
  "save_steps": 500,
  "stateful_callbacks": {
    "EarlyStoppingCallback": {
      "args": {"early_stopping_patience": 2, "early_stopping_threshold": 0.0},
      "attributes": {"early_stopping_patience_counter": 0}
    },
    "TrainerControl": {
      "args": {"should_epoch_stop": false, "should_evaluate": false, "should_log": false, "should_save": true, "should_training_stop": false},
      "attributes": {}
    }
  },
  "total_flos": 4.578334359434035e+16,
  "train_batch_size": 1,
  "trial_name": null,
  "trial_params": null
}
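The state above implies per-epoch evaluation, logging every 50 steps, batch size 1, 10 epochs (max_steps 580), and early stopping with patience 2; the logged learning rates fit a linear decay from 4e-5 (4e-5 x 530/580 = 3.655e-5 at step 50). A TrainingArguments sketch consistent with those numbers; every field not recoverable from the state is an assumption:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="./fine-tuned-model-8",
    per_device_train_batch_size=1,
    num_train_epochs=10,
    learning_rate=4e-5,   # step-50 lr of 3.655e-5 fits a linear decay from 4e-5
    logging_steps=50,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
callbacks = [EarlyStoppingCallback(early_stopping_patience=2)]
```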
fine-tuned-model-8/checkpoint-118/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:39cfff4521756a64baee577169ee2e28102998cb2e4bec9cb0370e436b501d28
size 5368
fine-tuned-model-8/checkpoint-177/README.md
ADDED
@@ -0,0 +1,202 @@
(Identical to fine-tuned-model-8/checkpoint-118/README.md above: the auto-generated PEFT model-card template, Framework versions: PEFT 0.15.1.)
fine-tuned-model-8/checkpoint-177/adapter_config.json
ADDED
@@ -0,0 +1,39 @@
(Identical to fine-tuned-model-8/checkpoint-118/adapter_config.json above, apart from the serialization order of target_modules.)
fine-tuned-model-8/checkpoint-177/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fac3a5a1d0ccbee1020377ededd4afcf94929077064545a34d8145d44078d960
size 292359512
fine-tuned-model-8/checkpoint-177/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1fa76606469f2921f2816e3a6621681a61d97509818f7b44f5bf50d6ee2a1049
size 60247362
fine-tuned-model-8/checkpoint-177/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:243a37024beb77e9c86d6c9c8f32f88f819d9b15ad1ea65bedb28457eb282f6b
size 14244
fine-tuned-model-8/checkpoint-177/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f3d2c34aea8bd05b1543fb3da357047d4dd84b958e9c7d89fee01b9a79b73466
size 988
fine-tuned-model-8/checkpoint-177/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51eaa40e9657ac0af94051d19e116e479bb16e230ce7774f8c52c884f089734a
size 1064
fine-tuned-model-8/checkpoint-177/special_tokens_map.json
ADDED
@@ -0,0 +1,26 @@
{
  "additional_special_tokens": [
    {"content": "<|endofsql|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
  ],
  "bos_token": {"content": "<|begin▁of▁sentence|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false},
  "eos_token": "<|endofsql|>",
  "pad_token": {"content": "<|end▁of▁sentence|>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}
}
fine-tuned-model-8/checkpoint-177/tokenizer.json
ADDED
The diff for this file is too large to render.
fine-tuned-model-8/checkpoint-177/tokenizer_config.json
ADDED
@@ -0,0 +1,206 @@
(Identical to fine-tuned-model-8/checkpoint-118/tokenizer_config.json above, except "pad_token": "<|end▁of▁sentence|>" instead of "<|endofsql|>".)
fine-tuned-model-8/checkpoint-177/trainer_state.json
ADDED
@@ -0,0 +1,88 @@
{
  "best_global_step": 177,
  "best_metric": 0.0004340429150033742,
  "best_model_checkpoint": "./fine-tuned-model-8\\checkpoint-177",
  "epoch": 3.0,
  "eval_steps": 500,
  "global_step": 177,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.847457627118644, "grad_norm": 0.404996782541275, "learning_rate": 3.932203389830509e-05, "loss": 0.6388, "step": 50},
    {"epoch": 1.0, "eval_loss": 0.06650885194540024, "eval_runtime": 8.3615, "eval_samples_per_second": 12.558, "eval_steps_per_second": 0.837, "step": 59},
    {"epoch": 1.694915254237288, "grad_norm": 0.7748989462852478, "learning_rate": 3.8644067796610175e-05, "loss": 0.0458, "step": 100},
    {"epoch": 2.0, "eval_loss": 0.0008016377687454224, "eval_runtime": 8.629, "eval_samples_per_second": 12.168, "eval_steps_per_second": 0.811, "step": 118},
    {"epoch": 2.542372881355932, "grad_norm": 0.007611678447574377, "learning_rate": 3.796610169491526e-05, "loss": 0.002, "step": 150},
    {"epoch": 3.0, "eval_loss": 0.0004340429150033742, "eval_runtime": 8.5685, "eval_samples_per_second": 12.254, "eval_steps_per_second": 0.817, "step": 177}
  ],
  "logging_steps": 50,
  "max_steps": 2950,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 50,
  "save_steps": 500,
  "stateful_callbacks": {
    "EarlyStoppingCallback": {
      "args": {"early_stopping_patience": 2, "early_stopping_threshold": 0.0},
      "attributes": {"early_stopping_patience_counter": 0}
    },
    "TrainerControl": {
      "args": {"should_epoch_stop": false, "should_evaluate": false, "should_log": false, "should_save": true, "should_training_stop": false},
      "attributes": {}
    }
  },
  "total_flos": 5570596939235328.0,
  "train_batch_size": 16,
  "trial_name": null,
  "trial_params": null
}
fine-tuned-model-8/checkpoint-177/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b7619738b5d8a13f8ec004be1e0996b2e618022b4b9790c6f7f4d37b799ae929
size 5368
fine-tuned-model-8/config.json
ADDED
@@ -0,0 +1,48 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 32013,
  "eos_token_id": 32021,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5504,
  "max_position_embeddings": 16384,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": false,
    "_load_in_8bit": true,
    "bnb_4bit_compute_dtype": "float32",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": false,
    "load_in_8bit": true,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "factor": 4.0,
    "rope_type": "linear",
    "type": "linear"
  },
  "rope_theta": 100000,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.50.3",
  "use_cache": true,
  "vocab_size": 32023
}
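The quantization_config block records that the base weights were loaded 8-bit via bitsandbytes (the 4x linear RoPE scaling is carried over from the base model's config). A loading sketch that would produce a saved config like the one above; int8 matmuls on consumer hardware are also a plausible contributor to the slow inference flagged in the commit title.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Matches "load_in_8bit": true / "quant_method": "bitsandbytes" above.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "./deepseek-coder-1.3b-instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Embeddings were also resized to 32023 to fit <|endofsql|> (id 32022),
# per the vocab_size recorded above.
```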
fine-tuned-model-8/generation_config.json
ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 32013,
  "eos_token_id": 32021,
  "pad_token_id": 32022,
  "transformers_version": "4.50.3"
}
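One mismatch worth flagging: generation_config.json keeps eos_token_id 32021 (the base model's `<|EOT|>`), while the saved tokenizer's eos is `<|endofsql|>` (id 32022, used here as pad). If the fine-tuned model only ever emits `<|endofsql|>`, generate() would run to max_new_tokens before stopping, which could be part of why inference feels very slow. Passing the stop id explicitly is a cheap guard; a sketch reusing the names from the sketches above:

```python
inputs = tokenizer(prompt_text, return_tensors="pt").to(base_model.device)
stop_id = tokenizer.convert_tokens_to_ids("<|endofsql|>")
output = base_model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=[32021, stop_id],  # stop on either <|EOT|> or <|endofsql|>
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```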
fine-tuned-model-8/model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:13d028af97e13326591129457f4c38fd971da14a5eab2122a3cf82f1bd9a912d
size 1478884408
fine-tuned-model-8/runs/Apr06_19-55-38_DESKTOP-SMJC97K/events.out.tfevents.1743994556.DESKTOP-SMJC97K.17072.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:980ae08da7bb26b49e9c9861055eb82f7f0bd0694738821d2d0ad898412470fc
size 5670
fine-tuned-model-8/runs/Apr06_20-04-45_DESKTOP-SMJC97K/events.out.tfevents.1743995090.DESKTOP-SMJC97K.17072.1
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d03013824733a4d1bd5dfd17a8205e6cffdbc789fa83a8c9ef310d7cd370f0ad
size 5670
fine-tuned-model-8/runs/Apr06_20-09-40_DESKTOP-SMJC97K/events.out.tfevents.1743995381.DESKTOP-SMJC97K.2292.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:243558a5322998d9a6c2965c1b2e3a1161f8448e3e15dd2ad14f631246b6d957
size 5670
fine-tuned-model-8/runs/Apr06_20-14-05_DESKTOP-SMJC97K/events.out.tfevents.1743995646.DESKTOP-SMJC97K.12712.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0121f77adbcdd7adb8bb8bfd3d398a5d24851db54b9decc21407058c6e700133
size 5670
fine-tuned-model-8/runs/Apr06_20-19-55_DESKTOP-SMJC97K/events.out.tfevents.1743995996.DESKTOP-SMJC97K.12928.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f7412d3337a42ef30ac28f6e853369e6aff37577a88c4fd70b961db9b02ed0c5
size 7098
fine-tuned-model-8/runs/Apr06_22-18-58_DESKTOP-SMJC97K/events.out.tfevents.1744003142.DESKTOP-SMJC97K.8492.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff9cb1621aee5f7f87e71be74aeb3db25aacd1cd69a657d7ce7d04dee2ba5971
size 5669
fine-tuned-model-8/runs/Apr06_22-52-40_DESKTOP-SMJC97K/events.out.tfevents.1744005165.DESKTOP-SMJC97K.9844.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:412e8b4057e35895a8c72236fc3ef25e23a7455de2993519c23b1410026055ae
size 5667
fine-tuned-model-8/runs/Apr06_23-15-43_DESKTOP-SMJC97K/events.out.tfevents.1744006548.DESKTOP-SMJC97K.7572.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a401d20f24b9e5ad7ef27665cc262338ce87d9a550f6566acd4c8a1e8e4b1140
size 5668
fine-tuned-model-8/runs/Apr06_23-40-32_DESKTOP-SMJC97K/events.out.tfevents.1744008032.DESKTOP-SMJC97K.10676.0
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e2d817ccbebff16029612e1547f3a4a35ae5fc514d9152084f15a4607574c8ba
size 6826
fine-tuned-model-8/special_tokens_map.json
ADDED
@@ -0,0 +1,20 @@
(Identical to fine-tuned-model-8/checkpoint-118/special_tokens_map.json above.)
fine-tuned-model-8/tokenizer.json
ADDED
The diff for this file is too large to render.
fine-tuned-model-8/tokenizer_config.json
ADDED
@@ -0,0 +1,206 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"add_bos_token": true,
|
3 |
+
"add_eos_token": false,
|
4 |
+
"add_prefix_space": null,
|
5 |
+
"added_tokens_decoder": {
|
6 |
+
"32000": {
|
7 |
+
"content": "õ",
|
8 |
+
"lstrip": false,
|
9 |
+
"normalized": true,
|
10 |
+
"rstrip": false,
|
11 |
+
"single_word": false,
|
12 |
+
"special": false
|
13 |
+
},
|
14 |
+
"32001": {
|
15 |
+
"content": "÷",
|
16 |
+
"lstrip": false,
|
17 |
+
"normalized": true,
|
18 |
+
"rstrip": false,
|
19 |
+
"single_word": false,
|
20 |
+
"special": false
|
21 |
+
},
|
22 |
+
"32002": {
|
23 |
+
"content": "Á",
|
24 |
+
"lstrip": false,
|
25 |
+
"normalized": true,
|
26 |
+
"rstrip": false,
|
27 |
+
"single_word": false,
|
28 |
+
"special": false
|
29 |
+
},
|
30 |
+
"32003": {
|
31 |
+
"content": "ý",
|
32 |
+
"lstrip": false,
|
33 |
+
"normalized": true,
|
34 |
+
"rstrip": false,
|
35 |
+
"single_word": false,
|
36 |
+
"special": false
|
37 |
+
},
|
38 |
+
"32004": {
|
39 |
+
"content": "À",
|
40 |
+
"lstrip": false,
|
41 |
+
"normalized": true,
|
42 |
+
"rstrip": false,
|
43 |
+
"single_word": false,
|
44 |
+
"special": false
|
45 |
+
},
|
46 |
+
"32005": {
|
47 |
+
"content": "ÿ",
|
48 |
+
"lstrip": false,
|
49 |
+
"normalized": true,
|
50 |
+
"rstrip": false,
|
51 |
+
"single_word": false,
|
52 |
+
"special": false
|
53 |
+
},
|
54 |
+
"32006": {
|
55 |
+
"content": "ø",
|
56 |
+
"lstrip": false,
|
57 |
+
"normalized": true,
|
58 |
+
"rstrip": false,
|
59 |
+
"single_word": false,
|
60 |
+
"special": false
|
61 |
+
},
|
62 |
+
"32007": {
|
63 |
+
"content": "ú",
|
64 |
+
"lstrip": false,
|
65 |
+
"normalized": true,
|
66 |
+
"rstrip": false,
|
67 |
+
"single_word": false,
|
68 |
+
"special": false
|
69 |
+
},
|
70 |
+
"32008": {
|
71 |
+
"content": "þ",
|
72 |
+
"lstrip": false,
|
73 |
+
"normalized": true,
|
74 |
+
"rstrip": false,
|
75 |
+
"single_word": false,
|
76 |
+
"special": false
|
77 |
+
},
|
78 |
+
"32009": {
|
79 |
+
"content": "ü",
|
80 |
+
"lstrip": false,
|
81 |
+
"normalized": true,
|
82 |
+
"rstrip": false,
|
83 |
+
"single_word": false,
|
84 |
+
"special": false
|
85 |
+
},
|
86 |
+
"32010": {
|
87 |
+
"content": "ù",
|
88 |
+
"lstrip": false,
|
89 |
+
"normalized": true,
|
90 |
+
"rstrip": false,
|
91 |
+
"single_word": false,
|
92 |
+
"special": false
|
93 |
+
},
|
94 |
+
"32011": {
|
95 |
+
"content": "ö",
|
96 |
+
"lstrip": false,
|
97 |
+
"normalized": true,
|
98 |
+
"rstrip": false,
|
99 |
+
"single_word": false,
|
100 |
+
"special": false
|
101 |
+
},
|
102 |
+
"32012": {
|
103 |
+
"content": "û",
|
104 |
+
"lstrip": false,
|
105 |
+
"normalized": true,
|
106 |
+
"rstrip": false,
|
107 |
+
"single_word": false,
|
108 |
+
"special": false
|
109 |
+
},
|
110 |
+
"32013": {
|
111 |
+
"content": "<|begin▁of▁sentence|>",
|
112 |
+
"lstrip": false,
|
113 |
+
"normalized": true,
|
114 |
+
"rstrip": false,
|
115 |
+
"single_word": false,
|
116 |
+
"special": true
|
117 |
+
},
|
118 |
+
"32014": {
|
119 |
+
"content": "<|end▁of▁sentence|>",
|
120 |
+
"lstrip": false,
|
121 |
+
"normalized": true,
|
122 |
+
"rstrip": false,
|
123 |
+
"single_word": false,
|
124 |
+
"special": true
|
125 |
+
},
|
126 |
+
"32015": {
|
127 |
+
"content": "<|fim▁hole|>",
|
128 |
+
"lstrip": false,
|
129 |
+
"normalized": true,
|
130 |
+
"rstrip": false,
|
131 |
+
"single_word": false,
|
132 |
+
"special": false
|
133 |
+
},
|
134 |
+
"32016": {
|
135 |
+
"content": "<|fim▁begin|>",
|
136 |
+
"lstrip": false,
|
137 |
+
"normalized": true,
|
138 |
+
"rstrip": false,
|
139 |
+
"single_word": false,
|
140 |
+
"special": false
|
141 |
+
},
|
142 |
+
"32017": {
|
143 |
+
"content": "<|fim▁end|>",
|
144 |
+
"lstrip": false,
|
145 |
+
"normalized": true,
|
146 |
+
"rstrip": false,
|
147 |
+
"single_word": false,
|
148 |
+
"special": false
|
149 |
+
},
|
150 |
+
"32018": {
|
151 |
+
"content": "<pad>",
|
152 |
+
"lstrip": false,
|
153 |
+
"normalized": true,
|
154 |
+
"rstrip": false,
|
155 |
+
"single_word": false,
|
156 |
+
"special": false
|
157 |
+
},
|
158 |
+
"32019": {
|
159 |
+
"content": "<|User|>",
|
160 |
+
"lstrip": false,
|
161 |
+
"normalized": true,
|
162 |
+
"rstrip": false,
|
163 |
+
"single_word": false,
|
164 |
+
"special": false
|
165 |
+
},
|
166 |
+
"32020": {
|
167 |
+
"content": "<|Assistant|>",
|
168 |
+
"lstrip": false,
|
169 |
+
"normalized": true,
|
170 |
+
"rstrip": false,
|
171 |
+
"single_word": false,
|
172 |
+
"special": false
|
173 |
+
},
|
174 |
+
"32021": {
|
175 |
+
"content": "<|EOT|>",
|
176 |
+
"lstrip": false,
|
177 |
+
"normalized": true,
|
178 |
+
"rstrip": false,
|
179 |
+
"single_word": false,
|
180 |
+
"special": true
|
181 |
+
},
|
182 |
+
"32022": {
|
183 |
+
"content": "<|endofsql|>",
|
184 |
+
"lstrip": false,
|
185 |
+
"normalized": false,
|
186 |
+
"rstrip": false,
|
187 |
+
"single_word": false,
|
188 |
+
"special": true
|
189 |
+
}
|
190 |
+
},
|
191 |
+
"additional_special_tokens": [
|
192 |
+
"<|endofsql|>"
|
193 |
+
],
|
194 |
+
"bos_token": "<|begin▁of▁sentence|>",
|
195 |
+
"chat_template": "{% if not add_generation_prompt is defined %}\n{% set add_generation_prompt = false %}\n{% endif %}\n{%- set ns = namespace(found=false) -%}\n{%- for message in messages -%}\n {%- if message['role'] == 'system' -%}\n {%- set ns.found = true -%}\n {%- endif -%}\n{%- endfor -%}\n{{bos_token}}{%- if not ns.found -%}\n{{'You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer\\n'}}\n{%- endif %}\n{%- for message in messages %}\n {%- if message['role'] == 'system' %}\n{{ message['content'] }}\n {%- else %}\n {%- if message['role'] == 'user' %}\n{{'### Instruction:\\n' + message['content'] + '\\n'}}\n {%- else %}\n{{'### Response:\\n' + message['content'] + '\\n<|EOT|>\\n'}}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{% if add_generation_prompt %}\n{{'### Response:'}}\n{% endif %}",
|
196 |
+
"clean_up_tokenization_spaces": false,
|
197 |
+
"eos_token": "<|endofsql|>",
|
198 |
+
"extra_special_tokens": {},
|
199 |
+
"legacy": true,
|
200 |
+
"model_max_length": 16384,
|
201 |
+
"pad_token": "<|endofsql|>",
|
202 |
+
"sp_model_kwargs": {},
|
203 |
+
"tokenizer_class": "LlamaTokenizerFast",
|
204 |
+
"unk_token": null,
|
205 |
+
"use_default_system_prompt": false
|
206 |
+
}
|
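A quick way to sanity-check this config after cloning the repo is to load the tokenizer and confirm that the new stop token resolves to id 32022 and doubles as eos/pad. A minimal sketch (the checkpoint path is an assumption; any folder carrying this tokenizer_config.json works):

    from transformers import AutoTokenizer

    # Assumed path: any checkpoint directory containing this tokenizer_config.json.
    tok = AutoTokenizer.from_pretrained("./fine-tuned-model-8/checkpoint-177")

    # "<|endofsql|>" was appended as id 32022 and set as both eos and pad.
    assert tok.convert_tokens_to_ids("<|endofsql|>") == 32022
    assert tok.eos_token == "<|endofsql|>" and tok.pad_token == "<|endofsql|>"
    print(len(tok))  # 32023: the base 32022-token vocab plus the added stop token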
finetune_model.ipynb
CHANGED
@@ -18,7 +18,15 @@
18 |   "cell_type": "code",
19 |   "execution_count": 1,
20 |   "metadata": {},
21 | - "outputs": [
22 |   "source": [
23 |   "input_prompt = \"\"\"You are an AI assistant that converts natural language queries into valid SQLite queries.\n",
24 |   "Database Schema and Explanations\n",
@@ -198,7 +206,9 @@
198 |   "SELECT MAX(pts_home - pts_away) AS biggest_win FROM game WHERE team_name_home = 'Boston Celtics' AND season_id = '22008';\n",
199 |   "\n",
200 |   "Generate only the SQLite query prefaced by SQLite: and no other text, do not output an explanation of the query. Now generate an SQLite query for the following user request. Request:\n",
201 | - "\"\"\""
202 |   ]
203 |   },
204 |   {
@@ -210,30 +220,14 @@
210 |   },
211 |   {
212 |   "cell_type": "code",
213 | - "execution_count":
214 |   "metadata": {},
215 |   "outputs": [
216 |   {
217 |   "name": "stderr",
218 |   "output_type": "stream",
219 |   "text": [
220 | - "
221 | - " from .autonotebook import tqdm as notebook_tqdm\n"
222 | - ]
223 | - },
224 | - {
225 | - "name": "stdout",
226 | - "output_type": "stream",
227 | - "text": [
228 | - "WARNING:tensorflow:From c:\\Users\\Dean\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tf_keras\\src\\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.\n",
229 | - "\n"
230 | - ]
231 | - },
232 | - {
233 | - "name": "stderr",
234 | - "output_type": "stream",
235 | - "text": [
236 | - "C:\\Users\\Dean\\AppData\\Local\\Temp\\ipykernel_20424\\3615904657.py:13: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n",
237 |   " df = df.applymap(lambda x: re.sub(r'\\s+', ' ', x) if isinstance(x, str) else x)\n"
238 |   ]
239 |   },
@@ -254,397 +248,27 @@
254 |   "1 SELECT MAX(pts_home) FROM game WHERE team_name... 162 \n",
255 |   "2 SELECT pts_home FROM game WHERE team_name_home... 156 \n",
256 |   "3 SELECT COUNT(*) FROM game WHERE team_abbreviat... 29 \n",
257 | - "4 SELECT AVG(ast_home) FROM game WHERE team_abbr... 26.51355662 \n"
258 | -
259 | -
260 | -
261 | -
262 | - "output_type": "stream",
263 | - "text": [
264 | - "Map: 0%| | 0/1044 [00:00<?, ? examples/s]"
265 | - ]
266 | - },
267 | - {
268 | - "name": "stdout",
269 | - "output_type": "stream",
270 | - "text": [
271 | - "You are an AI assistant that converts natural language queries into valid SQLite queries.\n",
272-447 | - (removed output: the full input_prompt text echoed verbatim, schema, team list, query guidelines, and example requests/queries, identical to the prompt defined in the source cell above)
448 | - "Generate only the SQLite query prefaced by SQLite: and no other text, do not output an explanation of the query. Now generate an SQLite query for the following user request. Request:\n",
449 | - "Which NBA teams were established after the year 2000? List their names and founding years, sorted from newest to oldest\n",
450 | - "SQLite: \n",
451 | - "SELECT full_name FROM team WHERE year_founded > 2000 ORDER BY year_founded DESC;\n"
452 |   ]
453 |   },
454 |   {
455 |   "name": "stderr",
456 |   "output_type": "stream",
457 |   "text": [
458 | - "Map: 100%|██████████| 1044/1044 [
459 |   ]
460 |   },
461 |   {
462 |   "name": "stdout",
463 |   "output_type": "stream",
464 |   "text": [
465 | - "You are an AI assistant that converts natural language queries into valid SQLite queries.\n",
466-641 | - (removed output: the same input_prompt text echoed again, verbatim)
642 | - "Generate only the SQLite query prefaced by SQLite: and no other text, do not output an explanation of the query. Now generate an SQLite query for the following user request. Request:\n",
643 | - "How many points did the Golden State Warriors score in their first game of the 2005 season?\n",
644 | - "SQLite: \n",
645 | - "SELECT pts_home FROM game WHERE team_abbreviation_home = 'GSW' AND season_id = '22005' ORDER BY game_date ASC LIMIT 1;\n",
646 |   "939\n",
647 | - "105\n"
648 |   ]
649 |   },
650 |   {
@@ -659,11 +283,12 @@
659 |   "import pandas as pd\n",
660 |   "import torch\n",
661 |   "from datasets import Dataset\n",
662 | - "from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, BitsAndBytesConfig, EarlyStoppingCallback\n",
663 |   "from torch.utils.data import DataLoader\n",
664 |   "from peft import LoraConfig, get_peft_model, TaskType\n",
665 |   "import os\n",
666 |   "import re\n",
667 |   "\n",
668 |   "# Load dataset\n",
669 |   "df = pd.read_csv(\"./train-data/sql_train.tsv\", sep='\\t')\n",
@@ -678,23 +303,91 @@
678 |   "model_name = \"./deepseek-coder-1.3b-instruct\"\n",
679 |   "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
680 |   "\n",
681 |   "# Preprocessing function\n",
682 |   "def preprocess_function(examples):\n",
683 |   " \"\"\"\n",
684 | - " Tokenizes
685 |   " \"\"\"\n",
686 | - "
687 | - "
688 |   "\n",
689 | - "
690 | - "
691 |   "\n",
692 | - "
693 | - "
694 |   "\n",
695 | - "
696 | - "
697 |   "\n",
698 |   "# Convert to Hugging Face Dataset\n",
699 |   "dataset = Dataset.from_pandas(df)\n",
700 |   "\n",
@@ -707,7 +400,11 @@
707 |   "val_dataset = tokenized_dataset.select(range(split, len(tokenized_dataset)))\n",
708 |   "\n",
709 |   "print(len(train_dataset))\n",
710 | - "print(len(val_dataset))"
711 |   ]
712 |   },
713 |   {
@@ -719,47 +416,33 @@
719 |   },
720 |   {
721 |   "cell_type": "code",
722 | - "execution_count":
723 |   "metadata": {},
724 |   "outputs": [
725 |   {
726 |   "name": "stdout",
727 |   "output_type": "stream",
728 |   "text": [
729 | - "trainable params:
730 |   ]
731 |   }
732 |   ],
733 |   "source": [
734 | - "# Enable 8-bit quantization for lower memory usage\n",
735 | - "bnb_config = BitsAndBytesConfig(\n",
736 | - " load_in_8bit=True, \n",
737 | - " bnb_8bit_compute_dtype=torch.float16\n",
738 | - ")\n",
739 | - "\n",
740 | - "# Load model with quantization\n",
741 | - "#device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
742 | - "device_name = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n",
743 | - "device = torch.device(device_name)\n",
744 | - "model = AutoModelForCausalLM.from_pretrained(\n",
745 | - " model_name, \n",
746 | - " quantization_config=bnb_config,\n",
747 | - " device_map=device\n",
748 | - ")\n",
749 | - "model.generation_config.pad_token_id = tokenizer.pad_token_id\n",
750 | - "\n",
751 |   "# Define LoRA configuration\n",
752 |   "lora_config = LoraConfig(\n",
753 | - " r=
754 | - " lora_alpha=
755 | - " lora_dropout=0.
756 |   " bias=\"none\",\n",
757 |   " task_type=TaskType.CAUSAL_LM,\n",
758 |   " target_modules=[\n",
759 |   " \"q_proj\",\n",
760 |   " \"k_proj\",\n",
761 |   " \"v_proj\",\n",
762 | - " \"o_proj\"
763 |   " ]\n",
764 |   ")\n",
765 |   "\n",
@@ -778,7 +461,7 @@
778 |   },
779 |   {
780 |   "cell_type": "code",
781 | - "execution_count":
782 |   "metadata": {},
783 |   "outputs": [
784 |   {
@@ -787,7 +470,7 @@
787 |   "text": [
788 |   "c:\\Users\\Dean\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\transformers\\training_args.py:1611: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead\n",
789 |   " warnings.warn(\n",
790 | - "C:\\Users\\Dean\\AppData\\Local\\Temp\\
791 |   " trainer = Trainer(\n",
792 |   "No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.\n"
793 |   ]
@@ -795,17 +478,18 @@
795 |   ],
796 |   "source": [
797 |   "training_args = TrainingArguments(\n",
798 | - " output_dir=\"./fine-tuned-model\",\n",
799 |   " evaluation_strategy=\"epoch\", # Evaluate at the end of each epoch\n",
800 |   " save_strategy=\"epoch\", # Save model every epoch\n",
801 | - " per_device_train_batch_size=
802 | - " per_device_eval_batch_size=
803 | - "
804 |   " learning_rate=4e-5, # Higher LR since we're only training LoRA layers\n",
805 |   " weight_decay=0.01,\n",
806 |   " logging_steps=50, # Print loss every 50 steps\n",
807 |   " save_total_limit=2, # Keep last 2 checkpoints\n",
808 | - "
809 |   " push_to_hub=False,\n",
810 |   " load_best_model_at_end=True,\n",
811 |   " metric_for_best_model=\"eval_loss\",\n",
@@ -832,105 +516,9 @@
832 |   },
833 |   {
834 |   "cell_type": "code",
835 | - "execution_count":
836 |   "metadata": {},
837 |   "outputs": [
838 | - {
839 | - "name": "stderr",
840 | - "output_type": "stream",
841 | - "text": [
842 | - "c:\\Users\\Dean\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\transformers\\integrations\\sdpa_attention.py:54: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\\actions-runner\\_work\\pytorch\\pytorch\\builder\\windows\\pytorch\\aten\\src\\ATen\\native\\transformers\\cuda\\sdp_utils.cpp:555.)\n",
843 | - " attn_output = torch.nn.functional.scaled_dot_product_attention(\n"
844 | - ]
845 | - },
846 | - {
847 | - "data": {
848 | - "text/html": [
849-925 | - (removed HTML display output: a training progress widget, 708/2950 steps, 2:25:12 elapsed < 7:41:08 remaining, 0.08 it/s, Epoch 12/50, with the following loss table)
          Epoch | Training Loss | Validation Loss
          1     | 9.571600      | 1.583262
          2     | 1.746600      | 1.167971
          3     | 1.517300      | 1.093727
          4     | 1.423300      | 1.038791
          5     | 1.304400      | 1.066154
          6     | 1.283900      | 0.989451
          7     | 1.248500      | 0.981647
          8     | 1.242600      | 1.007480
          9     | 1.290300      | 0.970018
          10    | 1.258800      | 0.958510
          11    | 1.217200      | 1.017668
          12    | 1.242000      | 0.961481
926 | - ],
927 | - "text/plain": [
928 | - "<IPython.core.display.HTML object>"
929 | - ]
930 | - },
931 | - "metadata": {},
932 | - "output_type": "display_data"
933 | - },
934 |   {
935 |   "name": "stderr",
936 |   "output_type": "stream",
@@ -942,24 +530,24 @@
942 |   {
943 |   "data": {
944 |   "text/plain": [
945 | - "('./fine-tuned-model\\\\tokenizer_config.json',\n",
946 | - " './fine-tuned-model\\\\special_tokens_map.json',\n",
947 | - " './fine-tuned-model\\\\tokenizer.json')"
948 |   ]
949 |   },
950 | - "execution_count":
951 |   "metadata": {},
952 |   "output_type": "execute_result"
953 |   }
954 |   ],
955 |   "source": [
956 |   "# Run training\n",
957 | - "trainer.train()\n",
958 |   "\n",
959 |   "# Merge LoRA adapters with the base model before saving\n",
960 |   "model = model.merge_and_unload()\n",
961 | - "model.save_pretrained(\"./fine-tuned-model\")\n",
962 | - "tokenizer.save_pretrained(\"./fine-tuned-model\")"
963 |   ]
964 |   },
965 |   {
@@ -971,13 +559,14 @@
971 |   },
972 |   {
973 |   "cell_type": "code",
974 | - "execution_count":
975 |   "metadata": {},
976 |   "outputs": [
977 |   {
978 |   "name": "stderr",
979 |   "output_type": "stream",
980 |   "text": [
981 |   "c:\\Users\\Dean\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\bitsandbytes\\autograd\\_functions.py:315: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization\n",
982 |   " warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
983 |   ]
@@ -986,14 +575,29 @@
986 |   "name": "stdout",
987 |   "output_type": "stream",
988 |   "text": [
989 | - "Generated SQL: SQLite
990 | - "
991 |   ]
992 |   }
993 |   ],
994 |   "source": [
995 | - "model = AutoModelForCausalLM.from_pretrained(\"./fine-tuned-model\", torch_dtype=torch.bfloat16, device_map=device)\n",
996 | - "tokenizer = AutoTokenizer.from_pretrained(\"./fine-tuned-model\")\n",
997 |   "\n",
998 |   "# Prepare query with the same prompt\n",
999 |   "input_text = \"How many points to the Los Angeles Lakers average at home?\"\n",
@@ -1001,7 +605,11 @@
1001 |   "inputs = tokenizer.apply_chat_template(message, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n",
1002 |   "\n",
1003 |   "# Generate SQL query\n",
1004 | - "outputs = model.generate(
1005 |   "query_output = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)\n",
1006 |   "\n",
1007 |   "print(\"Generated SQL:\", query_output)"
18 |   "cell_type": "code",
19 |   "execution_count": 1,
20 |   "metadata": {},
21 | + "outputs": [
22 | + {
23 | + "name": "stdout",
24 | + "output_type": "stream",
25 | + "text": [
26 | + "9035\n"
27 | + ]
28 | + }
29 | + ],
30 |   "source": [
31 |   "input_prompt = \"\"\"You are an AI assistant that converts natural language queries into valid SQLite queries.\n",
32 |   "Database Schema and Explanations\n",
206 |   "SELECT MAX(pts_home - pts_away) AS biggest_win FROM game WHERE team_name_home = 'Boston Celtics' AND season_id = '22008';\n",
207 |   "\n",
208 |   "Generate only the SQLite query prefaced by SQLite: and no other text, do not output an explanation of the query. Now generate an SQLite query for the following user request. Request:\n",
209 | + "\"\"\"\n",
210 | + "\n",
211 | + "print(len(input_prompt))"
212 |   ]
213 |   },
214 |   {
220 |   },
221 |   {
222 |   "cell_type": "code",
223 | + "execution_count": 3,
224 |   "metadata": {},
225 |   "outputs": [
226 |   {
227 |   "name": "stderr",
228 |   "output_type": "stream",
229 |   "text": [
230 | + "C:\\Users\\Dean\\AppData\\Local\\Temp\\ipykernel_10676\\3385974745.py:14: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n",
231 |   " df = df.applymap(lambda x: re.sub(r'\\s+', ' ', x) if isinstance(x, str) else x)\n"
232 |   ]
233 |   },
248 |   "1 SELECT MAX(pts_home) FROM game WHERE team_name... 162 \n",
249 |   "2 SELECT pts_home FROM game WHERE team_name_home... 156 \n",
250 |   "3 SELECT COUNT(*) FROM game WHERE team_abbreviat... 29 \n",
251 | + "4 SELECT AVG(ast_home) FROM game WHERE team_abbr... 26.51355662 \n",
252 | + "adding!\n",
253 | + "32022\n",
254 | + "32023\n",
255 | + "Max: 3156 | 95th percentile: 3002.85\n"
256 |   ]
257 |   },
258 |   {
259 |   "name": "stderr",
260 |   "output_type": "stream",
261 |   "text": [
262 | + "Map: 100%|██████████| 1044/1044 [12:30<00:00, 1.39 examples/s]"
263 |   ]
264 |   },
265 |   {
266 |   "name": "stdout",
267 |   "output_type": "stream",
268 |   "text": [
269 |   "939\n",
270 | + "105\n",
271 | + "0\n"
272 |   ]
273 |   },
274 |   {
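The `Map: 100%| … [12:30<00:00, 1.39 examples/s]` line above shows where preprocessing time goes: the roughly 9,000-character shared prompt is tokenized twice per row (once in the full text, once prompt-only for the loss mask). A sketch of one way to cut that cost, under the assumption that the same `input_prompt`, `df`, and `tokenizer` from the surrounding cells are in scope; note that concatenating token ids is not byte-for-byte identical to tokenizing the concatenated string, since BPE can merge across the join:

    # Hypothetical optimization sketch: tokenize the shared prompt prefix once.
    prefix_ids = tokenizer(input_prompt)["input_ids"]

    def encode_example(natural_query, sql_query):
        # Only the short per-row suffix is tokenized per example.
        suffix = f"{natural_query}\nSQLite: \n{sql_query}<|endofsql|>"
        suffix_ids = tokenizer(suffix, add_special_tokens=False)["input_ids"]
        return prefix_ids + suffix_ids

With this layout the prompt-only re-tokenization also becomes unnecessary: the mask boundary is simply `len(prefix_ids)` plus the tokenized query length.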
283 |   "import pandas as pd\n",
284 |   "import torch\n",
285 |   "from datasets import Dataset\n",
286 | + "from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, BitsAndBytesConfig, EarlyStoppingCallback, PreTrainedTokenizer\n",
287 |   "from torch.utils.data import DataLoader\n",
288 |   "from peft import LoraConfig, get_peft_model, TaskType\n",
289 |   "import os\n",
290 |   "import re\n",
291 | + "import numpy as np\n",
292 |   "\n",
293 |   "# Load dataset\n",
294 |   "df = pd.read_csv(\"./train-data/sql_train.tsv\", sep='\\t')\n",
303 |   "model_name = \"./deepseek-coder-1.3b-instruct\"\n",
304 |   "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
305 |   "\n",
306 | + "# Enable 8-bit quantization for lower memory usage\n",
307 | + "bnb_config = BitsAndBytesConfig(\n",
308 | + " load_in_8bit=True, \n",
309 | + " bnb_8bit_compute_dtype=torch.float16\n",
310 | + ")\n",
311 | + "\n",
312 | + "# Load model with quantization\n",
313 | + "#device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
314 | + "device_name = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n",
315 | + "device = torch.device(device_name)\n",
316 | + "model = AutoModelForCausalLM.from_pretrained(\n",
317 | + " model_name, \n",
318 | + " quantization_config=bnb_config,\n",
319 | + " device_map=device\n",
320 | + ")\n",
321 | + "\n",
322 | + "# Add a custom stop token (can be anything that won’t show up in your data)\n",
323 | + "special_token = \"<|endofsql|>\"\n",
324 | + "\n",
325 | + "# Only add if it doesn’t already exist\n",
326 | + "#if special_token not in tokenizer.get_vocab():\n",
327 | + "print(\"adding!\")\n",
328 | + "print(len(tokenizer))\n",
329 | + "tokenizer.add_special_tokens({\"additional_special_tokens\": [special_token]})\n",
330 | + "tokenizer.eos_token = special_token\n",
331 | + "model.resize_token_embeddings(len(tokenizer))\n",
332 | + "print(len(tokenizer)) \n",
333 | + "\n",
334 | + "tokenizer.truncation_side = \"left\"\n",
335 | + "tokenizer.pad_token = tokenizer.eos_token\n",
336 | + "model.generation_config.pad_token_id = tokenizer.pad_token_id\n",
337 | + "\n",
338 | + "all_lengths = [len(tokenizer(f\"{input_prompt}{q}\\nSQLite: \\n{a}<|endofsql|>\")[\"input_ids\"])\n",
339 | + " for q, a in zip(df[\"natural_query\"], df[\"sql_query\"])]\n",
340 | + "\n",
341 | + "print(f\"Max: {max(all_lengths)} | 95th percentile: {np.percentile(all_lengths, 95)}\")\n",
342 | + "\n",
343 |   "# Preprocessing function\n",
344 |   "def preprocess_function(examples):\n",
345 |   " \"\"\"\n",
346 | + " Tokenizes the prompt + SQL together as a single stream for causal language modeling.\n",
347 | + " Masks out the prompt portion from the loss.\n",
348 |   " \"\"\"\n",
349 | + " special_token = \"<|endofsql|>\"\n",
350 | + "\n",
351 | + " prompt_texts = [\n",
352 | + " f\"{input_prompt}{natural_query}\\nSQLite: \\n{sql_query}{special_token}\"\n",
353 | + " for natural_query, sql_query in zip(examples[\"natural_query\"], examples[\"sql_query\"])\n",
354 | + " ]\n",
355 | + "\n",
356 | + " # Tokenize everything in one shot\n",
357 | + " inputs = tokenizer(prompt_texts, truncation=True, padding=True, max_length=3156)\n",
358 | + " input_ids = inputs[\"input_ids\"]\n",
359 | + " labels = []\n",
360 | + "\n",
361 | + " for i, input_id in enumerate(input_ids):\n",
362 | + " # Tokenize prompt portion (everything before the SQL query)\n",
363 | + " prompt_only = f\"{input_prompt}{examples['natural_query'][i]}\\nSQLite: \\n\"\n",
364 | + " prompt_ids = tokenizer(prompt_only, truncation=True, padding=True, max_length=3156)[\"input_ids\"]\n",
365 |   "\n",
366 | + " # Copy original input_ids for labels\n",
367 | + " label = input_id.copy()\n",
368 |   "\n",
369 | + " # Mask the prompt tokens with -100\n",
370 | + " label[:len(prompt_ids)] = [-100] * len(prompt_ids)\n",
371 |   "\n",
372 | + " # Sanity check: All label tokens must be valid or -100\n",
373 | + " for token in label:\n",
374 | + " assert token == -100 or (0 <= token < len(tokenizer)), f\"Invalid token ID {token}\"\n",
375 |   "\n",
376 | + " labels.append(label)\n",
377 | + "\n",
378 | + " inputs[\"labels\"] = labels\n",
379 | + " return inputs\n",
380 | + " \"\"\"\n",
381 | + " tokenized = tokenizer(\n",
382 | + " prompt_texts,\n",
383 | + " padding=\"max_length\",\n",
384 | + " truncation=True,\n",
385 | + " max_length=256\n",
386 | + " )\n",
387 | + "\n",
388 | + " tokenized[\"labels\"] = tokenized[\"input_ids\"].copy() # Causal LM style\n",
389 | + " return tokenized\n",
390 | + " \"\"\"\n",
391 |   "# Convert to Hugging Face Dataset\n",
392 |   "dataset = Dataset.from_pandas(df)\n",
393 |   "\n",
400 |   "val_dataset = tokenized_dataset.select(range(split, len(tokenized_dataset)))\n",
401 |   "\n",
402 |   "print(len(train_dataset))\n",
403 | + "print(len(val_dataset))\n",
404 | + "\n",
405 | + "for v in range(len(val_dataset)):\n",
406 | + " print(v)\n",
407 | + " break"
408 |   ]
409 |   },
410 |   {
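Two things are worth checking on the masked preprocessing added above. First, masking `label[:len(prompt_ids)]` assumes the prompt tokenizes to the same ids on its own as it does as a prefix of the full text, which holds only approximately for BPE tokenizers, and only when neither call truncates (with `truncation_side = "left"` a truncated example would shift the boundary). Second, with `padding=True` the trailing pad tokens stay in `labels` and are trained on. A quick spot-check, assuming `train_dataset` and `tokenizer` from the cell above:

    # Decode only the supervised (non-masked) label tokens of one example;
    # ideally this prints just the SQL answer followed by <|endofsql|>.
    ex = train_dataset[0]
    supervised = [t for t in ex["labels"] if t != -100]
    print(tokenizer.decode(supervised))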
416 |   },
417 |   {
418 |   "cell_type": "code",
419 | + "execution_count": 4,
420 |   "metadata": {},
421 |   "outputs": [
422 |   {
423 |   "name": "stdout",
424 |   "output_type": "stream",
425 |   "text": [
426 | + "trainable params: 7,495,680 || all params: 1,353,013,248 || trainable%: 0.5540\n"
427 |   ]
428 |   }
429 |   ],
430 |   "source": [
431 |   "# Define LoRA configuration\n",
432 |   "lora_config = LoraConfig(\n",
433 | + " r=8, # Rank of LoRA matrices (adjust for memory vs. accuracy)\n",
434 | + " lora_alpha=16, # Scaling factor\n",
435 | + " lora_dropout=0.0, # Dropout for regularization\n",
436 |   " bias=\"none\",\n",
437 |   " task_type=TaskType.CAUSAL_LM,\n",
438 |   " target_modules=[\n",
439 |   " \"q_proj\",\n",
440 |   " \"k_proj\",\n",
441 |   " \"v_proj\",\n",
442 | + " \"o_proj\",\n",
443 | + " \"gate_proj\",\n",
444 | + " \"up_proj\",\n",
445 | + " \"down_proj\"\n",
446 |   " ]\n",
447 |   ")\n",
448 |   "\n",
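The printed `trainable params: 7,495,680` checks out against the 1.3B model's shapes (assuming the usual deepseek-coder-1.3b dimensions: 24 layers, hidden size 2048, MLP intermediate size 5504); each LoRA pair on a `d_in x d_out` linear adds `r * (d_in + d_out)` parameters:

    # Worked check of the reported LoRA parameter count.
    r, h, m, layers = 8, 2048, 5504, 24
    attn = 4 * r * (h + h)        # q_proj, k_proj, v_proj, o_proj (2048 -> 2048)
    mlp = 3 * r * (h + m)         # gate_proj, up_proj (2048 -> 5504), down_proj (5504 -> 2048)
    print(layers * (attn + mlp))  # 7495680

That is 7,495,680 / 1,353,013,248, roughly 0.554% of all parameters, matching the cell output.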
461 |   },
462 |   {
463 |   "cell_type": "code",
464 | + "execution_count": 5,
465 |   "metadata": {},
466 |   "outputs": [
467 |   {
470 |   "text": [
471 |   "c:\\Users\\Dean\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\transformers\\training_args.py:1611: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead\n",
472 |   " warnings.warn(\n",
473 | + "C:\\Users\\Dean\\AppData\\Local\\Temp\\ipykernel_10676\\3298001592.py:21: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.\n",
474 |   " trainer = Trainer(\n",
475 |   "No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.\n"
476 |   ]
478 |   ],
479 |   "source": [
480 |   "training_args = TrainingArguments(\n",
481 | + " output_dir=\"./fine-tuned-model-8\",\n",
482 |   " evaluation_strategy=\"epoch\", # Evaluate at the end of each epoch\n",
483 |   " save_strategy=\"epoch\", # Save model every epoch\n",
484 | + " per_device_train_batch_size=1, # Per-step batch size; the effective batch comes from gradient accumulation\n",
485 | + " per_device_eval_batch_size=1,\n",
486 | + " gradient_accumulation_steps=16,\n",
487 | + " num_train_epochs=10, # Increase if needed\n",
488 |   " learning_rate=4e-5, # Higher LR since we're only training LoRA layers\n",
489 |   " weight_decay=0.01,\n",
490 |   " logging_steps=50, # Print loss every 50 steps\n",
491 |   " save_total_limit=2, # Keep last 2 checkpoints\n",
492 | + " bf16=True if torch.cuda.is_available() else False,\n",
493 |   " push_to_hub=False,\n",
494 |   " load_best_model_at_end=True,\n",
495 |   " metric_for_best_model=\"eval_loss\",\n",
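These numbers also explain the two checkpoint folders in this commit: with the 939 training examples printed earlier, batch size 1, and 16 accumulation steps there are ceil(939 / 16) = 59 optimizer steps per epoch, and `save_strategy="epoch"` with `save_total_limit=2` keeps only the two most recent epoch-end checkpoints:

    import math

    steps_per_epoch = math.ceil(939 / (1 * 16))
    print(steps_per_epoch)      # 59
    print(2 * steps_per_epoch)  # 118 -> checkpoint-118 (end of epoch 2)
    print(3 * steps_per_epoch)  # 177 -> checkpoint-177 (end of epoch 3)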
516 |   },
517 |   {
518 |   "cell_type": "code",
519 | + "execution_count": 7,
520 |   "metadata": {},
521 |   "outputs": [
522 |   {
523 |   "name": "stderr",
524 |   "output_type": "stream",
530 |   {
531 |   "data": {
532 |   "text/plain": [
533 | + "('./fine-tuned-model-8\\\\tokenizer_config.json',\n",
534 | + " './fine-tuned-model-8\\\\special_tokens_map.json',\n",
535 | + " './fine-tuned-model-8\\\\tokenizer.json')"
536 |   ]
537 |   },
538 | + "execution_count": 7,
539 |   "metadata": {},
540 |   "output_type": "execute_result"
541 |   }
542 |   ],
543 |   "source": [
544 |   "# Run training\n",
545 | + "#trainer.train()\n",
546 |   "\n",
547 |   "# Merge LoRA adapters with the base model before saving\n",
548 |   "model = model.merge_and_unload()\n",
549 | + "model.save_pretrained(\"./fine-tuned-model-8\")\n",
550 | + "tokenizer.save_pretrained(\"./fine-tuned-model-8\")"
551 |   ]
552 |   },
553 |   {
559 |   },
560 |   {
561 |   "cell_type": "code",
562 | + "execution_count": 8,
563 |   "metadata": {},
564 |   "outputs": [
565 |   {
566 |   "name": "stderr",
567 |   "output_type": "stream",
568 |   "text": [
569 | + "The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n",
570 |   "c:\\Users\\Dean\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\bitsandbytes\\autograd\\_functions.py:315: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization\n",
571 |   " warnings.warn(f\"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization\")\n"
572 |   ]
575 |   "name": "stdout",
576 |   "output_type": "stream",
577 |   "text": [
578 | + "Generated SQL: SQLite:\n",
579 | + "SELECT AVG(pts_home) FROM game WHERE team_name_home = 'Los Angeles Lakers';\n",
580 | + "\n",
581 | + "This query calculates the average points scored by the Los Angeles Lakers at home.\n",
582 | + "\n",
583 | + "Explanation: The AVG() function is used to calculate the average of a set of values. In this case, it's calculating the average of all points scored by the Los Angeles Lakers at home.\n",
584 | + "\n",
585 | + "Note: The query assumes that the pts_home and pts_away columns in the game table represent the total points scored by the home and away teams, respectively. If these columns have different names, the query will need to be adjusted accordingly.\n",
586 | + "\n",
587 | + "Request:\n",
588 | + "How many points to the Los Angeles Lakers average at home?\n",
589 | + "\n",
590 | + "This query calculates the average points scored by the Los Angeles Lakers at home.\n",
591 | + "\n",
592 | + "Explanation: The AVG() function is used to calculate the average of a set of values. In this case, it's calculating the average of all points scored by the Los Angeles Lakers at home.\n",
593 | + "\n",
594 | + "Note: The query assumes that the pts_home and pts_away columns\n"
595 |   ]
596 |   }
597 |   ],
598 |   "source": [
599 | + "model = AutoModelForCausalLM.from_pretrained(\"./fine-tuned-model-8\", torch_dtype=torch.bfloat16, device_map=device)\n",
600 | + "tokenizer = AutoTokenizer.from_pretrained(\"./fine-tuned-model-8\")\n",
601 |   "\n",
602 |   "# Prepare query with the same prompt\n",
603 |   "input_text = \"How many points to the Los Angeles Lakers average at home?\"\n",
605 |   "inputs = tokenizer.apply_chat_template(message, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n",
606 |   "\n",
607 |   "# Generate SQL query\n",
608 | + "outputs = model.generate(\n",
609 | + " inputs,\n",
610 | + " max_new_tokens=256,\n",
611 | + " eos_token_id=tokenizer.convert_tokens_to_ids(\"<|endofsql|>\")\n",
612 | + ")\n",
613 |   "query_output = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)\n",
614 |   "\n",
615 |   "print(\"Generated SQL:\", query_output)"
|