---
license: apache-2.0
language:
- th
- en
metrics:
- accuracy
datasets:
- AIAT/The_Scamper-train
pipeline_tag: table-question-answering
---

# Model Card for Model ID

## Model Details

### Model Description

- **Developed by:** The Scamper
- **Model type:** Transformer
- **Language(s) (NLP):** Thai, English
- **License:** apache-2.0
- **Finetuned from model:** OpenThaiGPT-1.0.0 70B (https://huggingface.co/openthaigpt/openthaigpt-1.0.0-70b-chat)

## Uses

The Tabular Question Answering Large Language Model is based on OpenThaiGPT and fine-tuned to convert natural language questions into SQL queries. It learns to map Thai-language questions onto SQL structures, so that information can be retrieved from databases with a natural language question.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model2_path = "AIAT/The_Scamper-opt70bqt"
# Load the slow (non-fast) tokenizer with right-side padding
tokenizer = AutoTokenizer.from_pretrained(model2_path, padding_side="right", use_fast=False)
# device_map="auto" places the weights across the available devices (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(model2_path, device_map="auto")
```

### Recommendations

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

The fine-tuning methodology uses a dataset with two columns: "question" and "SQL syntax". A brief outline of the process:

1. **Data Collection**: Gather a dataset of questions paired with their corresponding SQL queries. The questions should cover a range of topics and query types, and each SQL query should express the intended operation on the database.
2. **Pre-processing**: Clean the data to remove noise, standardize formatting, and resolve inconsistencies. Tokenize the text and encode it into a format suitable for training.
3. **Model Architecture**: Use OpenThaiGPT 1.0.0 70B as the base model.
4. **Fine-tuning Setup**: Divide the dataset into training (90%) and test (10%) sets.
   Define the training procedure, including hyperparameters such as the learning rate, batch size, and number of training epochs.
5. **Fine-tuning Process**: Train the model on the question-SQL pairs using the defined setup. During training, the model learns to predict the SQL query for a given question by minimizing a suitable loss function.
6. **Testing**: Evaluate the final model on the held-out test set to assess how well it generalizes to unseen data.
7. **Deployment**: Deploy the fine-tuned model for text-to-SQL tasks in real-world applications, where it generates SQL queries from natural language questions.

Following this methodology, the model is fine-tuned to convert natural language questions into SQL syntax, so that structured databases can be queried in natural language.
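The pre-processing and 90/10 split described in the procedure can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the helper names `split_pairs` and `format_example`, the `Question:`/`SQL:` text template, the fixed seed, and the toy data are all assumptions made for the example.

```python
import random


def split_pairs(pairs, test_fraction=0.1, seed=42):
    """Shuffle question-SQL pairs and split them into train and test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)  # copy so the caller's list is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def format_example(question, sql):
    """Join a question and its SQL into one training string.

    The template is hypothetical; the real prompt format is not documented here.
    """
    return f"Question: {question.strip()}\nSQL: {sql.strip()}"


# Toy data for illustration only; not taken from the real dataset.
pairs = [(f"question {i}", f"SELECT {i};") for i in range(20)]
train_pairs, test_pairs = split_pairs(pairs)
train_texts = [format_example(q, s) for q, s in train_pairs]
print(len(train_pairs), len(test_pairs))  # prints "18 2"
```

Seeding the shuffle keeps the test set identical across runs, so evaluation numbers from different training runs stay comparable.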
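The card lists accuracy as its metric but does not say how it is computed. One plausible reading for the testing step is exact-match accuracy between generated and reference SQL, sketched below. The functions `normalize_sql` and `exact_match_accuracy` and the sample queries are hypothetical, and the normalization is deliberately crude: it lowercases the whole query, so it would also ignore case differences inside string literals.

```python
import re


def normalize_sql(sql):
    """Collapse whitespace, drop a trailing semicolon, and lowercase the query."""
    collapsed = re.sub(r"\s+", " ", sql.strip())
    return collapsed.rstrip(";").strip().lower()


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Hypothetical model outputs and gold queries, for illustration only.
predictions = [
    "SELECT name FROM users;",
    "select COUNT(*) from orders",
    "SELECT id FROM items",
]
references = [
    "select name  from users",
    "SELECT COUNT(*) FROM orders;",
    "SELECT id FROM products",
]
accuracy = exact_match_accuracy(predictions, references)
print(f"{accuracy:.3f}")  # prints "0.667"
```

Exact match undercounts queries that are written differently but return the same rows; comparing execution results against a test database is a stricter alternative when one is available.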