|
--- |
|
language: |
|
- vi |
|
license: afl-3.0 |
|
library_name: transformers |
|
tags: |
|
- NLP |
|
- Vietnamese |
|
base_model: Viet-Mistral/Vistral-7B-Chat |
|
datasets: |
|
- Tamnemtf/hcmue_qa |
|
pipeline_tag: question-answering |
|
--- |
|
|
|
# Model Card for Vistral-7B-Chat fine-tuned on hcmue_qa
|
|
|
Chatbots can:

- **Answer questions.** Programmed with a large knowledge base, chatbots can answer users' questions on a wide variety of topics, providing facts, data, explanations, definitions, and more.
- **Complete tasks.** Integrated with other systems and APIs, chatbots can act on users' behalf and, based on a user's preferences and past interactions, suggest products, services, content, and more that may be relevant and useful.
- **Provide customer service.** Chatbots can handle many simple customer-service interactions such as answering questions, handling complaints, and processing returns, freeing human agents to focus on more complex issues.
- **Generate conversational responses.** Using NLP and machine learning, chatbots can understand natural language and generate conversational responses, creating fluent interactions.
|
|
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
- **Model type:** Mistral |
|
- **Language(s) (NLP):** Vietnamese |
|
- **Finetuned from model:** [Viet-Mistral/Vistral-7B-Chat](https://huggingface.co/Viet-Mistral/Vistral-7B-Chat)
|
|
|
### Purpose |
|
This model was fine-tuned for our scientific research project, which aims to build a chatbot that helps students find university information. The chatbot acts as a virtual assistant for students, answering questions and resolving their concerns.
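
A minimal inference sketch is shown below. It assumes the fine-tuned weights are published as a LoRA adapter on the Hub (the adapter repository name is hypothetical) and that the base model's chat template is used; adjust the repository IDs to match where the weights are actually hosted.

```python
# Sketch only: the adapter repo name below is hypothetical; replace it with the
# repository that actually hosts the fine-tuned LoRA weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Viet-Mistral/Vistral-7B-Chat"
adapter_id = "your-username/hcmue-chatbot-lora"  # hypothetical adapter repo

# Load the base model in 4-bit, matching the training-time quantization setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map={"": 0}
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Trường có những ngành đào tạo nào?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```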
|
|
|
### Training Data |
|
|
|
Our dataset was built from our university's student handbook. It covers majors, university regulations, and other information about our university.
|
[hcmue_qa](https://huggingface.co/datasets/Tamnemtf/hcmue_qa) |
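
The dataset can be loaded directly from the Hugging Face Hub with the `datasets` library (a minimal sketch; the split and column names are assumptions to check against the dataset card):

```python
from datasets import load_dataset

# Question-answer pairs collected from the university student handbook.
dataset = load_dataset("Tamnemtf/hcmue_qa")
print(dataset)               # show available splits and columns
print(dataset["train"][0])   # inspect one example (assumes a "train" split)
```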
|
|
|
### Training Procedure |
|
|
|
```python |
|
# LoRA attention dimension |
|
lora_r = 64 |
|
|
|
# Alpha parameter for LoRA scaling |
|
lora_alpha = 16 |
|
|
|
# Dropout probability for LoRA layers |
|
lora_dropout = 0.1 |
|
|
|
################################################################################ |
|
# bitsandbytes parameters |
|
################################################################################ |
|
|
|
# Activate 4-bit precision base model loading |
|
use_4bit = True |
|
|
|
# Compute dtype for 4-bit base models |
|
bnb_4bit_compute_dtype = "float16" |
|
|
|
# Quantization type (fp4 or nf4) |
|
bnb_4bit_quant_type = "nf4" |
|
|
|
# Activate nested quantization for 4-bit base models (double quantization) |
|
use_nested_quant = False |
|
|
|
################################################################################ |
|
# TrainingArguments parameters |
|
################################################################################ |
|
|
|
# Output directory where the model predictions and checkpoints will be stored |
|
output_dir = "./results" |
|
|
|
# Number of training epochs |
|
num_train_epochs = 1 |
|
|
|
# Enable fp16/bf16 training (set bf16 to True with an A100) |
|
fp16 = False |
|
bf16 = True |
|
|
|
# Batch size per GPU for training |
|
per_device_train_batch_size = 2 |
|
|
|
# Batch size per GPU for evaluation |
|
per_device_eval_batch_size = 2 |
|
|
|
# Number of update steps to accumulate the gradients for |
|
gradient_accumulation_steps = 1 |
|
|
|
# Enable gradient checkpointing |
|
gradient_checkpointing = True |
|
|
|
# Maximum gradient norm (gradient clipping)
|
max_grad_norm = 0.3 |
|
|
|
# Initial learning rate (AdamW optimizer) |
|
learning_rate = 2e-4 |
|
|
|
# Weight decay to apply to all layers except bias/LayerNorm weights |
|
weight_decay = 0.001 |
|
|
|
# Optimizer to use |
|
optim = "paged_adamw_32bit" |
|
|
|
# Learning rate schedule (constant is a bit better than cosine)
|
lr_scheduler_type = "constant" |
|
|
|
# Number of training steps (overrides num_train_epochs) |
|
max_steps = -1 |
|
|
|
# Ratio of steps for a linear warmup (from 0 to learning rate) |
|
warmup_ratio = 0.03 |
|
|
|
# Group sequences into batches with the same length
|
# Saves memory and speeds up training considerably |
|
group_by_length = True |
|
|
|
# Save a checkpoint every X update steps
|
save_steps = 25 |
|
|
|
# Log every X update steps
|
logging_steps = 25 |
|
|
|
################################################################################ |
|
# SFT parameters |
|
################################################################################ |
|
|
|
# Maximum sequence length to use |
|
max_seq_length = None |
|
|
|
# Pack multiple short examples in the same input sequence to increase efficiency |
|
packing = False |
|
|
|
# Load the entire model on GPU 0
|
device_map = {"": 0} |
|
``` |
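
Below is a sketch of how these hyperparameters might be wired into a QLoRA fine-tuning run with `peft` and `trl`, reusing the variables defined above. It is not the exact training script: the `SFTTrainer` signature varies between `trl` versions (newer releases move several arguments into `SFTConfig`), and `dataset_text_field="text"` is an assumption about how the prompts were formatted.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_name = "Viet-Mistral/Vistral-7B-Chat"

# 4-bit quantization (bitsandbytes parameters above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=getattr(torch, bnb_4bit_compute_dtype),
    bnb_4bit_use_double_quant=use_nested_quant,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map=device_map
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration (LoRA parameters above)
peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
)

# TrainingArguments (parameters above)
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=gradient_checkpointing,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
)

dataset = load_dataset("Tamnemtf/hcmue_qa", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",   # assumption: prompts pre-formatted into a "text" column
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
```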
|
|
|
## Contact |
|
|
|
[email protected] |