|
--- |
|
language: |
|
- vi |
|
library_name: transformers |
|
tags: |
|
- LLMs |
|
- NLP |
|
- Vietnamese |
|
license: mit |
|
--- |
|
|
|
# Model Card
|
|
|
Chatbots can be programmed with a large knowledge base to answer users' questions on a variety of topics, providing facts, data, explanations, definitions, and more. They can also:

- **Complete tasks.** Chatbots can be integrated with other systems and APIs to actually do things for users. Based on a user's preferences and past interactions, they can suggest products, services, content, and more that might be relevant and useful.

- **Provide customer service.** Chatbots can handle many simple customer-service interactions (answering questions, handling complaints, processing returns, etc.), freeing human agents to focus on more complex issues.

- **Generate conversational responses.** Using NLP and machine learning, chatbots can understand natural language and generate conversational responses, creating fluent interactions.
|
|
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
- **Model type:** Mistral |
|
- **Language(s) (NLP):** Vietnamese |
|
- **Finetuned from model:** [Viet-Mistral/Vistral-7B-Chat](https://huggingface.co/Viet-Mistral/Vistral-7B-Chat)
|
|
|
### Purpose |
|
This model is an improvement over the previous release. It ships an updated `tokenizer_config.json` that adds `<|im_start|>` and `<|im_end|>` as additional special tokens.
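With these special tokens, prompts follow the ChatML-style layout that the new chat template produces. As a minimal sketch (the helper function below is illustrative, not part of this repository's API), the rendered prompt looks like this:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Wrap each turn in <|im_start|>/<|im_end|>, mirroring the chat template."""
    text = ""
    for message in messages:
        text += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        text += "<|im_start|>assistant\n"
    return text

prompt = render_chatml([{"role": "user", "content": "Xin chào!"}])
print(prompt)
# <|im_start|>user
# Xin chào!<|im_end|>
# <|im_start|>assistant
```

In practice the same string is produced by the tokenizer's built-in `apply_chat_template` once the updated `tokenizer_config.json` is in place.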
|
|
|
### Training Data |
|
|
|
Our dataset was built from our university's student handbook. It covers majors, university regulations, and other information about our university.
|
[hcmue_qa](https://huggingface.co/datasets/Tamnemtf/hcmue_qa) |
|
|
|
### Training Procedure |
|
|
|
```python
from peft import LoraConfig

# Load LoRA configuration
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)
```

The updated chat template and special-token settings in `tokenizer_config.json`:

```json
{
    "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
    "clean_up_tokenization_spaces": false,
    "eos_token": "<|im_end|>",
    "legacy": true,
    "model_max_length": 1000000000000000019884624838656,
    "pad_token": "<unk>",
    "sp_model_kwargs": {},
    "spaces_between_special_tokens": false,
    "tokenizer_class": "LlamaTokenizer",
    "unk_token": "<unk>",
    "use_default_system_prompt": false,
    "use_fast": true
}
```
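For a rough sense of the adapter's size, the trainable LoRA parameter count can be worked out from the config above. The shapes below are assumptions based on stock Mistral-7B (hidden size 4096, 8 KV heads of dimension 128, intermediate size 14336, 32 layers, 32,000-token vocabulary); Vistral's extended Vietnamese vocabulary would make the `lm_head` term somewhat larger.

```python
# Rough LoRA parameter count for r=8 over the target modules above.
# Shapes assume stock Mistral-7B; Vistral's extended vocabulary differs slightly.
r = 8
hidden, intermediate, vocab, layers = 4096, 14336, 32000, 32
kv_dim = 1024  # 8 KV heads * 128 head dim (grouped-query attention)

# (in_features, out_features) for each target module inside a decoder layer
per_layer = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each LoRA adapter adds r * (in + out) parameters (A: in x r, B: r x out)
layer_params = sum(r * (i + o) for i, o in per_layer.values())
lm_head_params = r * (hidden + vocab)  # lm_head is adapted once, not per layer
total = layers * layer_params + lm_head_params
print(total)  # about 21.3M trainable parameters, a small fraction of the 7B base
```

This is why LoRA fine-tuning fits on a single consumer GPU: only the adapter weights above are updated, while the base model stays frozen.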
|
|
|
## Contact |
|
|
|
[email protected] |