---
license: mit
language:
- vi
---

# VinaLlama2-14B Beta

GGUF available here: [VinaLlama2-14B-GGUF](https://huggingface.co/qnguyen3/14b-gguf)
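
To run the GGUF build locally, a minimal sketch with the `llama-cpp-python` package might look like the following. The filename below is a placeholder (check the GGUF repo above for the actual files), and chat formatting is assumed to come from the GGUF's embedded template:

```python
# Hedged GGUF sketch (assumes `pip install llama-cpp-python` and a downloaded
# GGUF file; the filename is a placeholder, not a confirmed artifact name).
from llama_cpp import Llama

llm = Llama(
    model_path="vinallama2-14b.Q4_K_M.gguf",  # placeholder local path
    n_ctx=32768,      # the model's full 32,768-token context window
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Bạn là trợ lí AI hữu ích."},  # "You are a helpful AI assistant."
        {"role": "user", "content": "Một cộng một bằng mấy?"},       # "What is one plus one?"
    ],
    temperature=0.25,
)
print(out["choices"][0]["message"]["content"])
```

Because llama.cpp handles the quantized weights, this route can also run on machines without the ~20 GB of VRAM the full checkpoint needs (see the quick-use section below).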

**Top Features**:

- **Context Length**: 32,768 tokens.
- **Very good** at reasoning, mathematics, and creative writing.
- Works with **LangChain** agents out of the box; see the sketch below.
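
One way to hook the model into LangChain is through a standard `transformers` pipeline wrapped as a LangChain LLM. A minimal sketch, assuming a recent `langchain-community` release (newer LangChain versions move this class to `langchain-huggingface`), with the agent wiring itself left out:

```python
# Hedged sketch: expose the model through the LangChain LLM interface.
# Assumes: pip install langchain-community transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

model_id = "vilm/VinaLlama2-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A plain text-generation pipeline carries the model...
gen = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=512)

# ...and HuggingFacePipeline adapts it so chains and agents can call it
# like any other LangChain LLM.
llm = HuggingFacePipeline(pipeline=gen)
print(llm.invoke("Xin chào!"))  # "Hello!"
```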

**Known Issues**:

- Still struggles a bit with Vietnamese facts (e.g., Hoang Sa & Truong Sa, historical questions).
- Prone to hallucination when reasoning.
- Can't do Vi-En/En-Vi translation (yet)!

**Quick Use**:

VRAM requirement: ~20 GB.

```bash
pip install transformers accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to run generation on

model = AutoModelForCausalLM.from_pretrained(
    "vilm/VinaLlama2-14B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("vilm/VinaLlama2-14B")

prompt = "Một cộng một bằng mấy?"  # "What is one plus one?"
messages = [
    {"role": "system", "content": "Bạn là trợ lí AI hữu ích."},  # "You are a helpful AI assistant."
    {"role": "user", "content": prompt},
]

# Render the chat into the model's prompt format, then tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# do_sample=True is required for the temperature setting to take effect.
generated_ids = model.generate(
    model_inputs.input_ids,
    attention_mask=model_inputs.attention_mask,
    max_new_tokens=1024,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.25,
)

# Strip the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
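
The snippet above loads the model in half precision, which is where the ~20 GB VRAM figure comes from. If that does not fit your GPU, one common workaround, not covered by this card and offered here only as a sketch, is a 4-bit quantized load via `bitsandbytes`:

```python
# Hedged sketch: 4-bit NF4 load to reduce VRAM use (assumes a CUDA GPU and
# `pip install bitsandbytes`; exact memory savings and quality are not verified).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NF4 is the usual choice for LLMs
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "vilm/VinaLlama2-14B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("vilm/VinaLlama2-14B")
```

The rest of the quick-use code runs unchanged against the quantized model; expect some quality loss in exchange for the smaller footprint.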