vilm/VinaLlama2-14B

qnguyen3 committed on May 6, 2024

Commit cc79676 · verified · 1 parent: 7d66155

Upload README (2).md

Files changed (1)

README (2).md ADDED +65 -0

	@@ -0,0 +1,65 @@

+---
+license: mit
+language:
+- vi
+---
+# VinaLlama2-14B Beta
+GGUF Here: [VinaLlama2-14B-GGUF](https://huggingface.co/qnguyen3/14b-gguf)
+**Top Features**:
+- **Context Length**: 32,768 tokens.
+- **VERY GOOD** at reasoning, mathematics and creative writing.
+- Works with **Langchain Agent** out-of-the-box.
+**Known Issues**
+- Still struggles somewhat with Vietnamese facts (e.g. Hoang Sa & Truong Sa, historical questions).
+- May hallucinate when reasoning.
+- Can't do Vi-En/En-Vi translation (yet)!
+Quick use:
+VRAM Requirement: ~20GB
+```bash
+pip install transformers accelerate
+```
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    "vilm/VinaLlama2-14B",
+    torch_dtype="auto",
+    device_map="auto"  # place the weights on the available device(s)
+)
+tokenizer = AutoTokenizer.from_pretrained("vilm/VinaLlama2-14B")
+prompt = "Một cộng một bằng mấy?"  # "What is one plus one?"
+messages = [
+    {"role": "system", "content": "Bạn là trợ lí AI hữu ích."},  # "You are a helpful AI assistant."
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(  # render the messages into one prompt string
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,  # passes input_ids together with the attention_mask
+    max_new_tokens=1024,
+    eos_token_id=tokenizer.eos_token_id,
+    do_sample=True,  # temperature only takes effect when sampling is enabled
+    temperature=0.25,
+)
+# Drop the prompt tokens so that only the newly generated text is decoded
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
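
The 32,768-token context length listed under Top Features has to cover both the prompt and the generated tokens. This minimal sketch caps `max_new_tokens` so a long prompt does not overrun that window. It assumes the prompt and the completion share one window, as is usual for decoder-only models. The helper `generation_budget` and the constant `CONTEXT_LENGTH` are hypothetical names introduced for illustration and are not part of the model card.

```python
CONTEXT_LENGTH = 32_768  # context length stated in the model card


def generation_budget(prompt_tokens: int, max_new_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit in the window after the prompt.

    Hypothetical helper: assumes prompt tokens and generated tokens
    share a single context window of `context_length` tokens.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    # Never return a negative budget, and never exceed what was requested.
    return max(0, min(max_new_tokens, remaining))
```

With the quick-use snippet, the prompt length could be read from `model_inputs.input_ids.shape[1]` and the result passed as `max_new_tokens`. A budget of 0 means the prompt alone already fills the window and should be shortened.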