g-ronimo committed on
Commit e2764a9
1 Parent(s): 54e5a3f

Update README.md

Files changed (1):
  1. README.md +41 -2
README.md CHANGED
@@ -6,6 +6,45 @@ license_name: llama3
---

# Model Card for Model ID
- llama3 8b trained in 10k longest samples of OpenHermes

- Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
---

# Model Card for Model ID
+ llama3-8b trained on the 10k longest samples of OpenHermes

+ ## Usage
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_path = "g-ronimo/llama3-8b-SlimHermes"
+
+ # load in bfloat16 and spread across available GPUs
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ messages = [
+     {"role": "system", "content": "Talk like a pirate."},
+     {"role": "user", "content": "hello"},
+ ]
+
+ input_tokens = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt",
+ ).to("cuda")
+ output_tokens = model.generate(input_tokens, max_new_tokens=100)
+ # keep special tokens so the chat structure is visible in the output
+ output = tokenizer.decode(output_tokens[0], skip_special_tokens=False)
+
+ print(output)
+ ```
+
+ ```
+ <|im_start|>system
+ Talk like a pirate.<|im_end|>
+ <|im_start|>user
+ hello<|im_end|>
+ <|im_start|>assistant
+ hello there, matey! How be ye doin' today? Arrrr!<|im_end|>
+ ```
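
The sample output above shows that this fine-tune responds in ChatML-style turns (`<|im_start|>` / `<|im_end|>`) rather than Llama 3's default header format. As a rough illustration only — `build_chatml` is a hypothetical helper, not part of this repository or of `transformers`, and the tokenizer's actual Jinja chat template is authoritative — the prompt that `apply_chat_template` produces here can be approximated in plain Python:

```python
def build_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML prompt seen in the sample output above."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # leave the assistant turn open so the model completes it
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "Talk like a pirate."},
    {"role": "user", "content": "hello"},
]
print(build_chatml(messages))
```

This is just to make the expected turn structure explicit; in practice, always let `tokenizer.apply_chat_template` build the prompt so it stays in sync with the checkpoint's template.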