pankajmathur/orca_mini_v9_2_70b


pankajmathur committed 16 days ago

Commit a0c83d9 · verified · 1 parent: 19f0211

Update README.md

Files changed (1)

README.md +3 -3

@@ -38,7 +38,7 @@ Hello Orca Mini, what can you do for me?<|eot_id|>
 <|start_header_id|>assistant<|end_header_id|>
 ```
-Below shows a code example on how to use this model in default half precision (bfloat16) format
+Below shows a code example on how to use this model in default half precision (bfloat16) format, it requires around ~133GB VRAM
 ```python
 import torch
@@ -58,7 +58,7 @@ outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.0
 print(outputs[0]["generated_text"][-1])
 ```
-Below shows a code example on how to use this model in 4-bit format via bitsandbytes library
+Below shows a code example on how to use this model in 4-bit format via bitsandbytes library, it requires around ~39GB VRAM
 ```python
 import torch
@@ -86,7 +86,7 @@ print(outputs[0]["generated_text"][-1])
 ```
-Below shows a code example on how to use this model in 8-bit format via bitsandbytes library
+Below shows a code example on how to use this model in 8-bit format via bitsandbytes library, it requires around ~69GB VRAM
 ```python
 import torch
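
As a rough sanity check on the VRAM figures added in this commit, the sketch below estimates the memory needed for the weights alone at each precision. It assumes a nominal 70e9 parameters (read off the "70b" in the model name, not confirmed by the diff), and `weight_memory_gib` is a hypothetical helper, not part of any library. The result is a lower bound: activations, the KV cache, quantization metadata, and any layers kept in higher precision add to it, which would be consistent with the README's figures being somewhat higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Lower-bound memory for the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumption: nominal parameter count taken from the "70b" in the model name.
NOMINAL_PARAMS = 70e9

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(NOMINAL_PARAMS, bits)
    print(f"{label:>8}: ~{gib:.0f} GiB for weights only")
```

Under that assumption the weights alone come to roughly 130, 65, and 33 GiB, compared with the ~133GB, ~69GB, and ~39GB quoted in the README lines above.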