Update README.md
README.md CHANGED
@@ -3,7 +3,7 @@
 ---

 Deployment:
-```
+```yaml
 build_commands: []
 external_package_dirs: []
 model_metadata: {}

@@ -31,10 +31,50 @@ trt_llm:
 source: HF
 repo: "baseten/example-Meta-Llama-3-70B-InstructForSequenceClassification"
 revision: "main" # hf revision hash
+# `fp8` or `no_quant` (=fp16) are allowed.
 quantization_type: fp8
 num_builder_gpus: 4
 ```

+Usage:
+```python
+import requests
+import os
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("Skywork/Skywork-Reward-Llama-3.1-8B-v0.2")
+
+prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
+# Positive example: high sigmoid score (~0.999 or above); raw logit around +13.
+response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+# Negative example: low sigmoid score (~0.001 or below); raw logit around -9.
+response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
+
+# Predict API: {
+#     "inputs": "What is Deep Learning?",  # str, may be formatted with chat template.
+#     "raw_scores": false,                 # with or without sigmoid activation
+#     "truncate": false,
+#     "truncation_direction": "right"
+# }
+
+for assistant_response in [response1, response2]:
+    # Feel free to parallelize this; requests will be batched in the backend.
+
+    conv = [{"role": "user", "content": prompt}, {"role": "assistant", "content": assistant_response}]
+    conv_formatted = tokenizer.apply_chat_template(conv, tokenize=False)
+    input_json = dict(inputs=conv_formatted, raw_scores=True)
+    resp = requests.post(
+        "https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
+        headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
+        json=input_json,
+    )
+
+    print(resp.json())
+    # prints
+    # [{'score': 13.714337, 'label': 'LABEL_0'}]
+    # [{'score': -9.353895, 'label': 'LABEL_0'}]
+```
+
 Reproduce this model:
 ```python
 #!/usr/bin/env python

@@ -101,5 +141,4 @@ def main():

 if __name__ == "__main__":
     main()
-
 ```
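
A note on the scores in the usage example above: with `raw_scores=True` the endpoint returns logits, and applying the sigmoid recovers the probability-style score the comments refer to. A minimal worked check in Python, using the two raw values printed in the example:

```python
import math

def sigmoid(x: float) -> float:
    # Maps a raw logit to the (0, 1) score returned when "raw_scores" is false.
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(13.714337))  # ~0.999999 -> the correct response1 scores near 1.0
print(sigmoid(-9.353895))  # ~0.0000866 -> the flawed response2 scores near 0.0
```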
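
The loop comment notes that requests can be parallelized because they are batched in the backend. Below is a minimal client-side sketch of that, assuming the same endpoint placeholder and API key as the example; `score_all` and the worker count are illustrative choices, not part of the README:

```python
import os
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://model-xxxxxx.api.baseten.co/environments/production/sync/predict"
HEADERS = {"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"}

def score(conv_formatted: str) -> dict:
    # One scoring request; concurrent requests are batched server-side.
    resp = requests.post(URL, headers=HEADERS, json=dict(inputs=conv_formatted, raw_scores=True))
    resp.raise_for_status()
    return resp.json()

def score_all(conversations: list[str], max_workers: int = 8) -> list[dict]:
    # `conversations` holds strings produced by tokenizer.apply_chat_template(...).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score, conversations))
```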
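
Finally, the commented predict-API schema can be exercised directly. A sketch of a request with `raw_scores` left false, so the sigmoid is applied server-side and scores arrive in (0, 1); the payload simply mirrors the schema comment from the usage example:

```python
import os

import requests

resp = requests.post(
    "https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "inputs": "What is Deep Learning?",  # str; may be formatted with a chat template
        "raw_scores": False,  # sigmoid applied server-side
        "truncate": False,
        "truncation_direction": "right",
    },
)
print(resp.json())
```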