macadeliccc committed
Update README.md

README.md CHANGED
@@ -13,6 +13,38 @@ Trained on 2x4090 using QLoRa and FSDP

[LoRa](macadeliccc/Samantha-Qwen2-7B-LoRa)

## Launch Using vLLM

```bash
python -m vllm.entrypoints.openai.api_server \
    --model macadeliccc/Samantha-Qwen2-7B-AWQ \
    --chat-template ./examples/template_chatml.jinja \
    --quantization awq
```

```python
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="macadeliccc/Samantha-Qwen2-7B-AWQ",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ]
)
print("Chat response:", chat_response)
```
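
Not part of the commit itself, but as a usage note: once the server is up, the same OpenAI-compatible endpoint can also be exercised directly over HTTP. A minimal curl sketch, reusing the model name and default port from above:

```bash
# Send a chat completion request to the vLLM OpenAI-compatible server.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "macadeliccc/Samantha-Qwen2-7B-AWQ",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a joke."}
        ]
    }'
```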

## Ollama
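
The commit adds this section empty. Purely as a hedged sketch of how it might be filled in, assuming a GGUF export of the model exists (the filename below is hypothetical) and that the prompt format is ChatML, consistent with the template_chatml.jinja passed to vLLM above:

```bash
# Hypothetical filename: assumes the model has been exported to GGUF.
cat > Modelfile <<'EOF'
FROM ./samantha-qwen2-7b.Q4_K_M.gguf

# ChatML prompt format, matching the chat template used above.
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop <|im_end|>
EOF

# Register the model with Ollama and try it out.
ollama create samantha-qwen2 -f Modelfile
ollama run samantha-qwen2 "Tell me a joke."
```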

## Prompt Template

```