Update README.md

README.md CHANGED
@@ -19,7 +19,7 @@ Instruction Pre-Training:
 40B tokens of instruction data, with one-third focused on reasoning tasks.
 
 Supervised Fine-Tuning (SFT):
-~500K high-quality and diverse instructions with balanced complexity. Reasoning tasks make up
+~500K high-quality and diverse instructions with balanced complexity. Reasoning tasks make up about 20% of the dataset.
 
 Preference Tuning:
 ~100K carefully selected instructions, filtered by length and type for general tasks and with domain-balanced selection for reasoning tasks.
@@ -240,4 +240,48 @@ outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampli
 
 generated_text = [output.outputs[0].text for output in outputs]
 print(generated_text)
-```
+```
+
+## SGLang Usage
+
+To run an inference server for **T-pro IT 2.0**, start by launching the SGLang server:
+
+```bash
+python -m sglang.launch_server \
+    --model-path t-tech/T-pro-it-2.0 \
+    --reasoning-parser qwen3
+```
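+
+By default the server listens on port 30000. As a quick readiness check (a minimal sketch, not part of the launch command), you can poll the OpenAI-compatible `/v1/models` endpoint until the model finishes loading:
+
+```python
+# Hypothetical readiness probe: poll the OpenAI-compatible /v1/models
+# endpoint (assumes the default SGLang port, 30000).
+import time
+
+import requests
+
+while True:
+    try:
+        if requests.get("http://127.0.0.1:30000/v1/models", timeout=2).ok:
+            break
+    except requests.RequestException:
+        pass
+    time.sleep(5)  # server is still loading model weights
+print("Server is ready")
+```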
+
+Once the server is up and listening on `localhost:30000`, you can send chat-based requests via the OpenAI Python client.
+
+```python
+import openai
+
+client = openai.OpenAI(
+    base_url="http://127.0.0.1:30000/v1",
+    api_key="ANY"  # the server ignores the API key
+)
+
+# Prompt (Russian): "Please compute the definite integral ∫_0^1 x² eˣ dx,
+# explain the solution step by step, and state the final result."
+prompt = (
+    "Пожалуйста, вычисли определённый интеграл ∫_0^1 x² eˣ dx, "
+    "пошагово объясни решение и укажи окончательный результат."
+)
+
+completion = client.chat.completions.create(
+    model="ANY",  # the server ignores the model name
+    messages=[
+        # System prompt (Russian): "You are T-pro, a virtual assistant at
+        # T-Technologies. Your task is to be a helpful dialogue assistant."
+        {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
+        {"role": "user", "content": prompt}
+    ],
+    # REQUIRED: sampling params from the "Recommended Generation Parameters" table
+    temperature=0.6,
+    presence_penalty=1.0,
+)
+
+# The generated reply is in `completion.choices[0].message.content`
+print(completion.choices[0].message.content)
+```
+
+**Note:** It is **obligatory** to include both `temperature` and `presence_penalty` in every completion call.
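+
+For long reasoning outputs you can also stream tokens as they are generated. The snippet below is a minimal sketch, assuming the same OpenAI-compatible endpoint and the `client` and `prompt` objects from above; it still passes the required `temperature` and `presence_penalty`:
+
+```python
+# Streaming variant: print tokens as they arrive instead of
+# waiting for the full completion.
+stream = client.chat.completions.create(
+    model="ANY",
+    messages=[{"role": "user", "content": prompt}],
+    temperature=0.6,
+    presence_penalty=1.0,
+    stream=True,
+)
+for chunk in stream:
+    if chunk.choices and chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+```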