batalovme committed
Commit 3c90d8b · verified · 1 Parent(s): 2ff3e54

Update README.md

Files changed (1)
  1. README.md +46 -2
README.md CHANGED
@@ -19,7 +19,7 @@ Instruction Pre-Training:
  40B tokens of instruction data, with one-third focused on reasoning tasks.

  Supervised Fine-Tuning (SFT):
- ~500K high-quality and diverse instructions with balanced complexity. Reasoning tasks make up 10% of the dataset.
+ ~500K high-quality and diverse instructions with balanced complexity. Reasoning tasks make up about 20% of the dataset.

  Preference Tuning:
  ~100K carefully selected instructions, filtered by length and type for general tasks and with domain-balanced selection for reasoning tasks.
@@ -240,4 +240,48 @@ outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

  generated_text = [output.outputs[0].text for output in outputs]
  print(generated_text)
- ```
+ ```
+
+
+
+ ## SGLang Usage
+
+ To run an inference server for **T-pro IT 2.0**, start by launching the SGLang server:
+
+ ```bash
+ python -m sglang.launch_server \
+   --model-path t-tech/T-pro-it-2.0 \
+   --reasoning-parser qwen3
+ ```
+
+ Once the server is up and listening on `localhost:30000`, you can send chat-based requests via the OpenAI Python client.
+
+ ```python
+ import openai
+
+ client = openai.OpenAI(
+     base_url="http://127.0.0.1:30000/v1",
+     api_key="ANY"  # the server ignores the API key
+ )
+
+ prompt = (
+     "Пожалуйста, вычисли определённый интеграл ∫_0^1 x² eˣ dx, "
+     "пошагово объясни решение и укажи окончательный результат."
+ )
+
+ completion = client.chat.completions.create(
+     model="ANY",  # the server ignores the model name
+     messages=[
+         {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
+         {"role": "user", "content": prompt}
+     ],
+     # REQUIRED: sampling params from the "Recommended Generation Parameters" table
+     temperature=0.6,
+     presence_penalty=1.0,
+ )
+
+ # The generated reply is in `completion.choices[0].message.content`
+ print(completion.choices[0].message.content)
+ ```
+
+ **Note:** It is **obligatory** to include both `temperature` and `presence_penalty` in every completion call.
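
Because these two parameters must accompany every request, a small wrapper that applies them by default can help. The sketch below is illustrative rather than part of the README: the `chat` helper and `RECOMMENDED_PARAMS` dict are hypothetical names, and it assumes the same local SGLang endpoint used in the example above.

```python
# Illustrative sketch (not from the README): apply the recommended sampling
# parameters to every chat completion call so they are never omitted.
import openai

# Assumes the SGLang server shown above is listening on localhost:30000.
client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="ANY")

# Values taken from the "Recommended Generation Parameters" table.
RECOMMENDED_PARAMS = {
    "temperature": 0.6,
    "presence_penalty": 1.0,
}

def chat(messages, **overrides):
    """Send a chat completion with the recommended defaults always included."""
    params = {**RECOMMENDED_PARAMS, **overrides}
    completion = client.chat.completions.create(
        model="ANY",  # the server ignores the model name
        messages=messages,
        **params,
    )
    return completion.choices[0].message.content

# Usage: the defaults are applied unless explicitly overridden.
print(chat([{"role": "user", "content": "Give a one-sentence summary of what SGLang does."}]))
```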
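
The launch command above passes `--reasoning-parser qwen3`, which tells SGLang to separate the model's reasoning from the final answer. The sketch below assumes the running SGLang version returns the separated reasoning in an extra `reasoning_content` field of the chat message; whether that field is present depends on the server version and settings, so it is read defensively.

```python
# Sketch, under an assumption about the server: with a reasoning parser
# enabled, SGLang may return the separated reasoning in an extra
# `reasoning_content` field next to the usual `content`.
import openai

client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="ANY")

completion = client.chat.completions.create(
    model="ANY",
    messages=[{"role": "user", "content": "What is 17 * 24? Answer briefly."}],
    temperature=0.6,        # required, see the note above
    presence_penalty=1.0,   # required, see the note above
)

message = completion.choices[0].message
# The OpenAI client keeps unknown response fields, so read the extra field
# defensively; it is None/absent if the server did not send it.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", message.content)
```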