ergotts committed on
Commit 797c511 · verified · 1 Parent(s): 9b37432

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -17,7 +17,7 @@ language:
 This model is built on top of **[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)** and finetuned using **LoRA** (Low-Rank Adaptation) and **RLHF**-style reward optimization, leveraging **vLLM** for fast inference. It is designed to respond with a specific structure (i.e., `<reasoning> ... </reasoning>` and `<final_argument> ... </final_argument>` sections) and maximize the number of well-formed argument-objection pairs.
 
 # Training script
-Script here: https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
+Script here (example of how to do inference at the bottom): https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
 
 ## Key Features
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
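The README says responses follow a `<reasoning> ... </reasoning>` then `<final_argument> ... </final_argument>` layout. The Python below is a hedged sketch showing one way to split such a response into its two sections and to score its format with a simple binary check, of the kind a reward-optimization loop might use. `parse_response` and `format_reward` are hypothetical names. The actual reward functions are in the linked Colab and may differ. The sketch does not count argument-objection pairs because the README does not say how pairs are delimited.

```python
import re
from typing import Optional, TypedDict

# Hypothetical helper, not the author's code: the tag names come from the README,
# everything else is an illustrative assumption.
_TAGS = ("<reasoning>", "</reasoning>", "<final_argument>", "</final_argument>")

# Whole response must be: <reasoning>...</reasoning> then <final_argument>...</final_argument>,
# with only whitespace around and between the two sections.
_PATTERN = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<final_argument>(?P<final_argument>.*?)</final_argument>\s*$",
    re.DOTALL,
)


class Sections(TypedDict):
    reasoning: str
    final_argument: str


def parse_response(text: str) -> Optional[Sections]:
    """Return the stripped sections, or None if the response is not well-formed."""
    # Reject duplicated or missing tags up front; the regex alone would let
    # a second <reasoning> block slip into the first capture group.
    if any(text.count(tag) != 1 for tag in _TAGS):
        return None
    match = _PATTERN.match(text)
    if match is None:
        return None
    return Sections(
        reasoning=match.group("reasoning").strip(),
        final_argument=match.group("final_argument").strip(),
    )


def format_reward(text: str) -> float:
    """Hypothetical binary format reward: 1.0 if well-formed, else 0.0."""
    return 1.0 if parse_response(text) is not None else 0.0


if __name__ == "__main__":
    sample = (
        "<reasoning>\nConsider the strongest objection first.\n</reasoning>\n"
        "<final_argument>\nArgument: ...\nObjection: ...\n</final_argument>"
    )
    print(parse_response(sample))
    print(format_reward("no tags here"))
```

Requiring each tag exactly once keeps the check strict. A real reward might instead give partial credit for partially correct structure, which this sketch deliberately does not attempt.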