Update README.md
README.md
CHANGED
@@ -16,6 +16,9 @@ language:
 ## Model Overview
 This model is built on top of **[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)** and finetuned using **LoRA** (Low-Rank Adaptation) and **RLHF**-style reward optimization, leveraging **vLLM** for fast inference. It is designed to respond with a specific structure (i.e., `<reasoning> ... </reasoning>` and `<final_argument> ... </final_argument>` sections) and maximize the number of well-formed argument-objection pairs.
 
+# Training script
+Script here: https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
+
 ## Key Features
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
 - **Quantization & Optimization**:
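
The Model Overview text says responses follow a `<reasoning> ... </reasoning>` / `<final_argument> ... </final_argument>` structure. As a rough illustration of what checking that structure might look like, here is a minimal Python sketch. Only the two tag names come from the README; the function name `parse_response`, the sample text, and the assumption that each tag appears once and in this order are hypothetical, and this is not taken from the linked training script.

```python
import re
from typing import Dict, Optional

# The tag names are from the README; the pattern itself is an illustrative assumption
# (each section appears once, reasoning first, final argument second).
_SECTIONS = re.compile(
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<final_argument>(?P<final_argument>.*?)</final_argument>",
    re.DOTALL,
)


def parse_response(text: str) -> Optional[Dict[str, str]]:
    """Return the two sections with whitespace trimmed, or None if the structure is absent."""
    match = _SECTIONS.search(text)
    if match is None:
        return None
    return {name: body.strip() for name, body in match.groupdict().items()}


# Hypothetical model output, used only to exercise the parser.
sample = (
    "<reasoning>\nPremise A supports B.\n</reasoning>\n"
    "<final_argument>\nTherefore B.\n</final_argument>"
)
parsed = parse_response(sample)
malformed = parse_response("no tags here")
```

A check like this could feed a format reward or a post-hoc filter, though whether the actual training setup does so is not stated in this change.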