ergotts committed on
Commit 9b37432 · verified · 1 parent: e50fc4c

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -16,6 +16,9 @@ language:
  ## Model Overview
  This model is built on top of **[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)** and finetuned using **LoRA** (Low-Rank Adaptation) and **RLHF**-style reward optimization, leveraging **vLLM** for fast inference. It is designed to respond with a specific structure (i.e., `<reasoning> ... </reasoning>` and `<final_argument> ... </final_argument>` sections) and maximize the number of well-formed argument-objection pairs.
 
+ # Training script
+ Script here: https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
+
  ## Key Features
  - **Base Model**: Qwen/Qwen2.5-3B-Instruct
  - **Quantization & Optimization**:
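The model overview says responses are trained to follow a `<reasoning> ... </reasoning>` / `<final_argument> ... </final_argument>` structure and to maximize well-formed argument-objection pairs, but the README does not show the reward itself. The following is a minimal sketch of what a format-based reward of that kind could look like. It is not taken from the linked training script. The function names `structured_reward` and `count_pairs`, the `pair_weight` value, and the `Argument:` / `Objection:` line convention for pairs are all hypothetical assumptions made for illustration.

```python
import re

# Both tagged sections from the model overview, reasoning first.
# Assumption: each tag appears once and only whitespace sits outside them.
_FORMAT_RE = re.compile(
    r"\s*<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<final_argument>(?P<final>.*?)</final_argument>\s*",
    re.DOTALL,  # let '.' span newlines inside each section
)


def count_pairs(final_argument: str) -> int:
    """Count Argument/Objection pairs (hypothetical convention, not from the README).

    Each non-empty 'Objection:' line that follows an unmatched non-empty
    'Argument:' line closes one pair. Labels with no text after the colon
    are ignored.
    """
    pairs = 0
    pending_argument = False
    for line in final_argument.splitlines():
        text = line.strip()
        if text.startswith("Argument:") and text[len("Argument:"):].strip():
            pending_argument = True
        elif text.startswith("Objection:") and text[len("Objection:"):].strip():
            if pending_argument:
                pairs += 1
                pending_argument = False
    return pairs


def structured_reward(completion: str, pair_weight: float = 0.5) -> float:
    """Hypothetical reward: 0.0 if malformed, else 1.0 plus a bonus per pair."""
    match = _FORMAT_RE.fullmatch(completion)
    if match is None:
        return 0.0
    return 1.0 + pair_weight * count_pairs(match.group("final"))


sample = (
    "<reasoning>Weigh both sides.</reasoning>\n"
    "<final_argument>\n"
    "Argument: Remote work saves commuting time.\n"
    "Objection: It can weaken team communication.\n"
    "</final_argument>"
)
print(structured_reward(sample))          # 1.5
print(structured_reward("no tags here"))  # 0.0
```

Returning 0.0 for any malformed completion makes the tag structure a hard gate, so the per-pair bonus only applies once the format is correct. Whether the actual training run weights format and pair count this way is not stated in the README.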