ergotts / r1-objection

text-generation-inference


ergotts committed on Feb 14

Commit 9b37432 · verified · 1 Parent(s): e50fc4c

Update README.md

Files changed (1)

README.md +3 -0

README.md CHANGED

@@ -16,6 +16,9 @@ language:
 ## Model Overview
 This model is built on top of **[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)** and finetuned using **LoRA** (Low-Rank Adaptation) and **RLHF**-style reward optimization, leveraging **vLLM** for fast inference. It is designed to respond with a specific structure (i.e., `<reasoning> ... </reasoning>` and `<final_argument> ... </final_argument>` sections) and maximize the number of well-formed argument-objection pairs.
+# Training script
+Script here: https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
 ## Key Features
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
 - **Quantization & Optimization**:
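The README says responses should contain a `<reasoning> ... </reasoning>` section followed by a `<final_argument> ... </final_argument>` section. Below is a minimal sketch of a check for that structure. It assumes each section appears exactly once and in that order. The pattern and the function names (`has_expected_structure`, `extract_final_argument`) are hypothetical and are not taken from the linked training script, whose actual reward functions are not shown in this diff. The sketch does not attempt to count argument-objection pairs, because their format is not specified here.

```python
import re
from typing import Optional

# Hypothetical pattern (assumption): one <reasoning> block followed by one
# <final_argument> block, separated only by whitespace, with nothing else around them.
STRUCTURE = re.compile(
    r"^\s*<reasoning>(.+?)</reasoning>\s*<final_argument>(.+?)</final_argument>\s*$",
    re.DOTALL,  # let the sections span multiple lines
)


def has_expected_structure(response: str) -> bool:
    """Return True if the response is a reasoning block followed by a final_argument block."""
    return STRUCTURE.match(response) is not None


def extract_final_argument(response: str) -> Optional[str]:
    """Return the stripped text inside <final_argument>, or None if the structure is wrong."""
    match = STRUCTURE.match(response)
    return match.group(2).strip() if match else None


if __name__ == "__main__":
    sample = "<reasoning>\nWeigh both sides.\n</reasoning>\n<final_argument>\nClaim; objection.\n</final_argument>"
    print(has_expected_structure(sample))
    print(extract_final_argument(sample))
```

A binary check like this could serve as one component of a format-based reward. Any partial-credit scheme the actual training used is not described in this commit.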