XeTute committed on
Commit
3c54d1e
·
1 Parent(s): e6b3e55

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -11,7 +11,7 @@ base_model:
 
 We are currently in the process of training our model, with an official release scheduled for **February 23, 2025**.
 
-Introducing **SaplingDream**, a compact GPT model with 0.5 billion parameters, based on the [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) architecture. This model has been fine-tuned on reasoning datasets with meticulous attention to detail, ensuring the highest quality—hence the name "SaplingDream."
+Introducing **SaplingDream**, a compact GPT model with 0.5 billion parameters, based on the [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) architecture. This model has been fine-tuned on reasoning datasets with meticulous attention to detail, ensuring the highest quality—hence the name "SaplingDream." Think of this as advanced "instruction" tuning that teaches the base model to reason, efficiently compensating for its small size.
 
 To enhance generalization, we are fine-tuning the base model using Stochastic Gradient Descent (SGD) alongside a polynomial learning-rate scheduler, starting with a learning rate of 1e-4. Our goal is to ensure that the model not only learns the tokens but also develops the ability to reason through problems effectively.
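
The optimizer setup the README describes can be sketched in plain PyTorch. This is a minimal illustration only: the tiny stand-in model, dummy data, and step count are placeholders, not the actual SaplingDream training configuration; only the optimizer choice (SGD), the polynomial schedule, and the 1e-4 starting learning rate come from the text.

```python
import torch

# Minimal sketch of the recipe described above:
# SGD with a polynomial learning-rate schedule starting at 1e-4.
# Model, data, and step count are placeholders, not the real setup.
model = torch.nn.Linear(16, 16)          # stand-in for the 0.5B-parameter model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

total_steps = 100                        # hypothetical training length
scheduler = torch.optim.lr_scheduler.PolynomialLR(
    optimizer,
    total_iters=total_steps,
    power=1.0,                           # power=1.0 gives linear decay toward 0
)

for step in range(total_steps):
    batch = torch.randn(4, 16)           # dummy batch
    loss = model(batch).pow(2).mean()    # dummy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                     # decay the learning rate each step
```

With `power=1.0` the schedule reduces to linear decay; other exponents bend the curve while keeping the same start and end points.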