XeTute committed on
Commit
3c54d1e
·
1 Parent(s): e6b3e55

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -11,7 +11,7 @@ base_model:
 
 We are currently in the process of training our model, with an official release scheduled for **February 23, 2025**.
 
-Introducing **SaplingDream**, a compact GPT model with 0.5 billion parameters, based on the [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) architecture. This model has been fine-tuned on reasoning datasets with meticulous attention to detail, ensuring the highest quality—hence the name "SaplingDream."
+Introducing **SaplingDream**, a compact GPT model with 0.5 billion parameters, based on the [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) architecture. This model has been fine-tuned on reasoning datasets with meticulous attention to detail, ensuring the highest quality—hence the name "SaplingDream." Think of this as advanced "instruction" tuning that teaches the base model to reason, efficiently compensating for its small size.
 
 To enhance generalization, we are fine-tuning the base model using Stochastic Gradient Descent (SGD) alongside a polynomial learning-rate scheduler, starting with a learning rate of 1e-4. Our goal is to ensure that the model not only learns the tokens but also develops the ability to reason through problems effectively.
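
The optimizer setup the README describes can be sketched in plain PyTorch. This is a minimal illustration only: the tiny stand-in model, dummy data, and step count are placeholders, not the actual SaplingDream training configuration; only the optimizer choice (SGD), the polynomial schedule, and the 1e-4 starting learning rate come from the text.

```python
import torch

# Minimal sketch of the recipe described above:
# SGD with a polynomial learning-rate schedule starting at 1e-4.
# Model, data, and step count are placeholders, not the real setup.
model = torch.nn.Linear(16, 16)          # stand-in for the 0.5B-parameter model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

total_steps = 100                        # hypothetical training length
scheduler = torch.optim.lr_scheduler.PolynomialLR(
    optimizer,
    total_iters=total_steps,
    power=1.0,                           # power=1.0 gives linear decay toward 0
)

for step in range(total_steps):
    batch = torch.randn(4, 16)           # dummy batch
    loss = model(batch).pow(2).mean()    # dummy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                     # decay the learning rate each step
```

With `power=1.0` the schedule reduces to linear decay; other exponents bend the curve while keeping the same start and end points.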