Model Card for SmolLM2-135M-Instruct-o1
Model Description
- SmolLM2-135M-Instruct-o1 is a finetune of the SmolLM2-135M-Instruct model on the OpenO1-SFT dataset, aimed at producing O1-style chain-of-thought responses.
- Despite the extremely high LoRA dropout used during training (0.95), it still generates O1-style chain-of-thought output.
Example
Prompt: What things should I be aware of when I use large language models?
Answer:
Alright, I need to address the question about using large language models in programming. First, it's important to consider the capabilities and limitations of these models. Large language models are designed to process vast amounts of text data, and their performance can be quite impressive, especially in terms of speed and accuracy. However, there are several factors to be aware of.
One major consideration is the model's training data. If the data is limited or biased, the model might struggle to learn the nuances of natural language, leading to poor performance. Ensuring that the training data is diverse and diverse enough is crucial. This means including a mix of different languages, dialects, and corpora to provide a comprehensive training set.
Another aspect is the model's architecture. Large language models typically employ deep neural networks, which can be computationally intensive. Therefore, it's essential to choose a model that's optimized for deep learning, which is generally more efficient than simpler models like simple linear regression.
In terms of training, large language models require a significant amount of data. This can be particularly challenging if the data is limited or not in a standard format. Providing data in a format that's easy to work with, such as CSV or JSON, can help alleviate the burden.
Data quality is another critical factor. Large language models require high-quality text data, but [more output truncated...]
Training Details
- Base Model: HuggingFaceTB/SmolLM2-135M-Instruct
- Dataset: OpenO1-SFT (complete dataset)
- Training Duration: 1 epoch
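For reference, the snippet below is a minimal sketch of loading the training data with the `datasets` library. The repository id `O1-OPEN/OpenO1-SFT` and the `train` split name are assumptions, since this card only names the dataset; verify them against the dataset card you intend to use.

```python
# Sketch: load the OpenO1-SFT data with the Hugging Face datasets library.
# NOTE: the repository id and split name are assumptions, not taken from this card.
from datasets import load_dataset

dataset = load_dataset("O1-OPEN/OpenO1-SFT", split="train")
print(dataset[0])  # inspect the first record
```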
Model Specifications
- Architecture: Transformer decoder (135M parameters)
- LoRA Configuration (see the configuration sketch below):
  - Rank: 32
  - Alpha: 64
  - Dropout: 0.95
- Additional Techniques:
  - rslora
  - DoRA
  - NEFTune (alpha: 5)
- Kernel: Liger
- Prompt Template: chatml
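The exact training script is not published with this card, so the following is only a rough sketch of how the settings above might map onto current `peft`/`trl` APIs; the class names, argument names, and output path are assumptions based on recent library versions.

```python
# Hypothetical reconstruction of the adapter and trainer settings listed above.
from peft import LoraConfig
from trl import SFTConfig  # SFTConfig extends transformers.TrainingArguments

lora_config = LoraConfig(
    r=32,               # LoRA rank
    lora_alpha=64,      # scaling factor
    lora_dropout=0.95,  # the unusually high dropout noted in this card
    use_rslora=True,    # rank-stabilized LoRA
    use_dora=True,      # weight-decomposed LoRA (DoRA)
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="smollm2-135m-instruct-o1",  # hypothetical output path
    num_train_epochs=1,
    neftune_noise_alpha=5,   # NEFTune embedding noise
    use_liger_kernel=True,   # assumed way of enabling the Liger kernel in recent transformers
)
```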
Special Characteristics
- No RoPE scaling applied
- No quantization used
Usage
This model can be used for instruction-following chat and general text-generation tasks, combining the instruction-following behaviour inherited from the base model with the O1-style chain-of-thought formatting learned during finetuning.
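A minimal inference sketch with the `transformers` library is shown below; the generation settings are illustrative and are not the settings used to produce the example above.

```python
# Minimal sketch: chat with the model using its bundled chatml template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/SmolLM2-135M-Instruct-o1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user",
     "content": "What things should I be aware of when I use large language models?"},
]

# apply_chat_template formats the conversation with the chatml prompt template
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```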
Limitations
Users should be aware of potential limitations similar to those of the base SmolLM2 model, including:
- limited knowledge and reasoning capabilities
- English language only
- possible biases