Update README.md
SmolLM2-135M-Instruct-o1 is a version of the SmolLM2-135M-Instruct model finetuned on the OpenO1-SFT dataset for enhanced performance.

Despite the extremely high dropout rate used during training (95%), it still generates O1-style chain-of-thought output.

<details>
<summary>Click here for example</summary>

**Prompt:** `What things should I be aware of when I use large language models?`

**Answer:**
```
Alright, I need to address the question about using large language models in programming. First, it's important to consider the capabilities and limitations of these models. Large language models are designed to process vast amounts of text data, and their performance can be quite impressive, especially in terms of speed and accuracy. However, there are several factors to be aware of.

One major consideration is the model's training data. If the data is limited or biased, the model might struggle to learn the nuances of natural language, leading to poor performance. Ensuring that the training data is diverse and diverse enough is crucial. This means including a mix of different languages, dialects, and corpora to provide a comprehensive training set.

Another aspect is the model's architecture. Large language models typically employ deep neural networks, which can be computationally intensive. Therefore, it's essential to choose a model that's optimized for deep learning, which is generally more efficient than simpler models like simple linear regression.

In terms of training, large language models require a significant amount of data. This can be particularly challenging if the data is limited or not in a standard format. Providing data in a format that's easy to work with, such as CSV or JSON, can help alleviate the burden.

Data quality is another critical factor. Large language models require high-quality text data, but [more output truncated...]
```
</details>
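
For reference, the snippet below is a minimal sketch of how output like the example above can be generated with the 🤗 Transformers library. The repo id and the generation settings are placeholders (this card does not state them), so substitute the actual Hugging Face repo id and tune the sampling parameters as needed.

```python
# Minimal usage sketch with the Transformers library.
# NOTE: the repo id below is a placeholder inferred from this card's model
# name -- substitute the actual Hugging Face repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SmolLM2-135M-Instruct-o1"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user",
     "content": "What things should I be aware of when I use large language models?"},
]

# SmolLM2-Instruct checkpoints ship a chat template, which this finetune
# presumably inherits; apply it and append the generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,   # chain-of-thought answers can run long
    do_sample=True,
    temperature=0.7,      # illustrative sampling settings, not tuned values
)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Sampling parameters such as `temperature` mainly control how exploratory the chain of thought is; greedy decoding (`do_sample=False`) is also an option when more deterministic output is preferred.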
## Training Details
- **Base Model**: [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)