Update README.md
README.md
CHANGED
@@ -1,6 +1,8 @@
---
library_name: transformers
-tags: []
+license: apache-2.0
+language:
+- en
---


@@ -10,23 +12,16 @@ tags: []
## Table of Contents

1. [Model Summary](##model-summary)
-2. [
-3. [
-4. [
-5. [
-6. [Citation](##citation)
+2. [Limitations](##limitations)
+3. [Training](##training)
+4. [License](##license)
+5. [Citation](##citation)

## Model Summary

SmolLM is a series of state-of-the-art small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are built on Cosmo-Corpus, a meticulously curated high-quality training dataset. Cosmo-Corpus includes Cosmopedia v2 (28B tokens of synthetic textbooks and stories generated by Mixtral), Python-Edu (4B tokens of educational Python samples from The Stack), and FineWeb-Edu (220B tokens of deduplicated educational web samples from FineWeb). SmolLM models have shown promising results when compared to other models in their size categories across various benchmarks testing common sense reasoning and world knowledge. For detailed information on training, benchmarks and performance, please refer to our full blog post ADD LINK WHEN PUBLISH.


-## Use
-
-### Intended use
-
-The model was trained on [HuggingFaceTB/cosmo-corpus-v2](link)
-
### Generation
First, make sure to install `transformers` from source:
```bash
@@ -104,7 +99,7 @@ The model has been trained on source code from 600+ programming languages. The p

## Model

-- **Architecture:**
+- **Architecture:** See the blog post
- **Pretraining steps:** 600k
- **Pretraining tokens:** 600B
- **Precision:** bfloat16
@@ -119,7 +114,7 @@ The model has been trained on source code from 600+ programming languages. The p

# License

-
+[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

# Citation
TO MODIFY
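The `### Generation` section in the diff above is cut off right after its opening bash code fence, so the install and inference commands themselves are not visible here. Below is a minimal sketch of what that section describes, assuming the usual from-source install of `transformers` and a checkpoint id of the form `HuggingFaceTB/SmolLM-1.7B` (neither is confirmed by the visible lines); it also reuses the bfloat16 precision listed under `## Model`:

```python
# Hedged sketch, not the README's exact snippet.
# Install transformers from source first, e.g.:
#   pip install git+https://github.com/huggingface/transformers.git
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-1.7B"  # assumed id; 135M and 360M variants are also mentioned
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# bfloat16 matches the precision listed in the card's "## Model" section
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)

inputs = tokenizer("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The smaller checkpoints would be loaded the same way by swapping the `checkpoint` string.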