- smollm
---

[HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) is quantized using [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) with 8-bit embeddings and 8-bit dynamic activations with 4-bit weight linears (`8da4w`). It is then lowered to [ExecuTorch](https://github.com/pytorch/executorch) with several optimizations—custom SDPA, custom KV cache, and parallel prefill—to achieve high performance on the CPU backend, making it well-suited for mobile deployment.

We provide the [.pte file](https://huggingface.co/pytorch/SmolLM3-3B-8da4w/blob/main/smollm3-3b-8da4w.pte) for direct use in ExecuTorch. *(The provided `.pte` file is exported with the default max_seq_length/max_context_length of 2k.)*
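As a sketch, the prebuilt program can be fetched from the Hub with `huggingface-cli` (this assumes `huggingface_hub` is installed; the local directory name is illustrative):

```Shell
REPO="pytorch/SmolLM3-3B-8da4w"
PTE_FILE="smollm3-3b-8da4w.pte"

# Download the prebuilt ExecuTorch program from the Hub
# (skipped here if the CLI is not on PATH).
if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli download "$REPO" "$PTE_FILE" --local-dir ./smollm3-executorch
fi
```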

# Running in a mobile app
The [.pte file](https://huggingface.co/pytorch/SmolLM3-3B-8da4w/blob/main/smollm3-3b-8da4w.pte) can be run with ExecuTorch on a mobile phone. See the instructions for doing this in [iOS](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) and [Android](https://docs.pytorch.org/executorch/main/llm/llama-demo-android.html).

On Google's Pixel 8 Pro, the model runs at 12.7 tokens/s.

# Running with ExecuTorch’s sample runner
You can also run this model with ExecuTorch’s sample runner by following [Steps 3 and 4 of these instructions](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-3-run-on-your-computer-to-validate).
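A rough sketch of the resulting invocation, assuming you have built the sample `llama_main` runner per Step 3 of the linked README (the binary path follows a default cmake build; the tokenizer file name is illustrative):

```Shell
MODEL="smollm3-3b-8da4w.pte"
TOKENIZER="tokenizer.json"   # tokenizer file name is an assumption; use the one from this repo
PROMPT="Tell me a story."

# Run the exported program with the sample runner
# (skipped here if the runner has not been built).
if command -v cmake-out/examples/models/llama/llama_main >/dev/null 2>&1; then
  cmake-out/examples/models/llama/llama_main \
    --model_path="$MODEL" \
    --tokenizer_path="$TOKENIZER" \
    --prompt="$PROMPT"
fi
```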

# Export Recipe
You can re-create the `.pte` file from eager source using this export recipe.

First install `optimum-executorch` by following these [installation instructions](https://github.com/huggingface/optimum-executorch?tab=readme-ov-file#-quick-installation), then use `optimum-cli` to export the model to ExecuTorch:
```Shell
optimum-cli export executorch \
  --model HuggingFaceTB/SmolLM3-3B \