Update README.md
README.md CHANGED
@@ -20,7 +20,8 @@ pipeline_tag: text-generation
 [Phi4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) is quantized by the PyTorch team using [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) with 8-bit embeddings and 8-bit dynamic activations with 4-bit weight linears (8da4w).
 The model is suitable for mobile deployment with [ExecuTorch](https://github.com/pytorch/executorch).
 
-See [Exporting to ExecuTorch](#exporting-to-executorch) for exporting the quantized model to an ExecuTorch pte file. We also provide the [quantized pte](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/phi4-mini-8da4w.pte) for direct use.
+See [Exporting to ExecuTorch](#exporting-to-executorch) for exporting the quantized model to an ExecuTorch pte file. We also provide the [quantized pte](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/phi4-mini-8da4w.pte) for direct use.
+(The provided pte file is exported with the default max_seq_length/max_context_length of 128; if you wish to change this, re-export the model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
 
 # Running in a mobile app
 The [pte file](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/phi4-mini-8da4w.pte) can be run with ExecuTorch on a mobile phone. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
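For reference, the 8da4w quantization described above maps onto torchao's `quantize_` API. The snippet below is a minimal sketch, not the exact script used to produce this checkpoint; the `group_size=32` value and the float32 load dtype are assumptions.

```Python
# Minimal 8da4w sketch (assumed recipe, not the exact script for this checkpoint).
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct", torch_dtype=torch.float32
)
# 8-bit dynamic activations + 4-bit grouped weights on linear layers;
# group_size=32 is an assumed value.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))
# The card also mentions 8-bit embeddings; torchao handles embeddings with a
# separate config, omitted here for brevity.
```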
@@ -199,7 +200,8 @@ The following script does this for you. We have uploaded the converted checkpoi
 python -m executorch.examples.models.phi_4_mini.convert_weights pytorch_model.bin phi4-mini-8da4w-converted.bin
 ```
 
-Once the checkpoint is converted, we can export to ExecuTorch's
+Once the checkpoint is converted, we can export to ExecuTorch's pte format with the XNNPACK delegate.
+The below command exports with a max_seq_length/max_context_length of 128, which is the default value.
 
 ```Shell
 PARAMS="executorch/examples/models/phi_4_mini/config.json"
@@ -211,6 +213,8 @@ python -m executorch.examples.models.llama.export_llama \
 --use_sdpa_with_kv_cache \
 -X \
 --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}' \
+--max_seq_length 128 \
+--max_context_length 128 \
 --output_name="phi4-mini-8da4w.pte"
 ```
 
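Once exported, the pte file can be sanity-checked from Python before deploying it to a device. A minimal sketch, assuming ExecuTorch's Python runtime bindings are installed (`pip install executorch`); the `forward` method name is the standard entry point for export_llama models:

```Python
# Quick load check for the exported program; a sketch, assuming the
# executorch Python runtime bindings are available.
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("phi4-mini-8da4w.pte")
print("methods:", program.method_names)  # typically includes "forward"
method = program.load_method("forward")  # takes tokens and kv-cache position at run time
```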