metascroy committed
Commit b66d138 · verified · 1 Parent(s): 2821130

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -20,7 +20,8 @@ pipeline_tag: text-generation
 [Phi4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) is quantized by the PyTorch team using [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) with 8-bit embeddings and 8-bit dynamic activations with 4-bit weight linears (8da4w).
 The model is suitable for mobile deployment with [ExecuTorch](https://github.com/pytorch/executorch).
 
-See [Exporting to ExecuTorch](#exporting-to-executorch) for exporting the quantized model to an ExecuTorch pte file. We also provide the [quantized pte](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/phi4-mini-8da4w.pte) for direct use.
+See [Exporting to ExecuTorch](#exporting-to-executorch) for exporting the quantized model to an ExecuTorch pte file. We also provide the [quantized pte](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/phi4-mini-8da4w.pte) for direct use.
+(The provided pte file is exported with the default max_seq_length/max_context_length of 128; if you wish to change this, re-export the model following the instructions in [Exporting to ExecuTorch](#exporting-to-executorch).)
 
 # Running in a mobile app
 The [pte file](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/phi4-mini-8da4w.pte) can be run with ExecuTorch on a mobile phone. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
@@ -199,7 +200,8 @@ The following script does this for you. We have uploaded the converted checkpoi
 python -m executorch.examples.models.phi_4_mini.convert_weights pytorch_model.bin phi4-mini-8da4w-converted.bin
 ```
 
-Once the checkpoint is converted, we can export to ExecuTorch's PTE format with the XNNPACK delegate.
+Once the checkpoint is converted, we can export to ExecuTorch's pte format with the XNNPACK delegate.
+The below command exports with a max_seq_length/max_context_length of 128, which is the default value.
 
 ```Shell
 PARAMS="executorch/examples/models/phi_4_mini/config.json"
@@ -211,6 +213,8 @@ python -m executorch.examples.models.llama.export_llama \
 --use_sdpa_with_kv_cache \
 -X \
 --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}' \
+--max_seq_length 128 \
+--max_context_length 128 \
 --output_name="phi4-mini-8da4w.pte"
 ```
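
Note: the 8da4w recipe referenced in the README (8-bit dynamic activations with 4-bit grouped-weight linears) can be approximated with torchao's `quantize_` API. The snippet below is a minimal sketch, not the exact script used to produce this checkpoint; in particular, `group_size=32` and the omission of the 8-bit embedding quantization step are assumptions.

```Python
# Minimal sketch of 8da4w quantization with torchao (not the exact recipe
# used for the published checkpoint; 8-bit embedding quantization is
# configured separately and is omitted here).
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    torch_dtype=torch.float32,  # quantize from a full-precision copy
)

# 8-bit dynamic activation / 4-bit grouped weight quantization on nn.Linear.
# group_size=32 is an assumption; match it to your deployment target.
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))

# torchao-quantized weights currently need non-safetensors serialization.
model.save_pretrained("phi4-mini-8da4w-local", safe_serialization=False)
```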
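Note: before wiring the pte file into a mobile app, it can be sanity-checked from Python. This is a sketch assuming ExecuTorch's Python `Runtime` API is available in your install; a full LLM generation loop additionally needs the tokenizer and the ExecuTorch llama runner.

```Python
# Minimal sketch: verify the exported .pte loads and list its methods.
# Assumes an executorch Python package that ships the Runtime API.
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("phi4-mini-8da4w.pte")

# An LLM export typically exposes a "forward" method; the full chat loop
# (tokenization, KV cache, sampling) lives in the ExecuTorch llama runner.
print("methods:", program.method_names)
```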