jerryzh168 committed on
Commit fb71c2a · verified · 1 Parent(s): 55e2bb7

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -31,7 +31,7 @@ On iPhone 15 Pro, the model runs at 17.3 tokens/sec and uses 3206 Mb of memory.
 # Quantization Recipe
 
 First need to install the required packages:
-```
+```Shell
 pip install git+https://github.com/huggingface/transformers@main
 pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu126
 ```
@@ -39,7 +39,7 @@ pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/c
 ## Untie Embedding Weights
 Before quantization, since we need quantize input embedding and unembedding (lm_head) layer which are tied, but we want to quantize them separately, we first need to untie the model:
 
-```
+```Py
 from transformers import (
 AutoModelForCausalLM,
 AutoProcessor,
@@ -73,7 +73,7 @@ tokenizer.push_to_hub(save_to)
 
 We used following code to get the quantized model:
 
-```
+```Py
 from transformers import (
 AutoModelForCausalLM,
 AutoProcessor,
@@ -156,12 +156,12 @@ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-h
 Need to install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness#install
 
 ## baseline
-```
+```Shell
 lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks hellaswag --device cuda:0 --batch_size 64
 ```
 
 ## int8 dynamic activation and int4 weight quantization (8da4w)
-```
+```Shell
 lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-8da4w --tasks hellaswag --device cuda:0 --batch_size 64
 ```
 
@@ -195,13 +195,13 @@ Once ExecuTorch is [set-up](https://pytorch.org/executorch/main/getting-started.
 
 We first convert the [quantized checkpoint](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/pytorch_model.bin) to one ExecuTorch's LLM export script expects by renaming some of the checkpoint keys.
 The following script does this for you. We have uploaded the converted checkpoint [phi4-mini-8da4w-converted.bin](https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w/blob/main/phi4-mini-8da4w-converted.bin) for convenience.
-```
+```Shell
 python -m executorch.examples.models.phi_4_mini.convert_weights pytorch_model.bin phi4-mini-8da4w-converted.bin
 ```
 
 Once the checkpoint is converted, we can export to ExecuTorch's PTE format with the XNNPACK delegate.
 
-```
+```Shell
 PARAMS="executorch/examples/models/phi_4_mini/config.json"
 python -m executorch.examples.models.llama.export_llama \
 --model "phi_4_mini" \