
Run phi3-mini on an AMD NPU

  1. If phi3_mini_awq_4bit_no_flash_attention.pt is not present, use AWQ quantization to produce the quantized model (see the quantization sketch after the reference links below).
  2. Put the modeling_phi3.py from this repo into the phi-3-mini folder.
  3. Modify the file paths in run_awq.py to point at your local files.
  4. Run `python run_awq.py --task decode --target aie --w_bit 4` (the sketch right after this list shows roughly what the decode step amounts to).
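
For orientation, here is a minimal sketch of what the decode task boils down to, assuming the checkpoint from step 1 is a whole-model `torch.save` artifact and that modeling_phi3.py from step 2 is importable. run_awq.py in RyzenAI-SW additionally handles the NPU (AIE) offload behind the `--target aie` flag, so the names and paths below are assumptions, not the script's actual contents.

```python
# Minimal decode sketch (assumptions: whole-model checkpoint, local
# ./phi-3-mini folder with this repo's modeling_phi3.py inside it).
# run_awq.py layers the actual AIE/NPU dispatch on top of a flow like this.
import torch
from transformers import AutoTokenizer

ckpt = "phi3_mini_awq_4bit_no_flash_attention.pt"  # produced in step 1
model = torch.load(ckpt)                           # whole-model checkpoint
model.eval()

tokenizer = AutoTokenizer.from_pretrained("./phi-3-mini", trust_remote_code=True)
inputs = tokenizer("What is AWQ quantization?", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```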

Reference: https://github.com/amd/RyzenAI-SW/tree/main/example/transformers

For the quantization of phi-3, refer to https://github.com/mit-han-lab/llm-awq/pull/183
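
If you need to produce phi3_mini_awq_4bit_no_flash_attention.pt yourself (step 1), the following is a rough sketch using llm-awq's Python modules; phi-3 support comes from the PR above. The module paths follow llm-awq's awq/quantize package, and the paths, group size, and calibration settings here are assumptions to adapt to your setup, not a definitive recipe.

```python
# Hedged sketch of AWQ 4-bit quantization with llm-awq (phi-3 support from
# the PR above). q_config values and paths are assumptions; check llm-awq's
# README and entry.py for the exact options it expects.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq.quantize.pre_quant import run_awq, apply_awq
from awq.quantize.quantizer import real_quantize_model_weight

model_path = "./phi-3-mini"                           # local phi-3-mini files
q_config = {"zero_point": True, "q_group_size": 128}  # common llm-awq defaults

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation="eager",  # no flash attention, matching the .pt name
)

# Search for activation-aware scales on calibration data, then fold them in.
awq_results = run_awq(model, tokenizer, w_bit=4, q_config=q_config)
apply_awq(model, awq_results)

# Replace Linear weights with real 4-bit quantized ones and save the model.
real_quantize_model_weight(model, w_bit=4, q_config=q_config)
torch.save(model, "phi3_mini_awq_4bit_no_flash_attention.pt")
```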

PS: Decode performance on the NPU is similar to that on the CPU (Ryzen 7640HS).
