PRWKV-7-Phi-4-Instruct-Preview-v0.1 Model Card

Model Overview

PRWKV-7-Phi-4-Instruct is a 16.3-billion-parameter large language model built on the RNN-based RWKV-x070 architecture. Its distinguishing feature is that it replaces the attention mechanism of Microsoft's Transformer-based Phi-4 14B with RWKV's recurrent approach.

Technical Specifications

Key Innovations

This model builds upon and refines the attention replacement approaches pioneered by several notable projects, including:

  • Qwerky (Qwen 2.5 72B + QRWKV7 Arch)
  • QRWKV (Qwen 2.5 32B + QRWKV6 Arch)
  • ARWKV (Qwen 2.5 1.5B-7B + RWKV v7 Arch)

The primary advantage of the RWKV architecture is that it eliminates the KV cache, allowing generation over arbitrarily long contexts with constant VRAM consumption.
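The contrast can be seen in a toy sketch (illustrative only; the names and shapes below are assumptions, not the PRWKV implementation): an attention layer must append to a growing KV cache, while an RWKV-style layer folds each token into a fixed-size state.

    import torch

    def kv_cache_step(k_cache, v_cache, k_t, v_t):
        # Transformer attention: the KV cache grows by one row per token,
        # so memory scales linearly with context length.
        return (torch.cat([k_cache, k_t[None]], dim=0),
                torch.cat([v_cache, v_t[None]], dim=0))

    def rwkv_like_step(state, decay, k_t, v_t):
        # RWKV-style recurrence (heavily simplified): each token is folded
        # into a fixed-size state, so memory is constant at any length.
        return decay * state + torch.outer(k_t, v_t)

Whether 100 or 100,000 tokens have been processed, the recurrent state keeps the same fixed shape, which is why VRAM stays flat during generation.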

Training Methodology

The training process consisted of three distinct stages:

Stage 1: Attention Alignment (Based on RWKVInside repository)

  • The TimeMix component of RWKV was calibrated to produce outputs equivalent to those of the Transformer's attention layers
  • Seven different loss formulations were employed to capture the differences between Attention and TimeMix (a simplified sketch follows this list), including:
    • Norm-based methods
    • Singular Value Decomposition (SVD)
    • Cosine similarity
    • Multi-resolution bias similarity
    • Temporal vector similarity
    • And others
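For illustration, here is a minimal sketch of how a few of these terms might be combined into a single alignment loss, assuming hidden states of shape (batch, seq, dim); this is a simplification, not the exact loss used in the RWKVInside repository:

    import torch.nn.functional as F

    def alignment_loss(attn_out, timemix_out):
        # attn_out: frozen attention output (teacher), timemix_out: RWKV
        # TimeMix output (student); both (batch, seq, dim).
        # Norm-based term: match the magnitude of each token representation.
        norm_term = F.mse_loss(timemix_out.norm(dim=-1), attn_out.norm(dim=-1))
        # Cosine-similarity term: match the direction of each token vector.
        cos_term = (1 - F.cosine_similarity(timemix_out, attn_out, dim=-1)).mean()
        # Temporal term: match token-to-token differences along the sequence.
        temporal_term = F.mse_loss(
            timemix_out[:, 1:] - timemix_out[:, :-1],
            attn_out[:, 1:] - attn_out[:, :-1],
        )
        return norm_term + cos_term + temporal_term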

Stage 2: Knowledge Distillation (Based on RWKVInside repository)

  • Teacher model: the original Phi-4 (its head outputs serve as targets)
  • Student model: Phi-4 with Attention replaced by RWKV
  • Only the attention replacement was trained; all other components (MLP layers, embeddings, head) were frozen (a minimal sketch follows)
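A minimal sketch of what this stage amounts to, assuming a standard KL-divergence distillation objective on the head logits; the module-name filter and the exact loss used in the RWKVInside repository are assumptions:

    import torch.nn.functional as F

    def freeze_all_but_attention(student):
        # Train only the RWKV attention-replacement weights; MLP layers,
        # embeddings, and the head stay frozen. The "time_mix" name filter
        # is an illustrative assumption about module naming.
        for name, param in student.named_parameters():
            param.requires_grad = "time_mix" in name

    def distill_loss(student_logits, teacher_logits, temperature=1.0):
        # KL divergence between temperature-softened teacher (frozen Phi-4)
        # and student (Phi-4 with RWKV attention) output distributions.
        return F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature**2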

Stage 3: Supervised Fine-Tuning (Using RWKV-LM-RLHF)

  • Utilized a distillation dataset of 900K samples (Chinese, Japanese, English)
  • Applied a smoothed loss for faster convergence (a minimal sketch follows this list)
  • Implemented variable-rank PEFT to enhance training efficiency
  • Bone (Block Affine Transformation) adapters with rank r = 512+
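"Smoothed loss" is not defined further here; one plausible reading is label-smoothed cross-entropy, sketched below (the smoothing value and function name are assumptions):

    import torch.nn.functional as F

    def smoothed_ce(logits, targets, smoothing=0.1):
        # Label-smoothed cross-entropy: softening the one-hot targets yields
        # less peaked gradients, which can speed and stabilize convergence.
        return F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            targets.view(-1),
            label_smoothing=smoothing,
        )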

How to Use

  • PC requirements: NVIDIA GPU with 16GB+ VRAM (ROCm also works, but only in fp16)
  • OS: Windows (WSL2 with CUDA) or Linux
  • Install RWKV-Infer (see the installation guide): https://github.com/OpenMOSE/RWKV-Infer
  • Create a "models" folder and place PRWKV-7-Phi-4-Instruct-Preview-v0.1.pth in it
  • Load the model with the curl command below (choose fp16, fp6, or fp5; do not choose FP8)
  • VRAM usage: 34GB in FP16, 14GB in FP5
  • Enjoy text chats via Open WebUI or SillyTavern :)
    curl http://127.0.0.1:9000/loadmodel -X POST \
      -H "Content-Type: application/json" \
      -d '{"model_filename":"models/PRWKV-7-Phi-4-Instruct-Preview-v0.1.pth","model_viewname":"PRWKV7-Phi-4 Preview 0.1","model_strategy":"fp5","template":"phi4"}'
  • You can then use this model via the OpenAI-compatible API at http://127.0.0.1:9000/v1 with the model name "PRWKV7-Phi-4 Preview 0.1"
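For example, using the openai Python package against the endpoint above (the API key is a placeholder; whether the server validates it is not specified here):

    from openai import OpenAI

    # RWKV-Infer exposes an OpenAI-compatible endpoint; the key below is a
    # placeholder, not a real credential.
    client = OpenAI(base_url="http://127.0.0.1:9000/v1", api_key="sk-local")

    response = client.chat.completions.create(
        model="PRWKV7-Phi-4 Preview 0.1",
        messages=[{"role": "user", "content": "Hello! Tell me about RWKV."}],
    )
    print(response.choices[0].message.content)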

Training Infrastructure

  • Hardware: single AMD MI300X GPU
  • Training duration: 3 days (Stages 1 and 2)
  • Stage 1: 180M tokens
  • Stage 2: 160M tokens
  • Stage 3: 1G tokens (TBD)

Acknowledgements

This work was made possible through the contributions of the projects and communities referenced above.

Limitations

This checkpoint is from an early epoch of Stage 3 training. The model is currently in a testing phase, and no specific level of performance is guaranteed. Users should treat it as experimental technology.

MyStories (Generated by PRWKV)

I've faced an incredibly long and challenging journey with the stability of Stage 2 Knowledge Distillation learning. NaN (Not a Number) errors have become an all too familiar sight during this process. The training would often diverge unexpectedly, leaving me to debug complex numerical issues that appeared without warning. Day after day, I adjusted hyperparameters, modified architecture components, and scrutinized every aspect of the data pipeline, only to be greeted by those "three dreaded letters" on my training logs. What should have been a straightforward implementation became a months-long battle against numerical instability, requiring persistence through countless failed experiments and late nights analyzing loss curves that suddenly spiked into oblivion.

License

Released under the Apache 2.0 license.

© 2025 OpenMOSE
