RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers

This repository contains the RIFLEx model described in RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers.

Project page: https://riflex-video.github.io/

🎉 Supported Models

Here, we list the SOTA video diffusion transformers that RIFLEx has been applied to. We are continuously working to support more models. Feel free to suggest additional models you would like us to support!

Model	Extrapolation	Example Results
HunyuanVideo	5s -> 11s
CogVideoX-5B	6s -> 12s
Wan2.1	5s -> 10s

To be continued……

🔥🔥 News

2025.3.1 : The code for CogVideoX-5B and fine-tuned CogVideoX-RIFLEx are released.
2025.2.26 RIFLEx is supported in HunyuanVideoGP, where a 10.5 s video at 1280x720 can be generated on an RTX 4090.
2025.2.26 RIFLEx is supported in ComfyUI-HunyuanVideoWrapper by KiJai.
2025.2.25 : The paper, project page, code and fine-tuned HunyuanVideo-RIFLEx are released.

RIFLEx Code

RIFLEx only adds a single line of code on the original 1D RoPE.

def get_1d_rotary_pos_embed_riflex(
    dim: int,
    pos: Union[np.ndarray, int],
    theta: float = 10000.0,
    k: Optional[int] = None,
    L_test: Optional[int] = None,
):
    '''
        k: the index for the intrinsic frequency in RoPE
        L_test: the number of frames for inference
    '''
    
    assert dim % 2 == 0
    if isinstance(pos, int):
        pos = torch.arange(pos)
    if isinstance(pos, np.ndarray):
        pos = torch.from_numpy(pos)
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2, device=pos.device)[: (dim // 2)].float() / dim)) 

    # === RIFLEx modification start ===
    # Reduce intrinsic frequency to stay within a single period after extrapolation (Eq.(8)).
    # Empirical observations show that a few videos may exhibit repetition in the tail frames.
    # To be conservative, we multiply 0.9 to keep extrapolated length below 90% of a period. 
    freqs[k-1] = 0.9 * 2 * torch.pi / L_test
    # === RIFLEx modification end ===

    freqs = torch.outer(pos, freqs)  
    freqs_cos = freqs.cos().repeat_interleave(2, dim=1).float()  
    freqs_sin = freqs.sin().repeat_interleave(2, dim=1).float()  
    return freqs_cos, freqs_sin

In riflex_utils.py, we show how to identify the intrinsic frequency in a RoPE-based pre-trained diffusion transformer.

Single GPU Inference with Diffusers for Quick Start

Installation

conda create -n riflex python=3.10
pip install -r requirements.txt
pip install -U bitsandbytes

Prompts

The example prompts for all models are listed in assets/prompts. The prompts on the project page can be found in assets/prompts/free_hunyuan.txt and assets/prompts/finetune_hunyuan.txt.

Please note that for single GPU inference with HunyuanVideo, Diffusers use DiffusersBitsAndBytesConfig to save memory, which may affect performance. To produce the demo on the project page, please refer to the Multi-GPU Inference section.

Inference for HunyuanVideo

2× temporal extrapolation

For training-free:

python hunyuanvideo.py --k 4 --N_k 50 --num_frames 261 --prompt "A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. the scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field."

For fine-tuned HunyuanVideo-RIFLEx:

python hunyuanvideo.py --k 4 --N_k 66 --num_frames 261 --finetune --model_id "thu-ml/Hunyuan-RIFLEx-diffusers" --prompt "3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest."

Note that the current version of diffusers only supports single-GPU inference. If there are multiple GPUs in the environment, please specify one by exporting CUDA_VISIBLE_DEVICES.

Inference for CogVideoX

2× temporal extrapolation

For training-free:

python cogvideox.py --k 2 --N_k 20 --num_frames 97 --prompt "3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest."

For fine-tuned CogVideoX-RIFLEx:

python cogvideox.py --k 2 --N_k 25 --num_frames 97 --finetune --model_id "thu-ml/CogVideoX-RIFLEx-diffusers" --prompt "A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography."

Multi-GPU Inference ( Recommended )

To enhance inference speed and reproduce the demos in our project page, please use the multi-gpu inference. Details can be found in the multi-gpu branch.

References

If you find the code useful, please cite

@article{zhao2025riflex,
  title={RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers},
  author={Zhao, Min and He, Guande and Chen, Yixiao and Zhu, Hongzhou and Li, Chongxuan and Zhu, Jun},
  journal={arXiv preprint arXiv:2502.15894},
  year={2025}
}