Since Transformers are everywhere, it's helpful to understand RoPE (Rotary Position Embedding). Token order matters, and RoPE encodes it by rotating the query and key vectors by an angle that depends on each token's position, so the model can tell which token comes first, second, and so on, and how far apart any two tokens are.
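To make the rotation idea concrete, here is a minimal NumPy sketch of vanilla RoPE (the function name and shapes are illustrative, not taken from any particular library): the vectors are split into 2D pairs and each pair is rotated by a position-dependent angle, so the dot product between a rotated query and key depends on the relative distance between their tokens.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotate pairs of dimensions in x by position-dependent angles (vanilla RoPE)."""
    # x: (seq_len, dim) with dim even; positions: (seq_len,)
    dim = x.shape[-1]
    # One frequency per dimension pair, decaying geometrically as in the RoPE paper
    freqs = base ** (-np.arange(0, dim, 2) / dim)          # (dim/2,)
    angles = positions[:, None] * freqs[None, :]           # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                        # split into 2D pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                     # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Queries and keys are rotated before attention, so q·k depends on relative position
q = rope_rotate(np.random.randn(6, 8), np.arange(6))
k = rope_rotate(np.random.randn(6, 8), np.arange(6))
```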
Here are 8 RoPE variants, each suited to a different use case:
4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923) Decomposes the positional embedding into 3 components (temporal, height, and width) so that positional features are aligned across modalities: text, images, and videos. See the sketch after this list.
8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10 Introduces an exponential decay factor into the rotation, scaling queries and keys in opposite directions so attention stays stable on long sequences. A sketch follows below.
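Here is a hedged NumPy sketch of the MRoPE idea from item 4 (the function name and the even three-way split of dimensions are assumptions for illustration, not the exact Qwen2.5-VL layout): different groups of rotary frequencies are driven by temporal, height, and width indices, and for plain text all three indices coincide, so it reduces to ordinary 1D RoPE.

```python
import numpy as np

def mrope_rotate(x, pos_t, pos_h, pos_w, base=10000.0):
    """Sketch of Multimodal RoPE: rotary frequency pairs are split into three
    groups, each rotated by a different position index (temporal, height, width).
    The even split into thirds is a simplification, not the model's real layout."""
    dim = x.shape[-1]
    n_pairs = dim // 2
    freqs = base ** (-np.arange(0, dim, 2) / dim)                 # (n_pairs,)
    section = n_pairs // 3
    # Each frequency pair is assigned one of the three position components
    positions = np.concatenate([
        np.tile(pos_t[:, None], (1, section)),
        np.tile(pos_h[:, None], (1, section)),
        np.tile(pos_w[:, None], (1, n_pairs - 2 * section)),
    ], axis=1)                                                    # (seq_len, n_pairs)
    angles = positions * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Text tokens: all three indices are identical, so MRoPE behaves like 1D RoPE.
# Image patches: pos_h / pos_w follow the patch grid while pos_t stays constant.
```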
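And a matching sketch of the XPos idea from item 8 (the function names are made up for the example, and gamma / scale_base are placeholder values rather than tuned constants): the usual RoPE rotation is combined with a per-pair decay factor applied with opposite signs to queries and keys, so attention between distant tokens is smoothly attenuated and long-sequence behavior stays stable.

```python
import numpy as np

def xpos_scale(positions, dim, gamma=0.4, scale_base=512):
    """Per-pair exponential decay factors layered on top of RoPE (XPos-style)."""
    # One scale per rotary pair, between gamma/(1+gamma) and 1
    zeta = (np.arange(0, dim, 2) / dim + gamma) / (1 + gamma)     # (dim/2,)
    # Queries get zeta**(n/scale_base), keys the inverse, so q·k picks up a
    # decay that depends only on the distance between the two positions
    return zeta[None, :] ** (positions[:, None] / scale_base)

def xpos_rotate(x, positions, sign=+1, base=10000.0, gamma=0.4, scale_base=512):
    """RoPE rotation with the XPos decay applied (sign=+1 for queries, -1 for keys)."""
    dim = x.shape[-1]
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    angles = positions[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    scale = xpos_scale(positions, dim, gamma, scale_base) ** sign
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = (x1 * cos - x2 * sin) * scale
    out[:, 1::2] = (x1 * sin + x2 * cos) * scale
    return out

q = xpos_rotate(np.random.randn(6, 8), np.arange(6), sign=+1)
k = xpos_rotate(np.random.randn(6, 8), np.arange(6), sign=-1)
```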