marcusinthesky (Marcus Gawronsky)

upvoted a paper about 18 hours ago

What Matters in Transformers? Not All Attention is Needed

Paper • 2406.15786 • Published Jun 22 • 23

upvoted an article 3 days ago

Model2Vec: Distill a Small Fast Model from any Sentence Transformer

Article • 4 days ago • 37

upvoted a paper 4 days ago

Differential Transformer

Paper • 2410.05258 • Published 11 days ago • 152

upvoted an article 10 days ago

Introducing the Open FinLLM Leaderboard

Article • 14 days ago • 54

upvoted 4 papers 11 days ago

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published 16 days ago • 23

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

Paper • 2410.01680 • Published 16 days ago • 31

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

Paper • 2409.19291 • Published 20 days ago • 18

Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published 15 days ago • 29

upvoted 2 papers 23 days ago

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published 25 days ago • 27

Making Text Embedders Few-Shot Learners

Paper • 2409.15700 • Published 24 days ago • 29

upvoted a paper about 1 month ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3 • 77

upvoted 3 papers about 2 months ago

upvoted 2 papers 2 months ago

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Paper • 2408.04840 • Published Aug 9 • 31

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 75

upvoted 3 papers 3 months ago

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30 • 23

KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23 • 40

H2O-Danube3 Technical Report

Paper • 2407.09276 • Published Jul 12 • 18

upvoted a paper 5 months ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125

upvoted 3 papers 6 months ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 73

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 108

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118

upvoted an article 6 months ago

Mixture of Depth is Vibe

Article • Apr 22 • 44

upvoted 12 papers 6 months ago

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts

Paper • 2310.05898 • Published Oct 9, 2023 • 2

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 57

Transformers Can Represent n-gram Language Models

Paper • 2404.14994 • Published Apr 23 • 18

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 251

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 61

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Paper • 2403.20327 • Published Mar 29 • 47

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 41

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 84

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

Paper • 2404.04256 • Published Apr 5 • 5

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 64

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Paper • 2404.04125 • Published Apr 4 • 27

upvoted 22 papers 7 months ago

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29 • 24

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 56

MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection

Paper • 2403.19888 • Published Mar 29 • 9

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Paper • 2403.19651 • Published Mar 28 • 23

NEFTune: Noisy Embeddings Improve Instruction Finetuning

Paper • 2310.05914 • Published Oct 9, 2023 • 14

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

Paper • 2205.05638 • Published May 11, 2022 • 3

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 77

SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

Paper • 2403.15360 • Published Mar 22 • 11

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Paper • 2403.13447 • Published Mar 20 • 17

When Do We Not Need Larger Vision Models?

Paper • 2403.13043 • Published Mar 19 • 25

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21 • 32

ZigMa: Zigzag Mamba Diffusion Model

Paper • 2403.13802 • Published Mar 20 • 17

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Paper • 2402.13616 • Published Feb 21 • 45

YOLO-World: Real-Time Open-Vocabulary Object Detection

Paper • 2401.17270 • Published Jan 30 • 32

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 72

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

Paper • 2403.07487 • Published Mar 12 • 13

Gemma: Open Models Based on Gemini Research and Technology

Paper • 2403.08295 • Published Mar 13 • 47

Veagle: Advancements in Multimodal Representation Learning

Paper • 2403.08773 • Published Jan 18 • 7

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14 • 13

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

Paper • 2403.09977 • Published Mar 15 • 9

upvoted 2 papers 8 months ago

CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

Paper • 2402.15021 • Published Feb 22 • 12

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22 • 19
