Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

posted an update 1 day ago

Lightweight (nanoGPT) implementation of hybrid norm: an intuitive normalization method that combines the strengths of both pre-norm (i.e., QKV-norm in MHA) and post-norm in the feed-forward network. Code: https://github.com/Jaykef/ai-algorithms/blob/main/hybrid_normalization.ipynb
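
To make the idea concrete, here is a minimal sketch in plain Python. It is not the notebook's code. It assumes a single causal attention head, LayerNorm without learnable gain or bias, a ReLU feed-forward network, and the classic post-norm form Norm(y + FFN(y)); the paper's exact placement of the norms may differ. All function and parameter names (`hybrid_norm_block`, `init_params`, `Wq`, and so on) are hypothetical.

```python
import math
import random


def layer_norm(v, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no gain/bias)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def matvec(W, v):
    """Multiply a (rows x cols) matrix, stored as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def qkv_norm_attention(X, Wq, Wk, Wv, Wo):
    """Single-head causal attention with Q, K and V each normalized per token."""
    Q = [layer_norm(matvec(Wq, x)) for x in X]
    K = [layer_norm(matvec(Wk, x)) for x in X]
    V = [layer_norm(matvec(Wv, x)) for x in X]
    d = len(Q[0])
    out = []
    for t, q in enumerate(Q):
        # Causal mask: token t only attends to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, K[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        ctx = [sum(weights[s] * V[s][i] for s in range(t + 1)) for i in range(d)]
        out.append(matvec(Wo, ctx))
    return out


def ffn(x, W1, W2):
    """Two-layer feed-forward network with ReLU (nanoGPT itself uses GELU)."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)


def hybrid_norm_block(X, p):
    """One block: QKV-norm attention + residual, then a post-norm FFN."""
    A = qkv_norm_attention(X, p["Wq"], p["Wk"], p["Wv"], p["Wo"])
    Y = [[xi + ai for xi, ai in zip(x, a)] for x, a in zip(X, A)]
    # Post-norm (assumed form): normalize after adding the FFN output.
    return [layer_norm([yi + fi for yi, fi in zip(y, ffn(y, p["W1"], p["W2"]))])
            for y in Y]


def init_params(d_model, d_ff, seed=0):
    """Small random weights, only for trying the block out."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.2) for _ in range(cols)] for _ in range(rows)]

    return {"Wq": mat(d_model, d_model), "Wk": mat(d_model, d_model),
            "Wv": mat(d_model, d_model), "Wo": mat(d_model, d_model),
            "W1": mat(d_ff, d_model), "W2": mat(d_model, d_ff)}
```

Calling `hybrid_norm_block(X, init_params(4, 8))` on a list of three 4-dimensional token vectors returns three 4-dimensional vectors. Because the last step is an unscaled LayerNorm, each output vector has roughly zero mean and unit variance.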

Jaward's activity

upvoted a paper 1 day ago

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Paper • 2503.04598 • Published 4 days ago • 16

upvoted a paper 6 days ago

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published 7 days ago • 66

upvoted 2 papers about 1 month ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 55

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3 • 186

upvoted an article about 1 month ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 798

upvoted a paper about 2 months ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 106

upvoted a collection 2 months ago

Cosmos

Collection

The collection of Cosmos models • 31 items • Updated Jan 17 • 268

upvoted 4 papers 4 months ago

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 114

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24, 2024 • 32

GPT-4o System Card

Paper • 2410.21276 • Published Oct 25, 2024 • 84

upvoted 4 papers 5 months ago

upvoted a collection 5 months ago

Emu3

Collection

Emu3: Next-Token Prediction is All You Need • 7 items • Updated 25 days ago • 69

upvoted a paper 5 months ago

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Paper • 2409.18125 • Published Sep 26, 2024 • 34

upvoted 3 papers 6 months ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 141

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published Sep 4, 2024 • 94

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 123