
Kuldeep Singh Sidhu

singhsidhukuldeep

AI & ML interests

😃 TOP 3 on Hugging Face for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io

Recent Activity

posted an update about 13 hours ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!
posted an update 1 day ago
Fascinating insights from @Pinterest's latest research on improving feature interactions in recommendation systems!
updated a Space 3 days ago
singhsidhukuldeep/posts_leaderboard

Organizations

MLX Community · Social Post Explorers · C4AI Community

Posts 104

Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
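Matryoshka-trained embeddings can be shortened by simply keeping the leading dimensions and re-normalizing. A minimal numpy sketch (illustrative only, not the model's actual API):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Matryoshka-style shortening: keep the first `dim` components,
    then L2-normalize so cosine similarity remains meaningful."""
    v = np.asarray(emb, dtype=np.float32)[:dim]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=1024)   # e.g. a 1024-D embedding
short = truncate_embedding(full, 256)               # 75% smaller vector
```

Because the training objective packs the most important information into the leading dimensions, the truncated vector stays usable for retrieval at a quarter of the storage cost.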

āš”ļø Under The Hood:
- Multi-stage training process with progressive resolution scaling (224ā†’384ā†’512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
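The bidirectional InfoNCE objective can be sketched as cross-entropy over a batch similarity matrix, averaged over the text→image and image→text directions. A toy numpy illustration (not the actual training code):

```python
import numpy as np

def info_nce_bidirectional(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (text_i, image_i) pairs are positives,
    all other pairs in the batch act as negatives."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature            # batch x batch similarity matrix

    def cross_entropy_diag(lg):
        lg = lg - lg.max(axis=1, keepdims=True)           # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.diag(logp).mean()                      # diagonal = positives

    # average the text->image and image->text directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

rng = np.random.default_rng(0)
t, v = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
loss = info_nce_bidirectional(t, v)
```

Hard negative mining would replace the random in-batch negatives here with deliberately similar non-matching pairs, which sharpens the gradient signal.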

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on the CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
Fascinating insights from @Pinterest's latest research on improving feature interactions in recommendation systems!

Pinterest's engineering team has tackled a critical challenge in their Homefeed ranking system that serves 500M+ monthly active users. Here's what makes their approach remarkable:

>> Technical Deep Dive

Architecture Overview
• The ranking model combines dense features, sparse features, and embedding features to represent users, Pins, and context
• Sparse features are processed using learnable embeddings with size based on feature cardinality
• User sequence embeddings are generated using a transformer architecture processing past engagements

Feature Processing Pipeline
• Dense features undergo normalization for numerical stability
• Sparse and embedding features receive L2 normalization
• All features are concatenated into a single feature embedding
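The three processing steps above can be sketched roughly as follows (shapes, names, and statistics are hypothetical, not Pinterest's code):

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x) + eps)

def build_feature_embedding(dense, dense_mean, dense_std, sparse_embs, user_seq_emb):
    # 1) dense features: standardize for numerical stability
    d = (np.asarray(dense, dtype=np.float32) - dense_mean) / dense_std
    # 2) sparse embeddings and the transformer user-sequence embedding: L2-normalize
    parts = [l2_normalize(np.asarray(e, np.float32)) for e in sparse_embs]
    parts.append(l2_normalize(np.asarray(user_seq_emb, np.float32)))
    # 3) concatenate everything into a single feature embedding
    return np.concatenate([d] + parts)

feat = build_feature_embedding(
    dense=[3.0, 7.0], dense_mean=5.0, dense_std=2.0,
    sparse_embs=[np.ones(4), np.ones(8)],     # toy learned embeddings
    user_seq_emb=np.ones(16),                 # toy transformer output
)
```

Normalizing each feature group before concatenation keeps the downstream interaction layers from being dominated by whichever group happens to have the largest raw magnitudes.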

Key Innovations
• Implemented parallel MaskNet layers with 3 blocks
• Used projection ratio of 2.0 and output dimension of 512
• Stacked 4 DCNv2 layers on top for higher-order interactions
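A toy numpy sketch of that interaction stack, with tiny dimensions standing in for the real ones (3 parallel MaskNet blocks with projection ratio 2.0, then 4 stacked DCNv2 cross layers; all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def masknet_block(x, d_out, proj_ratio=2.0):
    """Simplified MaskNet block: an instance-guided mask from a small
    two-layer net (hidden size = proj_ratio * d_in) gates the input
    before a linear projection to d_out."""
    d_in = x.shape[-1]
    d_hid = int(proj_ratio * d_in)
    w1 = rng.normal(scale=0.1, size=(d_in, d_hid))
    w2 = rng.normal(scale=0.1, size=(d_hid, d_in))
    mask = np.maximum(x @ w1, 0.0) @ w2                # instance-guided mask
    w_out = rng.normal(scale=0.1, size=(d_in, d_out))
    return (x * mask) @ w_out

def dcnv2_layer(x0, xl):
    """DCNv2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl."""
    d = x0.shape[-1]
    w = rng.normal(scale=0.1, size=(d, d))
    b = np.zeros(d)
    return x0 * (xl @ w + b) + xl

x = rng.normal(size=16)                                       # toy feature embedding
h = np.concatenate([masknet_block(x, 8) for _ in range(3)])   # 3 parallel blocks
x0 = h
for _ in range(4):                                            # 4 stacked cross layers
    h = dcnv2_layer(x0, h)
```

The parallel MaskNet blocks capture gated second-order interactions, and the DCNv2 stack on top composes them into explicit higher-order crosses.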

Performance Improvements
• Achieved +1.42% increase in Homefeed Save Volume
• Boosted Overall Time Spent by +0.39%
• Limited the memory-consumption increase to just 5%

>> Industry Constraints Addressed

Memory Management
• Optimized for 60% GPU memory utilization
• Prevented OOM errors while maintaining batch size efficiency

Latency Optimization
• Removed input-output concatenation before the MLP
• Reduced hidden layer sizes in the MLP
• Achieved zero latency increase while improving performance

System Stability
• Ensured reproducible results across retraining
• Maintained model stability across different data distributions
• Successfully deployed in a production environment

This work brilliantly demonstrates how to balance academic innovations with real-world industrial constraints. Kudos to the Pinterest team!

models

None public yet

datasets

None public yet