Umitcan Sahin PRO

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

liked a model 4 days ago
ByteDance/Sa2VA-4B
liked a model 4 days ago
microsoft/phi-4
liked a model 4 days ago
ByteDance/Sa2VA-8B

Organizations

None yet

ucsahin's activity

reacted to m-ric's post with 🤗🚀🔥 6 days ago
Since I published it on GitHub a few days ago, Hugging Face's new agentic library smolagents has gathered nearly 4k stars 🤯

āž”ļø But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/
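
For context, here is a minimal sketch of what using smolagents looks like, based on the library's early CodeAgent / HfApiModel API (names may have changed since; check the repo before relying on this):

```python
# Minimal smolagents sketch, assuming the early CodeAgent / HfApiModel API;
# verify against the current repo, as the library is evolving quickly.
from smolagents import CodeAgent, HfApiModel

model = HfApiModel()                       # defaults to a hosted model on the HF Inference API
agent = CodeAgent(tools=[], model=model)   # the agent writes and runs Python to solve the task

print(agent.run("How many seconds would it take a leopard at full speed to run through Pont des Arts?"))
```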
reacted to singhsidhukuldeep's post with 🔥 21 days ago
Exciting News in AI: Jina AI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M-parameter Jina XLM-RoBERTa text encoder and a 304M-parameter EVA02-L14 vision encoder
- Supports 89 languages with an 8,192-token context length
- Processes images up to 512×512 pixels with a 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage (see the sketch after this list)
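
Matryoshka Representation Learning trains embeddings so the leading dimensions carry the most information, meaning shorter vectors come from simple truncation plus re-normalization. An illustrative sketch (not Jina's code; the 1024D/256D sizes are taken from the performance numbers below):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int = 256) -> np.ndarray:
    """Matryoshka-style truncation: keep the first `dim` components, then L2-normalize."""
    truncated = emb[..., :dim]
    return truncated / np.linalg.norm(truncated, axis=-1, keepdims=True)

full = np.random.randn(1024)            # stand-in for a full 1024-D embedding
small = truncate_embedding(full, 256)   # 4x smaller vector for cheaper storage and search
```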

āš”ļø Under The Hood:
- Multi-stage training process with progressive resolution scaling (224ā†’384ā†’512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
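
"InfoNCE in both directions" means each matched image-caption pair in a batch is a positive and all other pairings are negatives, with cross-entropy applied both image→text and text→image. A minimal PyTorch sketch of that symmetric loss (illustrative, not Jina's training code):

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    """Bidirectional InfoNCE over a batch of matched image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature        # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)       # image -> text direction
    loss_t2i = F.cross_entropy(logits.T, targets)     # text -> image direction
    return (loss_i2t + loss_t2i) / 2
```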

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on the CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs. 1024D)

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multimodal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!
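
A sketch of what that unification looks like in use, assuming jina-clip-v2 exposes the encode_text / encode_image helpers that Jina's earlier CLIP release did (check the model card for the actual API; the file name is hypothetical):

```python
from transformers import AutoModel

# Assumes encode_text / encode_image helpers shipped via the model's remote code,
# as in Jina's earlier CLIP release; confirm against the model card.
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

text_emb = model.encode_text(["a chart of quarterly revenue"])
image_emb = model.encode_image(["report_page.png"])   # hypothetical local file

similarity = text_emb @ image_emb.T   # cosine similarity if outputs are L2-normalized
```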

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
New activity in ucsahin/TR-VLM-DPO-Dataset about 1 month ago
reacted to merve's post with 🔥👀👍 about 2 months ago
The authors of ColPali trained a retrieval model based on SmolVLM 🤠 vidore/colsmolvlm-alpha
TL;DR:

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 💗
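
For reference, ColPali-style retrievers like ColSmolVLM score a query against a page with late interaction: each query-token embedding is matched to its best page-patch embedding, and those per-token maxima are summed (MaxSim). A minimal sketch with made-up shapes:

```python
import torch

def maxsim_score(query_tokens: torch.Tensor, page_patches: torch.Tensor) -> torch.Tensor:
    """Late-interaction (MaxSim) score.

    query_tokens: (Q, D) embeddings, one per query token
    page_patches: (P, D) embeddings, one per image patch of the page
    """
    sims = query_tokens @ page_patches.T    # (Q, P) pairwise similarities
    return sims.max(dim=-1).values.sum()    # best patch per token, summed over tokens

# Hypothetical shapes: a 12-token query against a 1024-patch page, 128-D embeddings
score = maxsim_score(torch.randn(12, 128), torch.randn(1024, 128))
```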