Rajdeep Ghosh
rumbleFTW
2 followers · 11 following
AI & ML interests
Transformers, GANs, Audio synthesis, LLMs, Diffusion.
Recent Activity
liked a model 12 days ago
hexgrad/Kokoro-82M
reacted to merve's post with ❤️ 5 months ago
If your documents contain more than plain text and you're doing retrieval or RAG with OCR and LLMs, consider dropping that pipeline and giving ColPali and vision language models a try 🤗

Why? Documents consist of multiple modalities: layout, tables, text, charts, images. Document processing pipelines often chain multiple models, which makes them brittle and slow. 🥲

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma. It operates over image patches directly, and indexing takes far less time with more accuracy. You can use it for retrieval. If you want to do retrieval augmented generation, find the closest document and, without processing it, pass it directly to a VLM like Qwen2-VL (as image input) along with your text query. 🤝

This is much faster, you do not lose any information, and it is much easier to maintain too! 🥳

Multimodal RAG: https://huggingface.co/collections/merve/multimodal-rag-66d97602e781122aae0a5139 💬
Document AI (made earlier, for folks who want structured input/output and can fine-tune a model): https://huggingface.co/collections/merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e 📖
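To make "ColBERT-like" concrete, here is a minimal sketch of late-interaction scoring in plain Python. It assumes the usual ColBERT-style MaxSim rule (each query token vector is matched to its best page patch vector, and the maxima are summed); the post does not spell out the scoring rule, so treat that as an assumption. The function names `maxsim_score` and `rank_pages` and the toy 2-D embeddings are made up for illustration and are not part of any library.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query token vector, take the best
    matching patch vector on the page, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from best to worst MaxSim score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical): 2 query tokens, 2 pages with 2 patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[0.9, 0.1], [0.2, 0.1]],  # page 0: matches only the first query token well
    [[0.8, 0.0], [0.1, 0.9]],  # page 1: has a good patch for each query token
]

best_page = rank_pages(query, pages)[0]
print("closest page index:", best_page)
```

In a real pipeline the vectors would come from the retrieval model rather than being written by hand, and the image of the top-ranked page would then be passed to the VLM together with the text query, as the post describes.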
reacted to merve's post with 👍 5 months ago (the same post as above)
rumbleFTW's activity
liked a model 12 days ago
hexgrad/Kokoro-82M • Text-to-Speech • Updated 23 days ago • 1.12M • 3.4k
liked a model 6 months ago
deepseek-ai/DeepSeek-V2.5 • Text Generation • Updated Dec 11, 2024 • 4.3k • 698
liked 6 models 7 months ago
sarvamai/shuka_v1 • Updated Oct 16, 2024 • 601 • 45
sarvamai/sarvam-2b-v0.5 • Text Generation • Updated Nov 8, 2024 • 570 • 84
parler-tts/parler-tts-large-v1 • Text-to-Speech • Updated Nov 22, 2024 • 19.2k • 239
meta-llama/Llama-3.1-8B-Instruct • Text Generation • Updated Sep 25, 2024 • 5.97M • 3.67k
apple/DCLM-7B • Updated Jul 26, 2024 • 713 • 833
Groq/Llama-3-Groq-8B-Tool-Use • Text Generation • Updated Aug 27, 2024 • 1.3k • 273
liked 2 models 8 months ago
google/gemma-2-9b • Text Generation • Updated Aug 7, 2024 • 104k • 648
hubertsiuzdak/snac_24khz • Updated Apr 3, 2024 • 12.1k • 17
liked a model 9 months ago
bitext/Mistral-7B-Customer-Support • Text Generation • Updated Jul 25, 2024 • 355 • 9
liked 2 models 10 months ago
meta-llama/Meta-Llama-3-8B • Text Generation • Updated Sep 27, 2024 • 468k • 6.05k
facebook/wav2vec2-base-960h • Automatic Speech Recognition • Updated Nov 14, 2022 • 3.83M • 319
liked a model 11 months ago
xai-org/grok-1 • Text Generation • Updated Mar 28, 2024 • 1.09k • 2.26k
liked a Space 12 months ago
OutfitAnyone 🏢 • Running • 2.5k • Generate virtual try-on results for clothing
liked 2 models about 1 year ago
sarvamai/OpenHathi-7B-Hi-v0.1-Base • Text Generation • Updated Dec 22, 2023 • 2.87k • 106
microsoft/phi-2 • Text Generation • Updated Apr 29, 2024 • 500k • 3.28k
liked a Space about 1 year ago
PDF Chatbot 🌍 • Running • 339 • Ask questions about PDF documents
liked a model about 1 year ago
mistralai/Mistral-7B-v0.1 • Text Generation • Updated Jul 24, 2024 • 355k • 3.6k