Rajdeep Ghosh

rumbleFTW

AI & ML interests

Transformers, GANs, Audio synthesis, LLMs, Diffusion.

Recent Activity

liked a model 12 days ago

hexgrad/Kokoro-82M

reacted to merve's post with ❤️ 5 months ago

If you have documents that do not only have text and you're doing retrieval or RAG (using OCR and LLMs), give it up and give ColPali and vision language models a try 🤗

Why? Documents consist of multiple modalities: layout, table, text, chart, images. Document processing pipelines often consist of multiple models and they're immensely brittle and slow. 🥲

How? ColPali is a ColBERT-like document retrieval model built on PaliGemma, it operates over image patches directly, and indexing takes far less time with more accuracy. You can use it for retrieval, and if you want to do retrieval augmented generation, find the closest document, and do not process it, give it directly to a VLM like Qwen2-VL (as image input) and give your text query. 🤝

This is much faster + you do not lose out on any information + much easier to maintain too! 🥳

Multimodal RAG https://huggingface.co/collections/merve/multimodal-rag-66d97602e781122aae0a5139 💬

Document AI (made it way before, for folks who want structured input/output and can fine-tune a model) https://huggingface.co/collections/merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e 📖
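The "ColBERT-like" retrieval step in the post can be illustrated with a small sketch of late-interaction (MaxSim) scoring. This is not ColPali's actual API: the function names (`dot`, `maxsim_score`, `retrieve`) and the two-dimensional toy vectors are assumptions made for illustration. A real setup would presumably get per-token query embeddings and per-patch page embeddings from the model itself.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], patch_vecs: List[Vector]) -> float:
    """Late interaction: for each query-token vector, keep its best-matching
    patch vector, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in patch_vecs) for q in query_vecs)


def retrieve(query_vecs: List[Vector],
             pages: List[List[Vector]]) -> Tuple[int, List[float]]:
    """Score every page and return (index of best page, all scores)."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    best = max(range(len(pages)), key=scores.__getitem__)
    return best, scores


# Hypothetical toy embeddings: 2 query tokens, 2 pages with 2 patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[0.9, 0.1], [0.2, 0.1]],  # page 0: matches only the first query token well
    [[0.8, 0.0], [0.0, 0.9]],  # page 1: matches both query tokens
]

best_page, scores = retrieve(query, pages)
print(best_page, scores)
```

In the pipeline the post describes, the image of the best-scoring page would then be passed unprocessed to a VLM together with the text query. That step is left out here because it depends on the specific model and library.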

reacted to merve's post with 👍 5 months ago

rumbleFTW's activity

liked a model 12 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 23 days ago • 1.12M • 3.4k

liked a model 6 months ago

deepseek-ai/DeepSeek-V2.5

Text Generation • Updated Dec 11, 2024 • 4.3k • 698

liked 6 models 7 months ago

sarvamai/shuka_v1

Updated Oct 16, 2024 • 601 • 45

sarvamai/sarvam-2b-v0.5

Text Generation • Updated Nov 8, 2024 • 570 • 84

parler-tts/parler-tts-large-v1

Text-to-Speech • Updated Nov 22, 2024 • 19.2k • 239

meta-llama/Llama-3.1-8B-Instruct

Text Generation • Updated Sep 25, 2024 • 5.97M • 3.67k

apple/DCLM-7B

Updated Jul 26, 2024 • 713 • 833

Groq/Llama-3-Groq-8B-Tool-Use

Text Generation • Updated Aug 27, 2024 • 1.3k • 273

liked 2 models 8 months ago

google/gemma-2-9b

Text Generation • Updated Aug 7, 2024 • 104k • 648

hubertsiuzdak/snac_24khz

Updated Apr 3, 2024 • 12.1k • 17

liked a model 9 months ago

bitext/Mistral-7B-Customer-Support

Text Generation • Updated Jul 25, 2024 • 355 • 9

liked 2 models 10 months ago

meta-llama/Meta-Llama-3-8B

Text Generation • Updated Sep 27, 2024 • 468k • 6.05k

facebook/wav2vec2-base-960h

Automatic Speech Recognition • Updated Nov 14, 2022 • 3.83M • 319

liked a model 11 months ago

xai-org/grok-1

Text Generation • Updated Mar 28, 2024 • 1.09k • 2.26k

liked a Space 12 months ago

OutfitAnyone

Generate virtual try-on results for clothing

liked 2 models about 1 year ago

sarvamai/OpenHathi-7B-Hi-v0.1-Base

Text Generation • Updated Dec 22, 2023 • 2.87k • 106

microsoft/phi-2

Text Generation • Updated Apr 29, 2024 • 500k • 3.28k

liked a Space about 1 year ago

PDF Chatbot

Ask questions about PDF documents

liked a model about 1 year ago

mistralai/Mistral-7B-v0.1

Text Generation • Updated Jul 24, 2024 • 355k • 3.6k