Fevzi KILAS
NIEXCHE's activity
Here are my predictions for AI in 2025. 🤗🤗🤗
A major cyberattack, fueled by AI-generated tactics and automated systems, will lead to a breach of a major corporation or government entity, sparking a global reevaluation of AI security protocols. In addition, there will be major protests.
Many people will start using AI-driven mental health tools, such as personalized therapy chatbots and mood-tracking apps, as part of their daily routine.
A large coalition of companies will propose an international AI regulatory framework that focuses on ethics, accountability, and safety in AI development and deployment across industries.
Major social media platforms will adopt AI for full-scale content moderation, reducing human involvement in decisions about hate speech, fake news, and harmful content. However, the majority of content on these platforms will be generated by AI or AI-assisted tools, raising new challenges around authenticity and accountability.
A revolutionary AI tutoring system will emerge.
Hugging Face will experience a large-scale social media backlash due to controversial actions or statements by some of its employees.
Many AI-generated movies will be released.
M3DocRAG's innovation lies in its ability to handle complex document scenarios that traditional systems struggle with:
- Process 40,000+ pages across 3,000+ documents
- Answer questions requiring information from multiple pages
- Understand visual elements like charts, tables, and figures
- Support both closed-domain (single document) and open-domain (multiple documents) queries
Under the hood, M3DocRAG operates through three sophisticated stages:
>> Document Embedding:
- Converts PDF pages to RGB images
- Uses ColPali to project both text queries and page images into a shared embedding space
- Creates dense visual embeddings for each page while maintaining visual information integrity
>> Page Retrieval:
- Employs MaxSim scoring to compute relevance between queries and pages
- Implements inverted file indexing (IVFFlat) for efficient search
- Reduces retrieval latency from 20s to under 2s when searching 40K+ pages
- Supports approximate nearest neighbor search via Faiss
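To make the MaxSim step above concrete, here is a minimal NumPy sketch of late-interaction scoring between one query and a few pages. It is an illustration only, not the M3DocRAG code: the function names `maxsim_score` and `retrieve_top_k` are hypothetical, the tiny 2-dimensional vectors stand in for real ColPali embeddings, and it scores every page exhaustively rather than using a Faiss IVFFlat index.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """MaxSim: for each query token, take the best-matching page patch
    (highest dot product), then sum those maxima over all query tokens.

    query_emb: (n_query_tokens, dim), page_emb: (n_page_patches, dim).
    """
    sim = query_emb @ page_emb.T  # (n_query_tokens, n_page_patches)
    return float(sim.max(axis=1).sum())

def retrieve_top_k(query_emb, page_embs, k=1):
    """Score every page exhaustively and return the k best (index, score) pairs."""
    scores = np.array([maxsim_score(query_emb, p) for p in page_embs])
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy stand-ins for embeddings (assumed values, not real model output).
query = np.array([[1.0, 0.0], [0.0, 1.0]])       # two query tokens
page_a = np.array([[0.5, 0.5]])                  # one patch, partial match
page_b = np.array([[1.0, 0.0], [0.0, 1.0]])      # two patches, exact match

for idx, score in retrieve_top_k(query, [page_a, page_b], k=2):
    print(f"page {idx}: MaxSim = {score:.2f}")
```

Exhaustive scoring like this grows linearly with the number of pages, which is presumably why the post mentions approximate search for the 40K+ page setting.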
>> Question Answering:
- Leverages Qwen2-VL 7B as the multi-modal language model
- Processes retrieved pages through a visual encoder
- Generates answers considering both textual and visual context
The results are impressive:
- State-of-the-art performance on MP-DocVQA benchmark
- Superior handling of non-text evidence compared to text-only systems
- Significantly better performance on multi-hop reasoning tasks
This is a game-changer for industries dealing with large document volumes—finance, healthcare, and legal sectors can now process documents more efficiently while preserving crucial visual context.
Thanks for sharing.
https://m3docrag.github.io/