Fevzi KILAS

NIEXCHE

AI & ML interests

Jr. ML Engineer & Data Scientist - Freelance Mobile, Web App & Game Developer. Currently working on Time Series, LLMs, and VLMs. https://niexche.github.io/ https://fevzikilas.github.io/

Recent Activity

updated a collection 2 days ago
''LA👀''
liked a model 2 days ago
thuml/timer-base-84m
updated a model 6 days ago
NIEXCHE/chronos-t5-small-fine-tuned-v1

Organizations

CVPR Demo Track, ONNXConfig for all, Gradio-Blocks-Party, Keras Dreambooth Event, Blog-explorers, MLX Community

NIEXCHE's activity

replied to clem's post 21 days ago

Here are my predictions for AI in 2025. 🤗🤗🤗

  • A major cyberattack, fueled by AI-generated tactics and automated systems, will lead to a breach of a major corporation or government entity, sparking a global reevaluation of AI security protocols. In addition, there will be major protests.

  • Many people will start using AI-driven mental health tools, such as personalized therapy chatbots and mood-tracking apps, as part of their daily routine.

  • A large coalition of companies will propose an international AI regulatory framework that focuses on ethics, accountability, and safety in AI development and deployment across industries.

  • Major social media platforms will adopt AI for full-scale content moderation, reducing human involvement in decision-making for hate speech, fake news, and harmful content. However, the majority of content on these platforms will be generated by AI or AI-assisted tools, raising new challenges around authenticity and accountability.

  • A revolutionary AI tutoring system will emerge.

  • Hugging Face will experience a large-scale social media backlash due to controversial actions or statements by some of its employees.

  • Lots of AI-generated movies will be released.

reacted to singhsidhukuldeep's post with 🤗 22 days ago
Exciting breakthrough in Document AI! Researchers from UNC Chapel Hill and Bloomberg have developed M3DocRAG, a revolutionary framework for multi-modal document understanding.

The innovation lies in its ability to handle complex document scenarios that traditional systems struggle with:
- Process 40,000+ pages across 3,000+ documents
- Answer questions requiring information from multiple pages
- Understand visual elements like charts, tables, and figures
- Support both closed-domain (single document) and open-domain (multiple documents) queries

Under the hood, M3DocRAG operates through three sophisticated stages:

>> Document Embedding:
- Converts PDF pages to RGB images
- Uses ColPali to project both text queries and page images into a shared embedding space
- Creates dense visual embeddings for each page while maintaining visual information integrity
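
A minimal sketch of this embedding stage, assuming the pdf2image and colpali-engine packages and a vidore/colpali checkpoint; the exact class and method names, the file name "report.pdf", and the sample query are illustrative assumptions, not the paper's code:

```python
import torch
from pdf2image import convert_from_path
# Assumed API: colpali-engine; class/method names may differ between versions.
from colpali_engine.models import ColPali, ColPaliProcessor

# 1) Render PDF pages as RGB images ("report.pdf" is a placeholder path).
pages = convert_from_path("report.pdf", dpi=144)

# 2) Load a ColPali checkpoint and its processor.
model = ColPali.from_pretrained(
    "vidore/colpali-v1.2", torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = ColPaliProcessor.from_pretrained("vidore/colpali-v1.2")

with torch.no_grad():
    # Each page becomes a grid of patch embeddings (a multi-vector representation).
    page_batch = processor.process_images(pages).to(model.device)
    page_embs = model(**page_batch)        # (num_pages, num_patches, dim)

    # Text queries are projected into the same space as per-token embeddings.
    query_batch = processor.process_queries(["What does Figure 3 show?"]).to(model.device)
    query_embs = model(**query_batch)      # (1, num_query_tokens, dim)
```

The point is that pages keep a multi-vector (per-patch) representation instead of being collapsed into a single vector, which is what the MaxSim scoring in the next stage relies on.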

>> Page Retrieval:
- Employs MaxSim scoring to compute relevance between queries and pages
- Implements inverted file indexing (IVFFlat) for efficient search
- Reduces retrieval latency from 20s to under 2s when searching 40K+ pages
- Supports approximate nearest neighbor search via Faiss
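
A rough sketch of this two-step retrieval, assuming numpy and faiss-cpu; the random arrays stand in for real ColPali embeddings, and pooling each page down to one vector for the coarse IVFFlat pass is an illustrative simplification rather than the paper's exact indexing scheme:

```python
import numpy as np
import faiss

rng = np.random.default_rng(0)

# Placeholder multi-vector embeddings (real ones come from ColPali):
# one matrix of patch vectors per page, one matrix of token vectors for the query.
d = 128
page_embs = [rng.standard_normal((1030, d)).astype("float32") for _ in range(1000)]
query_emb = rng.standard_normal((20, d)).astype("float32")

def maxsim(q, p):
    """Late-interaction MaxSim: best-matching patch per query token, summed."""
    return (q @ p.T).max(axis=1).sum()

# Coarse stage: one pooled vector per page in an IVFFlat index (approximate search).
pooled = np.stack([p.mean(axis=0) for p in page_embs])
faiss.normalize_L2(pooled)
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, 32, faiss.METRIC_INNER_PRODUCT)
index.train(pooled)
index.add(pooled)
index.nprobe = 8

q_pooled = query_emb.mean(axis=0, keepdims=True).copy()
faiss.normalize_L2(q_pooled)
_, cand = index.search(q_pooled, 50)        # top-50 candidate pages (k is arbitrary here)

# Fine stage: exact MaxSim re-scoring on the candidates only.
scores = {int(i): maxsim(query_emb, page_embs[int(i)]) for i in cand[0]}
top_pages = sorted(scores, key=scores.get, reverse=True)[:4]
```

Restricting the exact MaxSim computation to a small candidate set is what makes searching 40K+ pages feasible in a couple of seconds instead of tens of seconds.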

>> Question Answering:
- Leverages Qwen2-VL 7B as the multi-modal language model
- Processes retrieved pages through a visual encoder
- Generates answers considering both textual and visual context
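
A sketch of the answering stage using the stock Hugging Face transformers API for Qwen2-VL; the page image file names and the sample question are placeholders standing in for the output of the retrieval stage:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Placeholder: the page images picked by the retriever (file names are hypothetical).
retrieved_pages = [Image.open(f) for f in ["page_012.png", "page_047.png"]]

# One image slot per retrieved page, followed by the user question.
messages = [{
    "role": "user",
    "content": [{"type": "image"} for _ in retrieved_pages]
               + [{"type": "text", "text": "What was the total revenue in 2022?"}],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=retrieved_pages, padding=True,
                   return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens (the answer).
answer = processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```

Because the retrieved evidence is passed as full page images, charts, tables, and figures reach the model intact rather than as lossy OCR text.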

The results are impressive:
- State-of-the-art performance on MP-DocVQA benchmark
- Superior handling of non-text evidence compared to text-only systems
- Significantly better performance on multi-hop reasoning tasks

This is a game-changer for industries dealing with large document volumes—finance, healthcare, and legal sectors can now process documents more efficiently while preserving crucial visual context.
replied to singhsidhukuldeep's post 22 days ago