Ahmed Masry PRO
ahmed-masry
AI & ML interests
Multimodal Chart Understanding,
Multimodal Document AI,
Multimodal Vision-Language Models
Articles
Organizations
None yet
Posts
3
Post
Introducing ColFlor: An Efficient, OCR-Free Vision-Language Document Retrieval Model
Earlier this year, ColPali revolutionized document retrieval by eliminating the need for error-prone OCR pipelines. Instead, it directly processes the document images. However, with its 3 billion parameters, ColPali is computationally heavy for large-scale applications.
That's where ColFlor comes in: a smaller, faster alternative! At 17x smaller than ColPali, ColFlor offers a more efficient, OCR-free document retrieval solution, making it ideal for users with limited computing resources (the "GPU poor").
Key Highlights:
174M parameters (vs. 3B for ColPali)
9.8x faster query encoding, 5.25x faster image encoding
Only 1.8% performance drop on text-rich English documents
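As a quick sanity check on the "17x smaller" figure, the ratio can be recomputed from the two parameter counts quoted above. This is only a sketch: the counts are the rounded numbers from the post (3B and 174M), and the helper name `size_ratio` is made up for illustration.

```python
# Rounded parameter counts as quoted in the post (not exact model sizes).
COLPALI_PARAMS = 3_000_000_000
COLFLOR_PARAMS = 174_000_000


def size_ratio(large: int, small: int) -> float:
    """Return how many times larger `large` is than `small`."""
    return large / small


ratio = size_ratio(COLPALI_PARAMS, COLFLOR_PARAMS)
print(f"ColPali is about {ratio:.1f}x the size of ColFlor")  # about 17.2x
```

The result rounds to the 17x quoted in the post.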
Check out the full blog post for more insights on modeling, training, and evaluations across various document retrieval tasks!
Also, feel free to try our demo on Hugging Face.
Resources:
Blog post: https://huggingface.co/blog/ahmed-masry/colflor
Model: ahmed-masry/ColFlor
Demo: ahmed-masry/ColFlor-Demo
Training code: https://github.com/AhmedMasryKU/colflor
Evaluation code: https://github.com/AhmedMasryKU/vidore-benchmark-colflor
Post
Exciting News! Our latest paper "ChartGemma" is out!
1/3: ChartGemma overcomes a key limitation of existing chart models, which rely too heavily on underlying data tables. Instead, it is trained on data generated directly from chart images, capturing crucial visual trends.
2/3: ChartGemma builds upon PaliGemma from Google Research and is fine-tuned on a high-quality visual instruction-tuning dataset generated with Gemini Flash 1.5.
3/3: It achieves state-of-the-art results in chart summarization, question answering, and fact-checking tasks. It can also generate more accurate and realistic chart summaries.
Our model and data are publicly available. We also have a cool web demo. Check it out!
Demo: ahmed-masry/ChartGemma
Code: https://github.com/vis-nlp/ChartGemma
Paper: ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild (2407.04172)
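Since the model is public and builds on PaliGemma, querying it might look roughly like the sketch below. It assumes the checkpoint loads through the PaliGemma classes in the `transformers` library and that `torch` and `Pillow` are installed. The helper name `answer_chart_question`, its parameters, and the generation settings are illustrative choices, not taken from the post.

```python
MODEL_ID = "ahmed-masry/chartgemma"  # model repository listed on this profile


def answer_chart_question(image_path: str, question: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: ask ChartGemma one question about one chart image."""
    # Heavy imports live inside the function so that defining it stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    # Assumption: use half precision on a GPU and full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=dtype)
    model = model.to(device).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=question, images=image, return_tensors="pt")
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    inputs["pixel_values"] = inputs["pixel_values"].to(dtype)
    prompt_length = inputs["input_ids"].shape[1]

    with torch.no_grad():
        generated = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the newly generated answer is decoded.
    answer_ids = generated[:, prompt_length:]
    return processor.batch_decode(answer_ids, skip_special_tokens=True)[0]
```

Calling `answer_chart_question("chart.png", "What is the highest value?")` would download the model weights on first use, so it is best tried on a machine with a GPU and enough disk space. The file name and question here are placeholders.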
spaces
6
ColFlor Demo • Running on Zero • 1
ChartInstruct FlanT5 XL • Runtime error
ChartInstruct LLama2 • Running on Zero • 3 • ChartInstruct for general purpose chart understanding tasks
UniChart Base • Running on Zero • UniChart Base (Best for Chart Captioning & Chart to Table)
ChartGemma • Running on Zero • 102
UniChart ChartQA • Running • UniChart finetuned on the ChartQA dataset
models
10
ahmed-masry/ColFlor • Updated • 157
ahmed-masry/chartgemma • Updated • 3.21k • 28
ahmed-masry/ChartInstruct-FlanT5-XL • Text2Text Generation • Updated • 243
ahmed-masry/ChartInstruct-LLama2 • Updated • 2.14k • 1
ahmed-masry/unichart-base-960 • Updated • 846 • 2
ahmed-masry/Chart-Mask-RCNN • Updated
ahmed-masry/unichart-opencqa-960 • Updated • 132k
ahmed-masry/unichart-chart2text-pew-960 • Updated • 321
ahmed-masry/unichart-chart2text-statista-960 • Updated • 26
ahmed-masry/unichart-chartqa-960 • Updated • 793 • 2
datasets
7
ahmed-masry/unichart-qa-data • Viewer • Updated • 300k • 3 • 1
ahmed-masry/unichart-table-data • Viewer • Updated • 305k • 23 • 1
ahmed-masry/ChartGemma • Viewer • Updated • 163k • 949 • 6
ahmed-masry/ChartQA • Viewer • Updated • 32.7k • 329 • 19
ahmed-masry/chartqa_without_images • Viewer • Updated • 32.7k • 47
ahmed-masry/unichart-pretrain-data • Viewer • Updated • 6.9M • 3 • 3
ahmed-masry/UniChart-pretrain-images • Preview • Updated • 2