Gemma-3-4B: Image and Video Inference 🖼️🎥🧤

Space: prithivMLmods/Gemma-3-Multimodal

@gemma3-4b : {Tag + Space + 'prompt'}
@video-infer : {Tag + Space + 'prompt'}

+ Gemma3-4B: google/gemma-3-4b-it
+ By default, it runs: prithivMLmods/Qwen2-VL-OCR-2B-Instruct

Gemma 3 Technical Report: https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf

Additionally, I have also tested Aya-Vision 8B vs. a custom Qwen2-VL-OCR on test samples of messy handwriting, as an experiment toward optimizing edge-device VLMs for Optical Character Recognition.

📜 Read the blog here: https://huggingface.co/blog/prithivMLmods/aya-vision-vs-qwen2vl-ocr-2b
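The tag convention in this post (tag, then a space, then the prompt) can be illustrated with a small routing sketch. This is not the Space's actual code: the function name `route_prompt`, the `TAG_ROUTES` table, and the "image"/"video" mode labels are hypothetical, and it assumes that `@video-infer` is also served by the Gemma checkpoint, which the post does not state.

```python
# Hypothetical sketch of tag-based routing; not taken from the Space's source.
DEFAULT_MODEL = "prithivMLmods/Qwen2-VL-OCR-2B-Instruct"  # default named in the post
GEMMA_MODEL = "google/gemma-3-4b-it"                      # Gemma3-4B named in the post

# Assumption: both tags route to Gemma; the post does not say which
# model handles @video-infer.
TAG_ROUTES = {
    "@gemma3-4b": (GEMMA_MODEL, "image"),
    "@video-infer": (GEMMA_MODEL, "video"),
}


def route_prompt(text):
    """Return (model_id, mode, prompt) for a message of the form 'tag prompt'."""
    stripped = text.strip()
    # Split off the first whitespace-delimited token as the candidate tag.
    head, _, rest = stripped.partition(" ")
    route = TAG_ROUTES.get(head.lower())
    if route is None:
        # No recognised tag: fall back to the default OCR model with the
        # whole message as the prompt.
        return DEFAULT_MODEL, "image", stripped
    model_id, mode = route
    return model_id, mode, rest.strip()


if __name__ == "__main__":
    print(route_prompt("@gemma3-4b describe the image"))
    print(route_prompt("read the handwriting"))
```

Keeping the tag table separate from the parsing logic means another tag could be added without touching `route_prompt`.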
Variable Demo for Two Image-to-Text-to-Text Multimodals 🌠📜

Space: prithivMLmods/Multimodal-OCR

By default, it will use: prithivMLmods/Qwen2-VL-OCR-2B-Instruct or prithivMLmods/Qwen2-VL-OCR2-2B-Instruct

To trigger Aya-Vision 8B (CohereForAI/aya-vision-8b), start the prompt with the tag @aya-vision