AnyModal/flickr30k
Viewer
•
Updated
•
31k
•
132
Multimodal LLMs for all! AnyModal is a modular and extensible framework for integrating diverse input modalities (e.g., images, audio) into large language models (LLMs). It enables seamless tokenization, encoding, and language generation using pre-trained models for various modalities.