naveensp committed on · Commit 2860fcc · verified · 1 Parent(s): 20b2922

Update README.md

Files changed (1)
  1. README.md (+2 -17)
README.md CHANGED
@@ -5,24 +5,9 @@ license: apache-2.0
 
 # Model Card: LlavaOLMoBitnet1B
 
-Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. However, to truly democratize AI, models must exhibit strong capabilities and run efficiently on the small compute footprints accessible to most. As part of this quest, we introduce LLaVaOLMoBitnet1B - the first Ternary Multimodal LLM capable of accepting Image(s)+Text inputs to produce coherent textual responses. The model is open-sourced along with weights and training scripts to encourage future research into ternary models. We also release a technical report highlighting the training process, challenges associated with ternary models, and future opportunities.
-
-## Paper Abstract
-
-## Model Details
-
-TODO: OPTIONAL - Any notes or warnings about the dataset
-### Note
-Please note, we only provide the model adapter and do not provide a copy of the base [yahma/llama-7b-hf](https://huggingface.co/yahma/llama-7b-hf) model or its sparsified variant. Any use of this adapter requires a separate download of the base model, followed by [this instruction](#sparsified-base-model) to sparsify it.
-
-### Information
-
-- **Adapter name:** TODO
-- **Base model:** TODO
-- **Sparsity:** TODO
-- **Domain:** TODO
-- **Subnetwork version:** TODO
-TODO - Add any additional info as needed
+Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. However, to truly democratize AI, models must exhibit strong capabilities and run efficiently on the small compute footprints accessible to most. As part of this quest, we introduce LLaVaOLMoBitnet1B - the first Ternary Multimodal LLM capable of accepting Image(s)+Text inputs to produce coherent textual responses. The model is fully open-sourced along with training scripts to encourage further research in this space. We also release a technical report highlighting the training process, eval details, challenges associated with ternary models, and future opportunities.
+
+Authors: Jainaveen Sundaram, Ravishankar Iyer
 
 
 ### Training Data
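
For readers of this commit, a minimal sketch of what the Image(s)+Text inference described in the updated abstract typically looks like. This is not documented in the card itself: the hub id, prompt template, and the use of generic `AutoProcessor`/`AutoModelForCausalLM` classes with `trust_remote_code` are assumptions, and the repo's released training and inference scripts remain the authoritative reference.

```python
# Hedged sketch only: the model id, prompt format, and processor/model classes
# below are assumptions, not confirmed by this model card. Custom ternary
# (BitNet) layers likely require trust_remote_code=True.
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "IntelLabs/LlavaOLMoBitnet1B"  # assumed hub id, for illustration

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Fetch an example image and build a LLaVA-style prompt (assumed template).
url = "https://example.com/sample.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nDescribe this image. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```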