naveensp committed on · Commit 2860fcc · verified · 1 Parent(s): 20b2922

Update README.md

Files changed (1)
  1. README.md (+2 -17)
README.md CHANGED
@@ -5,24 +5,9 @@ license: apache-2.0
 
 # Model Card: LlavaOLMoBitnet1B
 
-Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. However, to truly democratize AI, models must exhibit strong capabilities and run efficiently on the small compute footprints accessible to most. As part of this quest, we introduce LLaVaOLMoBitnet1B - the first Ternary Multimodal LLM capable of accepting Image(s)+Text inputs to produce coherent textual responses. The model is open-sourced along with weights and training scripts to encourage future research into ternary models. We also release a technical report highlighting the training process, challenges associated with ternary models, and future opportunities.
-
-## Paper Abstract
-
-## Model Details
-
-TODO: OPTIONAL - Any notes or warnings about the dataset
-### Note
-Please note, we only provide the model adapter and do not provide a copy of the base [yahma/llama-7b-hf](https://huggingface.co/yahma/llama-7b-hf) model or its sparsified variant. Any use of this adapter requires a separate download of the base model, followed by [this instruction](#sparsified-base-model) to sparsify it.
-
-### Information
-
-- **Adapter name:** TODO
-- **Base model:** TODO
-- **Sparsity:** TODO
-- **Domain:** TODO
-- **Subnetwork version:** TODO
-TODO - Add any additional info as needed
+Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. However, to truly democratize AI, models must exhibit strong capabilities and run efficiently on the small compute footprints accessible to most. As part of this quest, we introduce LLaVaOLMoBitnet1B - the first Ternary Multimodal LLM capable of accepting Image(s)+Text inputs to produce coherent textual responses. The model is fully open-sourced along with training scripts to encourage further research in this space. We also release a technical report highlighting the training process, eval details, challenges associated with ternary models, and future opportunities.
+
+Authors: Jainaveen Sundaram, Ravishankar Iyer
 
 
 ### Training Data
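
For readers of this commit, a minimal sketch of what the Image(s)+Text inference described in the updated abstract typically looks like. This is not documented in the card itself: the hub id, prompt template, and the use of generic `AutoProcessor`/`AutoModelForCausalLM` classes with `trust_remote_code` are assumptions, and the repo's released training and inference scripts remain the authoritative reference.

```python
# Hedged sketch only: the model id, prompt format, and processor/model classes
# below are assumptions, not confirmed by this model card. Custom ternary
# (BitNet) layers likely require trust_remote_code=True.
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "IntelLabs/LlavaOLMoBitnet1B"  # assumed hub id, for illustration

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Fetch an example image and build a LLaVA-style prompt (assumed template).
url = "https://example.com/sample.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nDescribe this image. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```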