---
datasets:
- chart-misinformation-detection/MISCHA-QA-v1
language:
- en
tags:
- LLaVA
- misinformation
---

# Model Card for Snoopy 1.0

This model detects visual manipulation in bar charts.

## Model Details

### Model Description

- **Developed by:** Arif Syraj
- **Model type:** Multimodal LLM
- **Finetuned from model:** llava-1.6-mistral-7b

## How to Get Started with the Model

This is not a Hugging Face Transformers model; please refer to this
[Colab notebook](https://colab.research.google.com/drive/1UpnztYv46faXj-kmFpL_GAbOCjP2u6zM?usp=sharing)
to run inference. Inference requires a GPU.
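The notebook above is the authoritative way to run the model. As a rough illustration only (the question wording and the helper name here are assumptions, not taken from the notebook), LLaVA-1.6 with a Mistral backbone expects queries wrapped in Mistral's instruction template, with an `<image>` placeholder marking where the chart's vision features are spliced in:

```python
def build_llava_mistral_prompt(question: str) -> str:
    # LLaVA-1.6 (Mistral backbone) wraps the user turn in Mistral's
    # [INST] ... [/INST] tags; <image> marks where image features go.
    return f"[INST] <image>\n{question} [/INST]"

prompt = build_llava_mistral_prompt(
    "Is this bar chart misleading? Answer yes or no, then explain."
)
print(prompt)
```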

## Training Details

The model was fine-tuned with LoRA for 1 epoch on ~2,700 images of misleading and non-misleading bar charts.
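Given the batch settings listed under the training procedure (per-device batch 3, gradient accumulation 16), and assuming a single GPU (the card does not state the GPU count), one epoch over ~2,700 images works out to an effective batch of 48 and roughly 57 optimizer steps:

```python
import math

num_images = 2700                 # approximate training-set size from the card
per_device_train_batch_size = 3
gradient_accumulation_steps = 16

# Effective batch per optimizer step, assuming a single GPU.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_images / effective_batch)

print(effective_batch, steps_per_epoch)  # 48 57
```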

### Training Procedure

Trainer hyperparameters:

- learning_rate = 1e-5
- bf16 = True
- num_train_epochs = 1
- optim = "adamw_torch"
- per_device_train_batch_size = 3
- gradient_accumulation_steps = 16
- gradient_checkpointing = True

LoRA configuration (rank-stabilized LoRA enabled):

- rank = 32
- lora_alpha = 32
- target_modules = [q_proj, out_proj, v_proj, k_proj, down_proj, up_proj, o_proj, gate_proj]
- lora_dropout = 0.05
- bias = "none"
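The values above map naturally onto `peft`'s `LoraConfig` and `transformers`' `TrainingArguments`. The sketch below is an assumption about how the run was wired up, not the actual training script; only the numbers come from this card, and `output_dir` is a placeholder:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                       # LoRA rank
    lora_alpha=32,
    use_rslora=True,            # rank-stabilized LoRA scaling
    target_modules=[
        "q_proj", "out_proj", "v_proj", "k_proj",
        "down_proj", "up_proj", "o_proj", "gate_proj",
    ],
    lora_dropout=0.05,
    bias="none",
)

training_args = TrainingArguments(
    output_dir="snoopy-1.0",    # placeholder path
    learning_rate=1e-5,
    bf16=True,
    num_train_epochs=1,
    optim="adamw_torch",
    per_device_train_batch_size=3,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
)
```

Both objects would then be passed to a LLaVA fine-tuning trainer along with the chart dataset; that wiring is outside the scope of this card.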

#### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision

## Citation

**APA:**

- Liu, H., Li, C., Li, Y., Li, B., Zhang, Y., Shen, S., & Lee, Y. J. (2024, January). **LLaVA-NeXT: Improved reasoning, OCR, and world knowledge**. Retrieved from [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/)

- Liu, H., Li, C., Li, Y., & Lee, Y. J. (2023). **Improved Baselines with Visual Instruction Tuning**. *arXiv:2310.03744*.

- Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). **Visual Instruction Tuning**. *NeurIPS*.