---
datasets:
- chart-misinformation-detection/MISCHA-QA-v1
language:
- en
tags:
- LLaVA
- misinformation
---
# Model Card for Snoopy 1.0

This model detects visual manipulation in bar charts: given a chart image, it assesses whether the chart's design is misleading.


## Model Details

### Model Description


- **Developed by:** Arif Syraj
- **Model type:** Multi-Modal LLM
- **Finetuned from model:** llava-1.6-mistral-7b

## How to Get Started with the Model

This is not a Hugging Face `transformers`-compatible checkpoint, so it cannot be loaded with `from_pretrained` directly. Please refer to this
[Colab notebook](https://colab.research.google.com/drive/1UpnztYv46faXj-kmFpL_GAbOCjP2u6zM?usp=sharing)
to run inference. Inference requires a GPU.
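
For orientation, here is a minimal sketch of what inference looks like with the upstream [LLaVA codebase](https://github.com/haotian-liu/LLaVA). The adapter path, base-model ID, and prompt below are assumptions; the Colab notebook remains the authoritative reference.

```python
# Sketch: GPU inference via the upstream LLaVA codebase, not transformers.
# The adapter path and prompt are placeholders -- see the Colab notebook
# for the exact, tested setup.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "path/to/snoopy-1.0-lora"           # hypothetical LoRA checkpoint dir
model_base = "liuhaotian/llava-v1.6-mistral-7b"  # assumed base model

args = type("Args", (), {
    "model_path": model_path,
    "model_base": model_base,                    # LoRA checkpoints also need the base model
    "model_name": get_model_name_from_path(model_path),
    "query": "Is this bar chart visually misleading? Explain.",
    "conv_mode": None,
    "image_file": "bar_chart.png",               # chart image to analyze
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 256,
})()

eval_model(args)  # requires a CUDA GPU
```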

## Training Details

Fine-tuned with LoRA for one epoch on ~2,700 images of misleading and non-misleading bar charts.
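
The card metadata lists MISCHA-QA-v1 as the training dataset; assuming it follows a standard Hub layout, it can be inspected with the `datasets` library:

```python
from datasets import load_dataset

# Split and feature names are not documented in this card;
# print the dataset object to discover them.
ds = load_dataset("chart-misinformation-detection/MISCHA-QA-v1")
print(ds)
```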

### Training Procedure

- learning_rate = 1e-5
- bf16 = True
- num_train_epochs = 1
- optim = "adamw_torch"
- per_device_train_batch_size = 3
- gradient_accumulation_steps = 16
- gradient_checkpointing = True
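
These settings map naturally onto Hugging Face `TrainingArguments`; a minimal sketch of that mapping (the card does not name the training framework, and `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="snoopy-1.0-checkpoints",  # placeholder
    learning_rate=1e-5,
    bf16=True,                            # full bf16 training
    num_train_epochs=1,
    optim="adamw_torch",
    per_device_train_batch_size=3,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,          # trade compute for memory
)
```

With a per-device batch size of 3 and 16 accumulation steps, the effective batch size is 48 examples per optimizer step.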

LoRA config:

- rank = 32
- lora_alpha = 32
- rank-stabilized LoRA enabled
- target_modules = [q_proj, out_proj, v_proj, k_proj, down_proj, up_proj, o_proj, gate_proj]
- lora_dropout = 0.05
- bias = "none"


#### Training Hyperparameters

- **Training regime:** bf16 precision (non-mixed)


## Citation


**APA:**

- Liu, H., Li, C., Li, Y., Li, B., Zhang, Y., Shen, S., & Lee, Y. J. (2024, January). *LLaVA-NeXT: Improved reasoning, OCR, and world knowledge*. [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/)
- Liu, H., Li, C., Li, Y., & Lee, Y. J. (2023). *Improved baselines with visual instruction tuning*. arXiv:2310.03744.
- Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). *Visual instruction tuning*. NeurIPS.