---
datasets:
- chart-misinformation-detection/MISCHA-QA-v1
language:
- en
tags:
- LLaVA
- misinformation
---

# Model Card for Snoopy 1.0

This model detects visual manipulation in bar charts.

## Model Details

### Model Description

- **Developed by:** Arif Syraj
- **Model type:** Multimodal LLM
- **Finetuned from model:** llava-1.6-mistral-7b

## How to Get Started with the Model

This is not a Hugging Face Transformers model; please refer to this
[Colab notebook](https://colab.research.google.com/drive/1UpnztYv46faXj-kmFpL_GAbOCjP2u6zM?usp=sharing)
to run inference. Inference requires a GPU.
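The notebook above is the authoritative way to run the model. As a rough illustration only (the question wording and the helper name here are assumptions, not taken from the notebook), LLaVA-1.6 with a Mistral backbone expects queries wrapped in Mistral's instruction template, with an `<image>` placeholder marking where the chart's vision features are spliced in:

```python
def build_llava_mistral_prompt(question: str) -> str:
    # LLaVA-1.6 (Mistral backbone) wraps the user turn in Mistral's
    # [INST] ... [/INST] tags; <image> marks where image features go.
    return f"[INST] <image>\n{question} [/INST]"

prompt = build_llava_mistral_prompt(
    "Is this bar chart misleading? Answer yes or no, then explain."
)
print(prompt)
```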

## Training Details

The model was fine-tuned with LoRA for 1 epoch on ~2,700 images of misleading and non-misleading bar charts.
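Given the batch settings listed under the training procedure (per-device batch 3, gradient accumulation 16), and assuming a single GPU (the card does not state the GPU count), one epoch over ~2,700 images works out to an effective batch of 48 and roughly 57 optimizer steps:

```python
import math

num_images = 2700                 # approximate training-set size from the card
per_device_train_batch_size = 3
gradient_accumulation_steps = 16

# Effective batch per optimizer step, assuming a single GPU.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_images / effective_batch)

print(effective_batch, steps_per_epoch)  # 48 57
```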

### Training Procedure

Trainer hyperparameters:

- learning_rate = 1e-5
- bf16 = True
- num_train_epochs = 1
- optim = "adamw_torch"
- per_device_train_batch_size = 3
- gradient_accumulation_steps = 16
- gradient_checkpointing = True

LoRA configuration (rank-stabilized LoRA enabled):

- rank = 32
- lora_alpha = 32
- target_modules = [q_proj, out_proj, v_proj, k_proj, down_proj, up_proj, o_proj, gate_proj]
- lora_dropout = 0.05
- bias = "none"
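The values above map naturally onto `peft`'s `LoraConfig` and `transformers`' `TrainingArguments`. The sketch below is an assumption about how the run was wired up, not the actual training script; only the numbers come from this card, and `output_dir` is a placeholder:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                       # LoRA rank
    lora_alpha=32,
    use_rslora=True,            # rank-stabilized LoRA scaling
    target_modules=[
        "q_proj", "out_proj", "v_proj", "k_proj",
        "down_proj", "up_proj", "o_proj", "gate_proj",
    ],
    lora_dropout=0.05,
    bias="none",
)

training_args = TrainingArguments(
    output_dir="snoopy-1.0",    # placeholder path
    learning_rate=1e-5,
    bf16=True,
    num_train_epochs=1,
    optim="adamw_torch",
    per_device_train_batch_size=3,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
)
```

Both objects would then be passed to a LLaVA fine-tuning trainer along with the chart dataset; that wiring is outside the scope of this card.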

#### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision

## Citation

**APA:**

- Liu, H., Li, C., Li, Y., Li, B., Zhang, Y., Shen, S., & Lee, Y. J. (2024, January). **LLaVA-NeXT: Improved reasoning, OCR, and world knowledge**. Retrieved from [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/)

- Liu, H., Li, C., Li, Y., & Lee, Y. J. (2023). **Improved Baselines with Visual Instruction Tuning**. *arXiv:2310.03744*.

- Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). **Visual Instruction Tuning**. *NeurIPS*.