---
datasets:
- chart-misinformation-detection/MISCHA-QA-v1
language:
- en
tags:
- LLaVA
- misinformation
---

# Model Card for Snoopy 1.0

Snoopy 1.0 is a multi-modal model fine-tuned to detect visual manipulation in bar charts.

## Model Details

### Model Description

- **Developed by:** Arif Syraj
- **Model type:** Multi-modal LLM
- **Finetuned from model:** llava-1.6-mistral-7b

## How to Get Started with the Model

This checkpoint cannot be loaded directly through the Hugging Face `transformers` library. Please use this [Colab notebook](https://colab.research.google.com/drive/1UpnztYv46faXj-kmFpL_GAbOCjP2u6zM?usp=sharing) to run inference; a GPU is required. A hedged sketch of the inference flow with the upstream LLaVA codebase is also included in the appendix at the end of this card.

## Training Details

Fine-tuned with LoRA for 1 epoch on roughly 2,700 images of misleading and non-misleading bar charts.

### Training Procedure

Trainer settings:

- learning_rate = 1e-5
- bf16 = True
- num_train_epochs = 1
- optim = "adamw_torch"
- per_device_train_batch_size = 3
- gradient_accumulation_steps = 16
- gradient_checkpointing = True

LoRA config (rank-stabilized LoRA):

- rank = 32
- lora_alpha = 32
- target_modules = [q_proj, out_proj, v_proj, k_proj, down_proj, up_proj, o_proj, gate_proj]
- lora_dropout = 0.05
- bias = "none"

An equivalent configuration expressed with PEFT / Transformers is sketched in the appendix at the end of this card.

#### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision

## Citation

**APA:**

- Liu, Haotian, Li, Chunyuan, Li, Yuheng, Li, Bo, Zhang, Yuanhan, Shen, Sheng, & Lee, Yong Jae. (2024, January). *LLaVA-NeXT: Improved reasoning, OCR, and world knowledge*. Retrieved from [https://llava-vl.github.io/blog/2024-01-30-llava-next/](https://llava-vl.github.io/blog/2024-01-30-llava-next/).
- Liu, Haotian, Li, Chunyuan, Li, Yuheng, & Lee, Yong Jae. (2023). *Improved Baselines with Visual Instruction Tuning*. arXiv:2310.03744.
- Liu, Haotian, Li, Chunyuan, Wu, Qingyang, & Lee, Yong Jae. (2023). *Visual Instruction Tuning*. NeurIPS.
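
## Appendix: Code Sketches

The Colab notebook above is the authoritative way to run this model. Purely for orientation, the following is a minimal sketch of what inference could look like with the upstream LLaVA codebase (https://github.com/haotian-liu/LLaVA), assuming the released checkpoint is a LoRA adapter on top of llava-1.6-mistral-7b. The checkpoint path, image file, and prompt below are placeholders, not values from this card.

```python
# Sketch only: assumes the upstream LLaVA codebase and that the checkpoint is a
# LoRA adapter on top of llava-1.6-mistral-7b. Paths and the example prompt are
# placeholders; they are not taken from this model card.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "path/to/snoopy-1.0-lora"           # placeholder: the fine-tuned LoRA checkpoint
model_base = "liuhaotian/llava-v1.6-mistral-7b"  # base model named in this card

args = type("Args", (), {
    "model_path": model_path,
    "model_base": model_base,
    "model_name": get_model_name_from_path(model_path),
    "query": "Is this bar chart visually misleading? Explain your answer.",
    "conv_mode": None,
    "image_file": "bar_chart.png",               # placeholder image
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 256,
})()

eval_model(args)  # prints the model's answer; requires a CUDA GPU
```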
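The LoRA and optimizer settings listed under Training Procedure can be restated with Hugging Face PEFT and Transformers roughly as follows. This is a paraphrase of the hyperparameters above, not the actual training script; `output_dir` is a placeholder.

```python
# Restates the "Training Procedure" hyperparameters in PEFT / Transformers terms.
# Not the actual training script; output_dir is a placeholder.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    use_rslora=True,  # rank-stabilized LoRA, as noted in the card
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "out_proj", "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
)

training_args = TrainingArguments(
    output_dir="snoopy-1.0-lora",  # placeholder
    learning_rate=1e-5,
    num_train_epochs=1,
    per_device_train_batch_size=3,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    bf16=True,
    optim="adamw_torch",
)
```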