ScienceQA

Prepare Data

  1. Please see ScienceQA repo for setting up the dataset.
  2. Convert the ScienceQA dataset to the LLaVA conversation-style format, choosing one of the listed values for --split:
python scripts/convert_sqa_to_llava.py \
    convert_to_llava \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --prompt-format "QCM-LEA" \
    --split {train,val,minival,test,minitest}
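
If you need several splits, the conversion can be scripted. The following is a minimal sketch, assuming the script converts a single split per invocation; the helper name build_convert_command is hypothetical and not part of the repository. It only builds and prints the commands, so nothing is executed until you pass them to subprocess.run yourself.

```python
# Splits listed in the command above.
SPLITS = ["train", "val", "minival", "test", "minitest"]


def build_convert_command(base_dir, split, prompt_format="QCM-LEA"):
    """Return the argv list for converting one ScienceQA split.

    Assumption: the script handles exactly one split per invocation.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return [
        "python", "scripts/convert_sqa_to_llava.py",
        "convert_to_llava",
        "--base-dir", base_dir,
        "--prompt-format", prompt_format,
        "--split", split,
    ]


for split in SPLITS:
    cmd = build_convert_command("/path/to/ScienceQA/data/scienceqa", split)
    # To actually run a command: subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```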

Training

  1. Pretraining

You can download our pretrained projector weights from our Model Zoo, or train your own projector weights using pretrain.sh.

  2. Finetuning

See finetune_sqa.sh.

Evaluation

  1. Multiple-GPU inference

You can run the evaluation on multiple GPUs and concatenate the generated jsonl files. Please refer to our script for batch evaluation and results gathering.
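
The concatenation step can be sketched in a few lines of Python. This is an illustrative helper and not the batch script referred to above: the function name merge_jsonl is hypothetical, and it assumes each GPU wrote its answers to a separate jsonl file with one JSON record per line.

```python
import json


def merge_jsonl(chunk_paths, merged_path):
    """Concatenate per-GPU jsonl files into one file.

    Blank lines are skipped, and every record is parsed once so that a
    malformed line fails early. Returns the number of records written.
    """
    count = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in chunk_paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    json.loads(line)  # raises ValueError on a malformed record
                    out.write(line + "\n")
                    count += 1
    return count
```

Pass the chunk files in a fixed order (for example, sorted by name) so that the merged file is reproducible between runs.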

  2. Single-GPU inference

(a) Generate LLaVA responses on ScienceQA dataset

python -m llava.eval.model_vqa_science \
    --model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
    --question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
    --image-folder /path/to/ScienceQA/data/scienceqa/images/test \
    --answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --conv-mode llava_v1

(b) Evaluate the generated responses

python eval_science_qa.py \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --result-file vqa/results/ScienceQA/test_llava-13b.jsonl \
    --output-file vqa/results/ScienceQA/test_llava-13b_output.json \
    --output-result vqa/results/ScienceQA/test_llava-13b_result.json

For reference, we attach our prediction files test_sqa_llava_lcs_558k_sqa_12e_vicuna_v1_3_13b.json and test_sqa_llava_13b_v0.json, so that you can compare against them when reproducing our results and use them for more detailed analysis.
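
One way to run such a comparison is sketched below. It makes no claim about the layout of the attached files: it assumes you have already loaded each set of predictions into a dict that maps a question id to the predicted answer, and the function name compare_predictions and the toy data are made up for illustration.

```python
def compare_predictions(ours, reference):
    """Compare two {question_id: predicted_answer} mappings.

    Returns (agreement_rate, ids whose predictions differ), computed over
    the question ids that appear in both mappings.
    """
    shared = sorted(set(ours) & set(reference))
    if not shared:
        return 0.0, []
    differing = [qid for qid in shared if ours[qid] != reference[qid]]
    return 1 - len(differing) / len(shared), differing


# Made-up toy predictions, purely for illustration.
ours = {"1": "A", "2": "C", "3": "B"}
reference = {"1": "A", "2": "B", "3": "B"}
rate, differing = compare_predictions(ours, reference)
print(f"agreement: {rate:.2%}, differing ids: {differing}")
```

The list of differing ids is a convenient starting point for inspecting individual questions where a reproduction diverges from the reference predictions.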