KoLLaVA : Korean Large Language and Vision Assistant (feat. LLaVA)

KoLLaVA is a large multimodal model (LMM) that combines an LLM (KoVicuna) with the CLIP visual encoder (ViT-14) and is trained on a Korean visual-instruction dataset.

Detailed code is available in the KoLLaVA GitHub repository.

Training hyperparameters

  • learning_rate: 2e-5
  • train_batch_size: 16
  • distributed_type: multi-GPU (A100 80GB)
  • num_devices: 4
  • gradient_accumulation_steps: 1
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • lr_scheduler_type: cosine
  • num_epochs: 1
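
The batch-size figures above fit together arithmetically: the listed values are consistent with train_batch_size being a per-device size, so that 16 × 4 devices × 1 accumulation step gives the total of 64. The short Python sketch below reproduces that arithmetic and illustrates what a cosine learning-rate schedule looks like. It is only an illustration, not the training code: the function name `cosine_lr`, the `total_steps` argument, the absence of warmup, and the decay to zero are all assumptions, since the card does not state them.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 2e-5
train_batch_size = 16            # assumed to be the per-device batch size
num_devices = 4
gradient_accumulation_steps = 1

# Effective number of samples consumed per optimizer step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)


def cosine_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    """Hypothetical cosine decay from base_lr to 0 (no warmup assumed)."""
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    # total_steps=100 is an arbitrary illustrative value.
    for step in (0, 50, 100):
        print(step, cosine_lr(step, total_steps=100))
```

Under these assumptions the learning rate starts at 2e-5, falls to roughly half of that at the midpoint of training, and reaches zero at the final step.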

Model License: Apache License 2.0

Datasets used to train tabtoyou/KoLLaVA-KoVicuna-7b