watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm


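Model Overview

This is an image-capable Ogiri (Japanese comedic improv, where participants give witty answers to a prompt) language model, built by training Watashiha-Llama-2-13B-Ogiri-sft with LLaVA.
The vision encoder is laion/CLIP-ViT-B-32-laion2B-s34B-b79K.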
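To give a sense of how many image tokens the language model sees per image, the arithmetic below is a rough sketch. It assumes a 224x224 input and 32x32 patches (the "B-32" in the checkpoint name), and that, as in LLaVA's default patch-only feature selection, the CLS token is dropped before projection. The helper `num_patch_tokens` is hypothetical and not part of LLaVA.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Hypothetical helper: patch tokens for a square image cut into non-overlapping patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size  # patches along one side
    return per_side * per_side           # total patches (CLS token excluded)


# Assumption: 224x224 input resolution and 32x32 patches.
tokens = num_patch_tokens(224, 32)
print(tokens)  # 49
```

Under these assumptions each image contributes 7 x 7 = 49 visual tokens to the prompt.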

License: LLAMA 2 COMMUNITY LICENSE
Library: LLaVA

Training Data

STAIR Captions was used for pretraining.
When training on STAIR Captions, images from MS COCO 2014 that carry the following licenses were excluded.
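The license-based exclusion described above can be sketched as follows. This assumes annotations in the COCO JSON layout, where each entry in `images` carries an integer `license` that refers to an `id` in the top-level `licenses` list. The helper names, the toy data, and the "NonCommercial" keyword are hypothetical illustrations; they are not the licenses actually excluded for this model.

```python
from typing import Any


def license_ids_matching(coco: dict[str, Any], keywords: list[str]) -> set[int]:
    """IDs of entries in coco["licenses"] whose name contains any of the keywords."""
    return {
        lic["id"]
        for lic in coco["licenses"]
        if any(k in lic["name"] for k in keywords)
    }


def filter_images_by_license(coco: dict[str, Any], excluded_ids: set[int]) -> list[dict[str, Any]]:
    """Keep only image entries whose "license" field is not an excluded ID."""
    return [img for img in coco["images"] if img["license"] not in excluded_ids]


# Toy annotation data in the COCO layout (not real COCO entries).
coco = {
    "licenses": [
        {"id": 1, "name": "Attribution-NonCommercial License", "url": ""},
        {"id": 2, "name": "Attribution License", "url": ""},
    ],
    "images": [
        {"id": 10, "file_name": "a.jpg", "license": 1},
        {"id": 11, "file_name": "b.jpg", "license": 2},
    ],
}

# Hypothetical exclusion rule, for illustration only.
excluded = license_ids_matching(coco, ["NonCommercial"])
kept = filter_images_by_license(coco, excluded)
print([img["file_name"] for img in kept])  # ['b.jpg']
```

Captions whose `image_id` is not among the kept images would then be dropped as well.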

The following data was used for fine-tuning:

Japanese Visual Genome VQA dataset
Bokekan dataset (ボケ缶データセット)
Ogiri data (text only)

Usage

Please refer to the following Google Colab sample code.
Sample code

Developer

内田達弥 (UCHIDA, Tatsuya)


Model size: 13.3B params (Safetensors, tensor type BF16)
