watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm

---
license: llama2
language:
- ja
---

## Model Overview
This is an ogiri (大喜利, improvised witty-answer comedy) language model with image support, built by training [Watashiha-Llama-2-13B-Ogiri-sft](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft) with [LLaVA](https://github.com/haotian-liu/LLaVA).
The vision encoder is [laion/CLIP-ViT-B-32-laion2B-s34B-b79K](https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K).

* License: [LLAMA 2 COMMUNITY LICENSE](https://github.com/facebookresearch/llama/blob/main/LICENSE)
* Library: [LLaVA](https://github.com/haotian-liu/LLaVA)

## Training Data
Pre-training uses [STAIR Captions](https://github.com/STAIR-Lab-CIT/STAIR-captions).
When training on [STAIR Captions](https://github.com/STAIR-Lab-CIT/STAIR-captions), images in [MS COCO 2014](https://cocodataset.org/#home) that carry any of the following licenses were excluded:

- [Attribution-NonCommercial-ShareAlike License](http://creativecommons.org/licenses/by-nc-sa/2.0/)
- [Attribution-NonCommercial License](http://creativecommons.org/licenses/by-nc/2.0/)
- [Attribution-NonCommercial-NoDerivs License](http://creativecommons.org/licenses/by-nc-nd/2.0/)
- [No known copyright restrictions](http://flickr.com/commons/usage/)
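
The license-based exclusion above can be sketched roughly as follows. This is not the author's actual preprocessing script, only a minimal illustration that assumes the standard COCO annotation layout (a top-level `licenses` list with `id` and `name`, and an `images` list whose records have a `license` id). The function names and the annotation path argument are hypothetical.

```python
import json

# License names listed in this model card as excluded.
# Matching is done by name so the sketch does not depend on numeric license ids.
EXCLUDED_LICENSE_NAMES = {
    "Attribution-NonCommercial-ShareAlike License",
    "Attribution-NonCommercial License",
    "Attribution-NonCommercial-NoDerivs License",
    "No known copyright restrictions",
}


def filter_images_by_license(coco: dict, excluded_names: set) -> list:
    """Return the image records whose license name is not in excluded_names."""
    excluded_ids = {
        lic["id"]
        for lic in coco.get("licenses", [])
        if lic["name"] in excluded_names
    }
    return [
        img for img in coco.get("images", [])
        if img.get("license") not in excluded_ids
    ]


def load_allowed_image_ids(annotation_path: str) -> set:
    """Hypothetical helper: read a COCO-style JSON file and return usable image ids."""
    with open(annotation_path, encoding="utf-8") as f:
        coco = json.load(f)
    return {img["id"] for img in filter_images_by_license(coco, EXCLUDED_LICENSE_NAMES)}
```

The resulting set of image ids could then be used to drop the corresponding caption entries before training. How the original filtering was actually implemented is not described in this model card.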

Fine-tuning uses the following data:
- [Japanese Visual Genome VQA dataset](https://github.com/yahoojapan/ja-vg-vqa)
- [ボケ缶 (Boke-kan) dataset](https://github.com/aws-samples/bokete-denshosen)
- Ogiri data (text only)

## Usage
See the Google Colab sample code below.
[Sample code](https://colab.research.google.com/drive/1aAReEzLHTLnt1DmirQgGw7oGEF6XxwqN?usp=sharing)


## Developer
- 内田達弥 (UCHIDA, Tatsuya)