speed committed on
Commit 46140e3 · verified · 1 Parent(s): 222d1e6

Update README.md

Files changed (1)
  1. README.md +32 -9
README.md CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: zero-shot-image-classification
 license:
 - apache-2.0
 datasets:
-- laion/relaion2B-en-research-safe
+- llm-jp/relaion2B-en-research-safe-japanese-translation
 language:
 - ja
 ---
@@ -16,7 +16,7 @@ language:

 # Model Details

-A CLIP ViT-L/14 model trained using [OpenCLIP](https://github.com/mlfoundations/open_clip) with the Japanese translation of the English subset of ReLAION-5B (https://huggingface.co/datasets/laion/relaion2B-en-research-safe), translated by [gemma-2-9b-it](https://huggingface.co/datasets/laion/relaion2B-en-research-safe).
+A Japanese CLIP ViT-L/14 model trained with [OpenCLIP](https://github.com/mlfoundations/open_clip) on [relaion2B-en-research-safe-japanese-translation](https://huggingface.co/datasets/llm-jp/relaion2B-en-research-safe-japanese-translation), a Japanese translation of the English subset of ReLAION-5B (https://huggingface.co/datasets/laion/relaion2B-en-research-safe) produced with [gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it).

 The total number of parameters of this model is 467M.

@@ -32,8 +32,8 @@ $ pip install open_clip_torch
 ```python
 import open_clip

-model, preprocess = open_clip.create_model_from_pretrained('hf-hub:speed/llm-jp-roberta-ViT-L-14-relaion-1.5B-lr5e-4-bs8k-accum4-20241218-epoch90')
-tokenizer = open_clip.get_tokenizer('hf-hub:speed/llm-jp-roberta-ViT-L-14-relaion-1.5B-lr5e-4-bs8k-accum4-20241218-epoch90')
+model, preprocess = open_clip.create_model_from_pretrained('hf-hub:llm-jp/llm-jp-clip-vit-large-patch14')
+tokenizer = open_clip.get_tokenizer('hf-hub:llm-jp/llm-jp-clip-vit-large-patch14')

 import torch
 from PIL import Image
@@ -70,24 +70,47 @@ Reference:

 ## Training Data

-We used a Japanese-translated version of the relaion2B-en-research-safe dataset.
-The translation was performed using gemma-2-9b-it.
+This model was trained on [relaion2B-en-research-safe-japanese-translation](https://huggingface.co/datasets/llm-jp/relaion2B-en-research-safe-japanese-translation).
 Due to a 70% success rate in image downloads, the dataset size was 1.45 billion samples, and we processed it over 9 epochs (13 billion samples in total).

 # Evaluation

 Evaluation Code: https://github.com/llm-jp/clip-eval

-TODO:
+**Table:** Performance of each model in zero-shot image classification and image-text retrieval tasks. **Bold** indicates first place, and _underline_ indicates second place.
+
+
+| Model | Params (M) | ImageNet | Recruit | CIFAR10 | CIFAR100 | Food101 | Caltech101 | XM3600 I → T | XM3600 T → I | Avg. |
+|-----------------------------|-------------|----------|---------|---------|----------|---------|------------|-------------|-------------|------|
+| **Japanese CLIP** | | | | | | | | | | |
+| [Rinna ViT-B/16](https://huggingface.co/rinna/japanese-clip-vit-b-16) | 196 | 50.6 | 39.9 | 90.7 | 64.0 | 53.2 | 84.6 | 53.8 | 54.0 | 61.4 |
+| [Rinna ViT-B/16 cloob](https://huggingface.co/rinna/japanese-cloob-vit-b-16) | 196 | 54.6 | 41.6 | 88.2 | 60.3 | 57.2 | 80.2 | 53.4 | 53.4 | 61.1 |
+| [LY ViT-B/16](https://huggingface.co/line-corporation/clip-japanese-base) | 196 | 52.0 | **83.8** | 96.3 | 76.7 | 73.9 | **88.4** | **76.9** | **78.0** | **78.3** |
+| [**llm-jp-ViT-B/16**](https://huggingface.co/llm-jp/llm-jp-clip-vit-base-patch16) | 248 | 54.2 | 59.4 | 91.8 | 69.2 | _82.2_ | 85.6 | 73.6 | 72.7 | 73.6 |
+| [StabilityAI ViT-L/16](https://huggingface.co/stabilityai/japanese-stable-clip-vit-l-16) | 414 | **62.4** | 70.5 | _97.6_ | **84.1** | 74.0 | 86.7 | 67.3 | 66.0 | 76.1 |
+| [**llm-jp-ViT-L/14**](https://huggingface.co/llm-jp/llm-jp-clip-vit-large-patch14) | 467 | _59.5_ | 62.9 | 96.4 | 77.0 | **88.2** | _87.8_ | 74.1 | _74.1_ | _77.5_ |
+| **Multilingual CLIP** | | | | | | | | | | |
+| [SigLIP B/16-256 multi](https://huggingface.co/google/siglip-base-patch16-256-multilingual) | 370 | 51.9 | 71.2 | 92.4 | 65.8 | 78.6 | 85.6 | 45.9 | 43.0 | 66.8 |
+| [jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2) | 865 | 35.8 | 48.1 | 95.1 | 58.3 | 52.0 | 69.4 | 67.3 | 66.4 | 61.6 |
+| [LAION ViT-H/14 multi](https://huggingface.co/laion/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k) | 1193 | 53.0 | _74.5_ | **97.9** | _78.4_ | 74.3 | 85.1 | _75.0_ | 72.0 | 76.3 |
+

 # LICENSE
 [The Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

-Please also see Gemma Terms of Use (https://ai.google.dev/gemma/terms) as the training data is translated by [gemma-2-9b-it](https://huggingface.co/datasets/laion/relaion2B-en-research-safe).
+
+Please refer to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms), as the training data was translated using gemma-2-9b-it. We use Gemma solely for translation. Under the definition of "Model Derivatives" in Section 1.1(e), our model was not trained "in order to cause that model to perform similarly to Gemma," so it does not qualify as a Model Derivative, and we have concluded that it does not need to inherit the Gemma license.

 # Citation

 Bibtex:
 ```
-TODO:
+@inproceedings{sugiura2025clip,
+  author = {杉浦 一瑳 and 栗田 修平 and 小田 悠介 and 河原 大輔 and 岡崎 直観},
+  month = mar,
+  series = {言語処理学会第31回年次大会 (NLP2025)},
+  title = {オープンLLMによる翻訳を活用した日本語 CLIP の開発},
+  year = {2025}
+}
+
 ```
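
The usage snippet in the `@@ -32,8 +32,8 @@` hunk is truncated right after the `torch` and `PIL` imports. For reference, here is a minimal zero-shot classification sketch built around the renamed checkpoint, following the standard OpenCLIP pattern; the image path and the candidate Japanese labels are placeholders, not taken from the model card.

```python
import open_clip
import torch
from PIL import Image

# Load model, preprocessing transform, and tokenizer from the Hub
# (repo id taken from the diff above).
model, preprocess = open_clip.create_model_from_pretrained('hf-hub:llm-jp/llm-jp-clip-vit-large-patch14')
tokenizer = open_clip.get_tokenizer('hf-hub:llm-jp/llm-jp-clip-vit-large-patch14')
model.eval()

# Placeholder inputs: any RGB image and a few candidate Japanese captions.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["猫", "犬", "自動車"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize both embeddings, then softmax over scaled cosine similarities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # probabilities over the three candidate labels
```

The fixed 100.0 factor mirrors the scale used in OpenCLIP's own zero-shot examples; the model's learned temperature, `model.logit_scale.exp()`, can be used instead.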