# FLIP (Facial Language Image Pretrain)

This repository is the official implementation of [FaceCaption-15M]().

**Overview of FLIP architecture.**

![image-20240318101027127](https://img.yutangli.net/img/202403181010116.png)

**(a) The same color indicates shared parameters, and "12x" denotes 12-layer transformer modules. (b), (c), and (d): the FLIP-based model applied to text-image retrieval, facial attribute prediction, and sketch-less facial image retrieval, respectively.**

## Training

Coming soon... (The training code is only meaningful once the dataset has been published.)

```shell
python pretrain.py > log.log
```
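
Until the official training code is released, the sketch below illustrates the kind of CLIP-style image-text contrastive objective that a dual-encoder model such as FLIP is typically pretrained with. This is a minimal illustration only; the encoder modules, batch layout, and temperature value are assumptions, not the repository's actual implementation.

```python
# Hypothetical sketch of a CLIP-style image-text contrastive step (not the official FLIP code).
import torch
import torch.nn.functional as F

def contrastive_step(image_encoder, text_encoder, images, texts, temperature=0.07):
    # Encode both modalities and L2-normalize the embeddings.
    img_emb = F.normalize(image_encoder(images), dim=-1)   # (B, D)
    txt_emb = F.normalize(text_encoder(texts), dim=-1)     # (B, D)

    # Pairwise cosine similarities scaled by a temperature.
    logits = img_emb @ txt_emb.t() / temperature            # (B, B)

    # Matched image-text pairs lie on the diagonal of the similarity matrix.
    targets = torch.arange(img_emb.size(0), device=logits.device)

    # Symmetric InfoNCE loss over image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```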

## Evaluation

Coming soon...

## Pre-trained Models

Coming soon...

## Datasets

> **Coming soon...**

**Overview of our proposed FaceCaption-15M, containing over 15 million facial image-text pairs (right and left).**

![image-20240318100601414](https://img.yutangli.net/img/202403181006981.png)

**Comparisons with other popular facial image datasets.**

![image-20240318100734131](https://img.yutangli.net/img/202403181007778.png)

**Image quality score distribution.**

![image-20240318100849106](https://img.yutangli.net/img/202403181008178.png)

**Text distribution.**

![image-20240318100913176](https://img.yutangli.net/img/202403181009312.png)

## Results

### Task 1: Text-Image Retrieval

**Comparison with other classical pretrained models. All pretrained model backbones are frozen, with only the linear layer being fine-tuned. † represents the model pretrained on the LAION-Face [86] dataset; * represents the model pretrained on the FaceCaption dataset constructed without using LLM text generation.**

![](https://img.yutangli.net/img/202403181015142.png)
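
As a reference for the evaluation protocol described above (frozen backbones, fine-tuning only a linear layer), the hedged sketch below shows one way to set up such a probe and compute recall@K for retrieval. The encoder objects, feature dimension, and projection size are placeholders, not the official evaluation code.

```python
# Hypothetical frozen-backbone retrieval probe: only the linear projections are trainable.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_frozen_retrieval_heads(image_encoder, text_encoder, feat_dim, proj_dim=256):
    # Freeze both backbones; gradients flow only through the two linear projections.
    for enc in (image_encoder, text_encoder):
        for p in enc.parameters():
            p.requires_grad = False
    return nn.Linear(feat_dim, proj_dim), nn.Linear(feat_dim, proj_dim)

@torch.no_grad()
def recall_at_k(img_emb, txt_emb, k=1):
    # Rank all texts for each image by cosine similarity and check the matched index.
    sims = F.normalize(img_emb, dim=-1) @ F.normalize(txt_emb, dim=-1).t()  # (N, N)
    topk = sims.topk(k, dim=-1).indices                                      # (N, k)
    targets = torch.arange(img_emb.size(0), device=sims.device).unsqueeze(1)
    return (topk == targets).any(dim=-1).float().mean().item()
```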

### Task 2: Facial Attributes Prediction

**Comparison with other classical models. † represents the model pre-trained on the original LAION-Face dataset.**

![image-20240318101126897](https://img.yutangli.net/img/202403181011115.png)
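
Facial attribute prediction on top of a frozen image encoder is commonly framed as multi-label classification. The sketch below is a minimal, hypothetical head of that kind; the feature dimension and attribute count (40, CelebA-style) are assumptions rather than values taken from this repository.

```python
# Hypothetical multi-label attribute head on top of a frozen FLIP image embedding.
import torch
import torch.nn as nn

class AttributeHead(nn.Module):
    def __init__(self, feat_dim, num_attributes=40):  # 40 is a placeholder (CelebA-style)
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_attributes)

    def forward(self, feats):
        # One logit per attribute; each attribute is an independent binary decision.
        return self.fc(feats)

def attribute_loss(logits, labels):
    # labels: float tensor of shape (B, num_attributes) with entries in {0, 1}.
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```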

### Task 3: Sketch-Less Facial Image Retrieval

**Comparative results with different baseline methods. † represents the model pre-trained on the LAION-Face dataset.**

![image-20240318101633671](https://img.yutangli.net/img/202403181016876.png)

**Performance of early retrieval in the SLFIR problem. Instead of using the complete sketch, performance is evaluated at different sketch-completion percentages. A higher value indicates better early-retrieval performance.**

![image-20240318101704679](https://img.yutangli.net/img/202403181017013.png)
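
As an illustration of how an early-retrieval curve like the one above can be measured, the hedged sketch below ranks the target photo as the query sketch grows from a low completion percentage to the full drawing. The sketch encoder, gallery embeddings, and completion steps are placeholders, not the SLFIR implementation used here.

```python
# Hypothetical early-retrieval evaluation: rank the target photo at each sketch-completion step.
import torch
import torch.nn.functional as F

@torch.no_grad()
def early_retrieval_ranks(sketch_encoder, partial_sketches, gallery_emb, target_idx):
    """partial_sketches: list of tensors, one per completion percentage (e.g. 20%, ..., 100%)."""
    ranks = []
    gallery = F.normalize(gallery_emb, dim=-1)                              # (N, D)
    for sketch in partial_sketches:
        query = F.normalize(sketch_encoder(sketch.unsqueeze(0)), dim=-1)    # (1, D)
        sims = (query @ gallery.t()).squeeze(0)                             # (N,)
        # Rank of the ground-truth photo among all gallery images (1 = best).
        ranks.append((sims > sims[target_idx]).sum().item() + 1)
    return ranks
```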

## Citations & Contacts

> Coming soon...