Update README.md
README.md
CHANGED
@@ -14,8 +14,7 @@ Model card for <a href="https://recognize-anything.github.io/">Recognize Anythin
 **Recognition and localization are two foundation computer vision tasks.**
 - **The Segment Anything Model (SAM)** excels in **localization capabilities**, while it falls short when it comes to **recognition tasks**.
 - **The Recognize Anything Model (RAM) and Tag2Text** exhibits **exceptional recognition abilities**, in terms of **both accuracy and scope**.
-
-
+-
 | ![RAM.jpg](https://github.com/xinyu1205/Tag2Text/raw/main/images/localization_and_recognition.jpg) |
 |:--:|
 | <b> Pull figure from recognize-anything official repo | Image source: https://recognize-anything.github.io/ </b>|
@@ -38,14 +37,10 @@ Authors from the [paper](https://arxiv.org/abs/2306.03514) write in the abstract
 }
 
 @article{huang2023tag2text,
+
 title={Tag2Text: Guiding Vision-Language Model via Image Tagging},
 author={Huang, Xinyu and Zhang, Youcai and Ma, Jinyu and Tian, Weiwei and Feng, Rui and Zhang, Yuejie and Li, Yaqian and Guo, Yandong and Zhang, Lei},
 journal={arXiv preprint arXiv:2303.05657},
 year={2023}
 }
 ```
-
-
-
-
-