xinyu1205/recognize_anything_model

image tagging, image captioning

xinyu1205 committed on Jun 14, 2023

Commit a4f945f · 1 Parent(s): c457cc4

Update README.md

Files changed (1)

README.md +2 -7

README.md CHANGED

@@ -14,8 +14,7 @@ Model card for <a href="https://recognize-anything.github.io/">Recognize Anythin
 **Recognition and localization are two foundation computer vision tasks.**
 - **The Segment Anything Model (SAM)** excels in **localization capabilities**, while it falls short when it comes to **recognition tasks**.
 - **The Recognize Anything Model (RAM) and Tag2Text** exhibits **exceptional recognition abilities**, in terms of **both accuracy and scope**.
+-
 | ![RAM.jpg](https://github.com/xinyu1205/Tag2Text/raw/main/images/localization_and_recognition.jpg) |
 |:--:|
 | <b> Pull figure from recognize-anything official repo | Image source: https://recognize-anything.github.io/ </b>|
@@ -38,14 +37,10 @@ Authors from the [paper](https://arxiv.org/abs/2306.03514) write in the abstract
 }
 @article{huang2023tag2text,
   title={Tag2Text: Guiding Vision-Language Model via Image Tagging},
   author={Huang, Xinyu and Zhang, Youcai and Ma, Jinyu and Tian, Weiwei and Feng, Rui and Zhang, Yuejie and Li, Yaqian and Guo, Yandong and Zhang, Lei},
   journal={arXiv preprint arXiv:2303.05657},
   year={2023}
 }
 ```