---
library_name: transformers
---

## TextNet-T/S/B: Efficient Text Detection Models

### **Overview**

TextNet is a lightweight, efficient backbone architecture designed specifically for text detection. It outperforms traditional lightweight backbones such as MobileNetV3. It comes in three variants, **TextNet-T**, **TextNet-S**, and **TextNet-B**, with 6.8M, 8.0M, and 8.9M parameters respectively, and strikes a strong balance between accuracy and inference speed.

### **Performance**

TextNet achieves state-of-the-art results in text detection. It outperforms hand-crafted backbones such as ResNets and VGG16 in both accuracy and inference speed while using fewer parameters, which makes it well suited to GPU-based applications.
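
The speed claim is easiest to check on your own hardware. The sketch below is a generic latency helper, not part of Transformers; `measure_latency`, its `warmup`/`iters` parameters, and the optional `sync` hook are hypothetical names. On a GPU, pass `torch.cuda.synchronize` as `sync` so that asynchronous CUDA kernels finish before the clock is read.

```python
import statistics
import time
from typing import Any, Callable, Optional


def measure_latency(
    fn: Callable[[], Any],
    warmup: int = 10,
    iters: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock latency of fn() in milliseconds.

    Hypothetical helper (not a Transformers API). `sync` should block until
    queued device work has finished, e.g. torch.cuda.synchronize on a GPU;
    leave it as None on CPU.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")

    # Warm-up runs let caches, memory allocators and kernel selection settle.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    timings_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional slow runs than the mean.
    return float(statistics.median(timings_ms))


if __name__ == "__main__":
    # CPU-only smoke test with a trivial workload.
    print(f"{measure_latency(lambda: sum(range(10_000))):.3f} ms")

    # With the TextNet example (assumes `model`, `inputs` and `torch` in scope):
    # model, inputs = model.to("cuda").eval(), inputs.to("cuda")
    # with torch.no_grad():
    #     ms = measure_latency(lambda: model(**inputs), sync=torch.cuda.synchronize)
```

Reporting the median over many iterations after a warm-up keeps one-off costs, such as the first CUDA call, out of the measurement.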

### **How to use**

#### Transformers

Install Transformers together with the packages the example below imports:

```bash
pip install -U transformers torch pillow requests
```
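
The example below loads an image, preprocesses it, and extracts feature maps with the TextNet-B backbone. Load the image processor and the model from the same checkpoint so that preprocessing matches what the backbone expects.

```python
import requests
import torch
from PIL import Image
from transformers import AutoBackbone, AutoImageProcessor

# Load a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load the processor and the backbone from the same checkpoint
checkpoint = "jadechoghari/textnet-base"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoBackbone.from_pretrained(checkpoint)

# Preprocess the image and run a forward pass without tracking gradients
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The backbone returns one feature map per configured output stage
for feature_map in outputs.feature_maps:
    print(feature_map.shape)
```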
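
To relate each returned feature map to the input resolution (useful when attaching a detection head), you can divide the input height and width by each map's spatial size. The helper below is a hypothetical sketch, not a Transformers API; it assumes feature maps in PyTorch's (batch, channels, height, width) layout.

```python
from typing import List, Sequence, Tuple


def infer_strides(
    input_hw: Sequence[int], feature_shapes: Sequence[Sequence[int]]
) -> List[Tuple[float, float]]:
    """Return the (height, width) downsampling factor of each feature map.

    Hypothetical helper: each entry of `feature_shapes` is assumed to end in
    (height, width), as in PyTorch's (batch, channels, height, width) layout.
    """
    in_h, in_w = input_hw
    strides = []
    for shape in feature_shapes:
        feat_h, feat_w = shape[-2], shape[-1]
        strides.append((in_h / feat_h, in_w / feat_w))
    return strides


# Usage, assuming `inputs` and `outputs` from the example above:
# print(infer_strides(inputs["pixel_values"].shape[-2:],
#                     [f.shape for f in outputs.feature_maps]))
```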

### **Training**

We compare TextNet with representative hand-crafted backbones such as ResNets and VGG16. For a fair comparison, all models are first pre-trained on IC17-MLT [52] and then fine-tuned on Total-Text. The TextNet models achieve a better trade-off between accuracy and inference speed than these hand-crafted backbones, by a significant margin. Notably, TextNet-T, -S, and -B have only 6.8M, 8.0M, and 8.9M parameters respectively, making them more parameter-efficient than ResNets and VGG16. These results show that TextNet models are effective for text detection on GPUs.
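
To compare a loaded checkpoint against the parameter counts quoted above, you can sum the element counts of its parameters. The helpers below are a hypothetical sketch, not a Transformers API; they assume a PyTorch-style module whose `.parameters()` yields tensors with `.numel()` and `.requires_grad`. Transformers models also provide `model.num_parameters()` for the same purpose.

```python
from typing import Any


def count_parameters(module: Any, trainable_only: bool = False) -> int:
    """Count the scalar parameters of a PyTorch-style module.

    Hypothetical helper: `module.parameters()` is assumed to yield tensors
    exposing `.numel()` and `.requires_grad`, as torch.nn.Module does.
    """
    return sum(
        p.numel()
        for p in module.parameters()
        if p.requires_grad or not trainable_only
    )


def format_millions(n: int) -> str:
    """Format a parameter count in millions, e.g. 8900000 -> '8.9M'."""
    return f"{n / 1e6:.1f}M"


# Usage, assuming `model` from the "How to use" example:
# print(format_millions(count_parameters(model)))
```

A backbone-only checkpoint may not include every layer counted in the paper's figures, so small differences are possible.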

### **Applications**

TextNet is well suited to real-world text detection tasks, including:

- Natural scene text detection
- Multi-lingual and multi-oriented text detection
- Document text region analysis

### **Contribution**

This model was contributed by [Raghavan](https://huggingface.co/Raghavan), [jadechoghari](https://huggingface.co/jadechoghari), and [nielsr](https://huggingface.co/nielsr).