czczup committed on
Commit
205b1ea
1 Parent(s): 0685073

Upload textnet models

Files changed (4)
  1. README.md +56 -0
  2. config.json +236 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +28 -0
README.md ADDED
@@ -0,0 +1,56 @@
+---
+library_name: transformers
+---
+## TextNet-T/S/B: Efficient Text Detection Models
+
+### **Overview**
+TextNet is a lightweight and efficient architecture designed specifically for text detection, offering superior performance compared to traditional models like MobileNetV3. With variants **TextNet-T**, **TextNet-S**, and **TextNet-B** (6.8M, 8.0M, and 8.9M parameters respectively), it achieves an excellent balance between accuracy and inference speed.
+
+### **Performance**
+TextNet achieves state-of-the-art results in text detection, outperforming hand-crafted models in both accuracy and speed. Its highly efficient architecture makes it ideal for GPU-based applications.
+
+### **How to use**
+Install the Transformers library:
+```bash
+pip install transformers
+```
+
+```python
+import torch
+import requests
+from PIL import Image
+from transformers import AutoImageProcessor, AutoBackbone
+
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)  # sample COCO image
+
+processor = AutoImageProcessor.from_pretrained("jadechoghari/textnet-base")
+model = AutoBackbone.from_pretrained("jadechoghari/textnet-base")
+
+inputs = processor(image, return_tensors="pt")  # resize, rescale, normalize
+with torch.no_grad():
+    outputs = model(**inputs)  # backbone forward pass
+```
+ ### **Training**
35
+ We first compare TextNet with representative hand-crafted backbones,
36
+ such as ResNets and VGG16. For a fair comparison,
37
+ all models are first pre-trained on IC17-MLT [52] and then
38
+ finetuned on Total-Text. The proposed
39
+ TextNet models achieve a better trade-off between accuracy
40
+ and inference speed than previous hand-crafted models by a
41
+ significant margin. In addition, notably, our TextNet-T, -S, and
42
+ -B only have 6.8M, 8.0M, and 8.9M parameters respectively,
43
+ which are more parameter-efficient than ResNets and VGG16.
44
+ These results demonstrate that TextNet models are effective for
45
+ text detection on the GPU device.
46
+
47
+ ### **Applications**
48
+ Perfect for real-world text detection tasks, including:
49
+ - Natural scene text recognition
50
+ - Multi-lingual and multi-oriented text detection
51
+ - Document text region analysis
52
+
53
+ ### **Contribution**
54
+ This model was contributed by [Raghavan](https://huggingface.co/Raghavan),
55
+ [jadechoghari](https://huggingface.co/jadechoghari)
56
+ and [nielsr](https://huggingface.co/nielsr).
config.json ADDED
@@ -0,0 +1,236 @@
+{
+  "architectures": [
+    "TextNetBackbone"
+  ],
+  "batch_norm_eps": 1e-05,
+  "conv_layer_kernel_sizes": [
+    [
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        1,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        3
+      ]
+    ],
+    [
+      [
+        3,
+        3
+      ],
+      [
+        1,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        3
+      ]
+    ],
+    [
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        1,
+        3
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        3,
+        3
+      ],
+      [
+        3,
+        1
+      ]
+    ],
+    [
+      [
+        3,
+        3
+      ],
+      [
+        1,
+        3
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        3,
+        1
+      ],
+      [
+        1,
+        3
+      ]
+    ]
+  ],
+  "conv_layer_strides": [
+    [
+      1,
+      2,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1
+    ],
+    [
+      2,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1
+    ],
+    [
+      2,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1,
+      1
+    ],
+    [
+      2,
+      1,
+      1,
+      1,
+      1
+    ]
+  ],
+  "depths": [
+    10,
+    10,
+    8,
+    5
+  ],
+  "hidden_sizes": [
+    64,
+    64,
+    128,
+    256,
+    512
+  ],
+  "image_size": [
+    640,
+    640
+  ],
+  "initializer_range": 0.02,
+  "model_type": "textnet",
+  "out_features": [
+    "stage1",
+    "stage2",
+    "stage3",
+    "stage4"
+  ],
+  "out_indices": [
+    1,
+    2,
+    3,
+    4
+  ],
+  "stage_names": [
+    "stem",
+    "stage1",
+    "stage2",
+    "stage3",
+    "stage4"
+  ],
+  "stem_act_func": "relu",
+  "stem_kernel_size": 3,
+  "stem_num_channels": 3,
+  "stem_out_channels": 64,
+  "stem_stride": 2,
+  "torch_dtype": "float32",
+  "transformers_version": "4.48.0.dev0"
+}
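
The per-stage lists in this config are mutually consistent: `depths` is [10, 10, 8, 5], and each stage's kernel-size and stride lists carry one entry per layer. A quick sanity check, assuming the checkpoint id used in the README above:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("jadechoghari/textnet-base")

# Four stages with 10, 10, 8, and 5 conv layers; the kernel-size and
# stride lists must have one entry per layer in each stage.
assert config.depths == [10, 10, 8, 5]
assert [len(stage) for stage in config.conv_layer_kernel_sizes] == config.depths
assert [len(stage) for stage in config.conv_layer_strides] == config.depths
print(config.hidden_sizes)  # [64, 64, 128, 256, 512]
```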
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d414e7a89a7709dbc14de450ad52dadc9796ff40b9b74540066132a4410fe724
+size 54291592
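
This is a Git LFS pointer; the ~54 MB weights file itself is stored out-of-band. A minimal sketch for verifying a downloaded copy against the pointer's size and sha256 oid (the local path is an assumption):

```python
import hashlib
import os

path = "model.safetensors"  # assumed local download location
with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# Both checks should pass for an intact download.
print(os.path.getsize(path) == 54291592)
print(digest == "d414e7a89a7709dbc14de450ad52dadc9796ff40b9b74540066132a4410fe724")
```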
preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
+{
+  "crop_size": {
+    "height": 224,
+    "width": 224
+  },
+  "do_center_crop": false,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_processor_type": "TextNetImageProcessor",
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "shortest_edge": 640
+  },
+  "size_divisor": 32
+}
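
In words: images are converted to RGB, resized so the shortest edge is 640, rescaled by `rescale_factor` (0.00392156862745098, i.e. exactly 1/255, mapping 8-bit pixels into [0, 1]), and normalized with the ImageNet mean/std; center-cropping is disabled, so `crop_size` is unused. A minimal sketch of the effect, assuming the same checkpoint id as above and that the processor keeps output dimensions divisible by `size_divisor` as the config suggests:

```python
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("jadechoghari/textnet-base")

# Dummy 600x1000 RGB image: the shortest edge should be scaled toward 640,
# with both output dimensions kept at multiples of 32.
image = Image.fromarray(np.zeros((600, 1000, 3), dtype=np.uint8))
pixel_values = processor(image, return_tensors="pt").pixel_values

batch, channels, height, width = pixel_values.shape
print(height % 32 == 0 and width % 32 == 0)
```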