czczup committed
Commit dbc647f · verified · 1 Parent(s): 97bce9b

Upload textnet models

Files changed (4)
  1. README.md +56 -3
  2. config.json +186 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +28 -0
README.md CHANGED
@@ -1,3 +1,56 @@
- ---
- license: mit
- ---
+ ---
+ library_name: transformers
+ ---
+ ## TextNet-T/S/B: Efficient Text Detection Models
+
+ ### **Overview**
+ TextNet is a lightweight, efficient backbone designed specifically for text detection, offering superior performance to general-purpose backbones such as MobileNetV3. Its variants **TextNet-T**, **TextNet-S**, and **TextNet-B** (6.8M, 8.0M, and 8.9M parameters, respectively) strike an excellent balance between accuracy and inference speed.
+
+ ### **Performance**
+ TextNet achieves state-of-the-art results in text detection, outperforming hand-crafted backbones in both accuracy and speed. Its architecture is highly efficient, making it well suited to GPU-based applications.
+
+ ### How to use
+ ### Transformers
+ ```bash
+ pip install transformers
+ ```
+
+ ```python
+ import torch
+ import requests
+ from PIL import Image
+ from transformers import AutoImageProcessor, AutoBackbone
+
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ processor = AutoImageProcessor.from_pretrained("jadechoghari/textnet-base")
+ model = AutoBackbone.from_pretrained("jadechoghari/textnet-base")
+
+ inputs = processor(image, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+ ```
+ ### **Training**
+ We first compare TextNet with representative hand-crafted backbones such as ResNets and VGG16. For a fair comparison, all models are first pre-trained on IC17-MLT [52] and then fine-tuned on Total-Text. The proposed TextNet models achieve a significantly better trade-off between accuracy and inference speed than previous hand-crafted models. Notably, TextNet-T, -S, and -B have only 6.8M, 8.0M, and 8.9M parameters respectively, making them more parameter-efficient than ResNets and VGG16. These results demonstrate that TextNet models are effective for text detection on GPU devices.
+
+ ### **Applications**
+ Well suited to real-world text detection tasks, including:
+ - Natural scene text recognition
+ - Multilingual and multi-oriented text detection
+ - Document text region analysis
+
+ ### **Contribution**
+ This model was contributed by [Raghavan](https://huggingface.co/Raghavan),
+ [jadechoghari](https://huggingface.co/jadechoghari),
+ and [nielsr](https://huggingface.co/nielsr).
config.json ADDED
@@ -0,0 +1,186 @@
+ {
+   "architectures": ["TextNetBackbone"],
+   "batch_norm_eps": 1e-05,
+   "conv_layer_kernel_sizes": [
+     [[3, 3], [3, 3]],
+     [[3, 3], [1, 3], [3, 3], [3, 1], [3, 3], [3, 1], [1, 3], [3, 3]],
+     [[3, 3], [3, 3], [1, 3], [3, 1], [3, 3], [1, 3], [3, 1], [3, 3]],
+     [[3, 3], [3, 1], [1, 3], [1, 3], [3, 1]]
+   ],
+   "conv_layer_strides": [
+     [1, 2],
+     [2, 1, 1, 1, 1, 1, 1, 1],
+     [2, 1, 1, 1, 1, 1, 1, 1],
+     [2, 1, 1, 1, 1]
+   ],
+   "depths": [2, 8, 8, 5],
+   "hidden_sizes": [64, 64, 128, 256, 512],
+   "image_size": [640, 640],
+   "initializer_range": 0.02,
+   "model_type": "textnet",
+   "out_features": ["stage1", "stage2", "stage3", "stage4"],
+   "out_indices": [1, 2, 3, 4],
+   "stage_names": ["stem", "stage1", "stage2", "stage3", "stage4"],
+   "stem_act_func": "relu",
+   "stem_kernel_size": 3,
+   "stem_num_channels": 3,
+   "stem_out_channels": 64,
+   "stem_stride": 2,
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.0.dev0"
+ }
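As a sanity check on the stage layout in config.json, the per-stage output resolution for the configured 640×640 input can be derived from `stem_stride` and `conv_layer_strides`. A minimal sketch, assuming downsampling is the product of the per-layer strides (the standard convention for convolutional backbones):

```python
from math import prod

# Stride values copied from config.json above.
stem_stride = 2
conv_layer_strides = [
    [1, 2],                    # stage1: 2 conv layers
    [2, 1, 1, 1, 1, 1, 1, 1],  # stage2: 8 conv layers
    [2, 1, 1, 1, 1, 1, 1, 1],  # stage3: 8 conv layers
    [2, 1, 1, 1, 1],           # stage4: 5 conv layers
]

def stage_resolutions(input_size=640):
    """Spatial side length of each stage's output feature map."""
    size = input_size // stem_stride
    sizes = []
    for strides in conv_layer_strides:
        size //= prod(strides)  # each stage shrinks by its stride product
        sizes.append(size)
    return sizes

print(stage_resolutions())  # [160, 80, 40, 20]
```

So the four stages listed in `out_features` emit feature maps at 1/4, 1/8, 1/16, and 1/32 of the input resolution — the usual pyramid consumed by detection heads. Note also that the per-stage layer counts above match `depths` ([2, 8, 8, 5]).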
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13faabccca26c4c0ec2661470b73d400aba143a2715be0121f131e35b9e652d5
+ size 47938960
preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": false,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [0.485, 0.456, 0.406],
+   "image_processor_type": "TextNetImageProcessor",
+   "image_std": [0.229, 0.224, 0.225],
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 640
+   },
+   "size_divisor": 32
+ }
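The resize settings in preprocessor_config.json combine a `shortest_edge` target with a `size_divisor` constraint. A hypothetical sketch of how such a pair is conventionally applied — an illustration of the convention, not the exact `TextNetImageProcessor` code; the rounding mode in particular is an assumption:

```python
def textnet_resize_shape(height, width, shortest_edge=640, size_divisor=32):
    """Scale so the shorter image side reaches `shortest_edge`, then round
    each side up to a multiple of `size_divisor` (rounding is assumed)."""
    scale = shortest_edge / min(height, width)
    new_h = round(height * scale)
    new_w = round(width * scale)
    # ceil-divide, then multiply back, so both sides divide evenly by 32
    new_h = -(-new_h // size_divisor) * size_divisor
    new_w = -(-new_w // size_divisor) * size_divisor
    return new_h, new_w

print(textnet_resize_shape(480, 640))  # (640, 864)
```

The divisibility-by-32 constraint matches the backbone's overall 1/32 downsampling, so every stage sees whole-pixel feature maps. Separately, `rescale_factor` (0.00392156862745098) is exactly 1/255, mapping 8-bit pixel values into [0, 1] before the ImageNet mean/std normalization given by `image_mean` and `image_std`.

```python

```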