czczup committed on
Commit 5c2a445 · verified
1 Parent(s): 5082c44

Upload textnet models

Files changed (4)
  1. README.md +56 -3
  2. config.json +146 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +28 -0
README.md CHANGED
@@ -1,3 +1,56 @@
- ---
- license: mit
- ---
+ ---
+ library_name: transformers
+ ---
+ ## TextNet-T/S/B: Efficient Text Detection Models
+
+ ### **Overview**
+ TextNet is a lightweight, efficient backbone architecture designed specifically for text detection, and it outperforms general-purpose lightweight models such as MobileNetV3 on this task. Its three variants, **TextNet-T**, **TextNet-S**, and **TextNet-B** (6.8M, 8.0M, and 8.9M parameters, respectively), balance accuracy against inference speed.
+
+ ### **Performance**
+ TextNet achieves state-of-the-art results in text detection, outperforming hand-crafted backbones such as ResNets and VGG16 in both accuracy and inference speed. This efficiency makes it well suited to GPU-based applications.
+
+ ### How to use
+ ### Transformers
+ ```bash
+ pip install transformers
+ ```
+
+ ```python
+ import torch
+ import requests
+ from PIL import Image
+ from transformers import AutoImageProcessor, AutoBackbone
+
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ processor = AutoImageProcessor.from_pretrained("jadechoghari/textnet-base")
+ model = AutoBackbone.from_pretrained("jadechoghari/textnet-base")
+
+ inputs = processor(image, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+ ```
+ ### **Training**
+ TextNet is compared with representative hand-crafted backbones,
+ such as ResNets and VGG16. For a fair comparison,
+ all models are first pre-trained on IC17-MLT [52] and then
+ fine-tuned on Total-Text. The
+ TextNet models achieve a better trade-off between accuracy
+ and inference speed than previous hand-crafted models by a
+ significant margin. Notably, TextNet-T, -S, and
+ -B have only 6.8M, 8.0M, and 8.9M parameters, respectively,
+ which makes them more parameter-efficient than ResNets and VGG16.
+ These results show that TextNet models are effective for
+ text detection on GPU devices.
+
+ ### **Applications**
+ Well suited to real-world text detection tasks, including:
+ - Natural scene text detection
+ - Multi-lingual and multi-oriented text detection
+ - Document text region analysis
+
+ ### **Contribution**
+ This model was contributed by [Raghavan](https://huggingface.co/Raghavan),
+ [jadechoghari](https://huggingface.co/jadechoghari),
+ and [nielsr](https://huggingface.co/nielsr).
config.json ADDED
@@ -0,0 +1,146 @@
+ {
+   "architectures": [
+     "TextNetBackbone"
+   ],
+   "batch_norm_eps": 1e-05,
+   "conv_layer_kernel_sizes": [
+     [
+       [
+         3,
+         3
+       ],
+       [
+         3,
+         3
+       ],
+       [
+         3,
+         3
+       ]
+     ],
+     [
+       [
+         3,
+         3
+       ],
+       [
+         1,
+         3
+       ],
+       [
+         3,
+         3
+       ],
+       [
+         3,
+         1
+       ]
+     ],
+     [
+       [
+         3,
+         3
+       ],
+       [
+         3,
+         3
+       ],
+       [
+         3,
+         1
+       ],
+       [
+         1,
+         3
+       ]
+     ],
+     [
+       [
+         3,
+         3
+       ],
+       [
+         3,
+         1
+       ],
+       [
+         1,
+         3
+       ],
+       [
+         3,
+         3
+       ]
+     ]
+   ],
+   "conv_layer_strides": [
+     [
+       1,
+       2,
+       1
+     ],
+     [
+       2,
+       1,
+       1,
+       1
+     ],
+     [
+       2,
+       1,
+       1,
+       1
+     ],
+     [
+       2,
+       1,
+       1,
+       1
+     ]
+   ],
+   "depths": [
+     3,
+     4,
+     4,
+     4
+   ],
+   "hidden_sizes": [
+     64,
+     64,
+     128,
+     256,
+     512
+   ],
+   "image_size": [
+     640,
+     640
+   ],
+   "initializer_range": 0.02,
+   "model_type": "textnet",
+   "out_features": [
+     "stage1",
+     "stage2",
+     "stage3",
+     "stage4"
+   ],
+   "out_indices": [
+     1,
+     2,
+     3,
+     4
+   ],
+   "stage_names": [
+     "stem",
+     "stage1",
+     "stage2",
+     "stage3",
+     "stage4"
+   ],
+   "stem_act_func": "relu",
+   "stem_kernel_size": 3,
+   "stem_num_channels": 3,
+   "stem_out_channels": 64,
+   "stem_stride": 2,
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.0.dev0"
+ }
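
As a rough reading of the configuration above, the following sketch derives the cumulative downsampling factor and the channel width of each output stage from `stem_stride`, `conv_layer_strides`, and `hidden_sizes`. It rests on two assumptions that the config itself does not state: every convolution is padded so that a stride-s layer divides the spatial size by exactly s, and `hidden_sizes[i]` is the output width of `stage_names[i]`. The helper name `stage_geometry` is hypothetical and is not part of the `transformers` API.

```python
from math import prod

# Values copied from config.json above.
config = {
    "stem_stride": 2,
    "conv_layer_strides": [[1, 2, 1], [2, 1, 1, 1], [2, 1, 1, 1], [2, 1, 1, 1]],
    "hidden_sizes": [64, 64, 128, 256, 512],
    "image_size": [640, 640],
    "stage_names": ["stem", "stage1", "stage2", "stage3", "stage4"],
    "out_indices": [1, 2, 3, 4],
}


def stage_geometry(cfg):
    """Return {stage_name: (channels, height, width)} for each output stage.

    Assumption: each stride-s layer divides the spatial size by exactly s,
    and hidden_sizes[i] is the output width of stage_names[i].
    """
    height, width = cfg["image_size"]
    total_stride = cfg["stem_stride"]  # downsampling already applied by the stem
    geometry = {}
    for index, strides in enumerate(cfg["conv_layer_strides"], start=1):
        total_stride *= prod(strides)  # accumulate this stage's strides
        if index in cfg["out_indices"]:
            name = cfg["stage_names"][index]
            channels = cfg["hidden_sizes"][index]
            geometry[name] = (channels, height // total_stride, width // total_stride)
    return geometry


geometry = stage_geometry(config)
for name, shape in geometry.items():
    print(name, shape)
```

Under these assumptions the last stage has an overall stride of 32, which is consistent with the `size_divisor` of 32 in preprocessor_config.json: an input whose sides are multiples of 32 divides evenly at every stage.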
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:203334ca0d2f1a0f8b4dbfe2ad37f73d215ce681c25443ccdc483d845f3435cb
+ size 42955744
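
The three added lines are a Git LFS pointer rather than the weights themselves. A minimal sketch of reading such a pointer is shown below; the helper name `parse_lfs_pointer` is hypothetical. The value-count estimate assumes every stored tensor is float32, as `torch_dtype` in config.json suggests, and it is only an upper bound because the safetensors header also occupies part of the file.

```python
# Pointer text copied from the model.safetensors diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:203334ca0d2f1a0f8b4dbfe2ad37f73d215ce681c25443ccdc483d845f3435cb
size 42955744
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into named fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(pointer_text)
size_mb = pointer["size_bytes"] / 1e6
# Upper bound on stored values, assuming 4 bytes (float32) per value.
max_float32_values = pointer["size_bytes"] // 4
print(f"{size_mb:.1f} MB, at most {max_float32_values:,} float32 values")
```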
preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": false,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_processor_type": "TextNetImageProcessor",
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ],
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 640
+   },
+   "size_divisor": 32
+ }
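
To make these settings concrete, the sketch below works through the arithmetic they imply: scale the shorter side to `shortest_edge`, snap each side to a multiple of `size_divisor`, then rescale and normalize pixel values. It is not taken from the `TextNetImageProcessor` implementation. Rounding each side up to the next multiple and truncating the scaled longer side are assumptions, and the function names are hypothetical. Since `do_center_crop` is false, `crop_size` is ignored here.

```python
# Values copied from preprocessor_config.json above.
SHORTEST_EDGE = 640
SIZE_DIVISOR = 32
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255
IMAGE_MEAN = [0.485, 0.456, 0.406]
IMAGE_STD = [0.229, 0.224, 0.225]


def resized_shape(height, width):
    """Return (height, width) after resizing.

    The shorter side is scaled to SHORTEST_EDGE with the aspect ratio kept,
    then each side is rounded up to a multiple of SIZE_DIVISOR (assumption).
    """
    short, long = min(height, width), max(height, width)
    new_short = SHORTEST_EDGE
    new_long = int(new_short * long / short)  # truncation is an assumption
    if height <= width:
        new_height, new_width = new_short, new_long
    else:
        new_height, new_width = new_long, new_short

    def round_up(value):
        # Ceiling division, then back to a multiple of SIZE_DIVISOR.
        return -(-value // SIZE_DIVISOR) * SIZE_DIVISOR

    return round_up(new_height), round_up(new_width)


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to rescaled, mean/std-normalized values."""
    return [
        (channel * RESCALE_FACTOR - mean) / std
        for channel, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    ]


print(resized_shape(480, 640))
print(normalize_pixel((255, 255, 255)))
```

For a 480×640 (height × width) input, the shorter side becomes 640 and the longer side scales to 853, which this sketch then rounds up to 864.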