Commit af495e2 (verified) by jadechoghari
1 Parent(s): e382b82

Create README.md

## TextNet-T/S/B: Efficient Text Detection Models

### **Overview**
TextNet is a lightweight backbone architecture designed specifically for text detection. It is reported to outperform general-purpose lightweight models such as MobileNetV3 on this task. It comes in three variants, **TextNet-T**, **TextNet-S**, and **TextNet-B**, with 6.8M, 8.0M, and 8.9M parameters respectively, and aims to balance accuracy against inference speed.

### **Performance**
TextNet is reported to achieve state-of-the-art results in text detection, outperforming hand-crafted backbones such as ResNets and VGG16 in both accuracy and inference speed. The architecture is designed to run efficiently on GPUs, which makes it well suited to GPU-based applications.

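Inference speed claims are easiest to check by timing a forward pass on your own hardware. The following is a minimal sketch of one way to do that, not the benchmarking protocol behind the reported results. The helper name `measure_latency` and the zero-argument callable `run_once` are hypothetical names introduced here for illustration.

```python
import statistics
import time


def measure_latency(run_once, warmup=3, runs=10):
    """Return the median wall-clock time in seconds of one call to run_once().

    run_once is a hypothetical zero-argument callable that wraps a forward pass.
    """
    # Warm-up calls are not timed, since the first calls often pay one-off setup costs.
    for _ in range(warmup):
        run_once()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(timings)
```

When timing on a GPU with PyTorch, `run_once` should call `torch.cuda.synchronize()` after the forward pass, because CUDA kernels run asynchronously and the timer would otherwise stop too early.
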
### **How to use**
#### Transformers
Install the required packages:

```bash
pip install transformers torch pillow requests
```

Then load the image processor and the backbone, and run a forward pass:

```python
import requests
import torch
from PIL import Image
from transformers import AutoBackbone, AutoImageProcessor

# Download a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessing pipeline and the TextNet backbone
processor = AutoImageProcessor.from_pretrained("jadechoghari/textnet-base")
model = AutoBackbone.from_pretrained("jadechoghari/textnet-base")

# Preprocess the image and run inference without tracking gradients
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
```
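
A backbone returns feature maps rather than final detections, so a useful first step is to look at their shapes. The helper below is a small sketch for that purpose. It assumes each feature map exposes a `.shape` laid out as (batch, channels, height, width), which is the usual layout for convolutional backbones. The name `summarize_feature_maps` is hypothetical and not part of the `transformers` API.

```python
def summarize_feature_maps(feature_maps):
    """Return one (stage_index, channels, height, width) tuple per feature map.

    Assumes every item has a 4-element .shape: (batch, channels, height, width).
    """
    summary = []
    for stage_index, feature_map in enumerate(feature_maps):
        # The batch dimension is dropped because it does not describe the stage.
        _batch, channels, height, width = tuple(feature_map.shape)
        summary.append((stage_index, channels, height, width))
    return summary
```

After running the example above, `summarize_feature_maps(outputs.feature_maps)` should list the channel count and spatial size of each stage the backbone returns.
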
### **Training**
TextNet is compared with representative hand-crafted backbones such as ResNets and VGG16. For a fair comparison, all models are first pre-trained on IC17-MLT [52] and then fine-tuned on Total-Text. The TextNet models achieve a significantly better trade-off between accuracy and inference speed than these hand-crafted backbones. Notably, TextNet-T, -S, and -B have only 6.8M, 8.0M, and 8.9M parameters respectively, making them more parameter-efficient than ResNets and VGG16. These results indicate that TextNet models are effective for text detection on GPU devices.

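To put the parameter counts in concrete terms, the sketch below converts them into an approximate weight size. It assumes the weights are stored as 32-bit floats (4 bytes each) and ignores activations, buffers, and file-format overhead, so the numbers are rough estimates rather than measured checkpoint sizes. The function name `weight_size_mb` is hypothetical.

```python
# Parameter counts as stated in this README, in millions.
PARAMS_MILLIONS = {"TextNet-T": 6.8, "TextNet-S": 8.0, "TextNet-B": 8.9}

BYTES_PER_FP32 = 4  # assumption: weights stored as 32-bit floats


def weight_size_mb(params_millions, bytes_per_param=BYTES_PER_FP32):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    # Millions of parameters times bytes per parameter gives megabytes directly.
    return round(params_millions * bytes_per_param, 1)


sizes = {name: weight_size_mb(count) for name, count in PARAMS_MILLIONS.items()}
print(sizes)
```

Under these assumptions the three variants come to roughly 27 MB, 32 MB, and 36 MB of weights. Passing `bytes_per_param=2` gives the corresponding estimate for half-precision storage.
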
### **Applications**
TextNet is suited to real-world text detection tasks, including:
- Natural scene text recognition
- Multi-lingual and multi-oriented text detection
- Document text region analysis

### **Contribution**
This model was contributed by [Raghavan](https://huggingface.co/Raghavan), [jadechoghari](https://huggingface.co/jadechoghari), and [nielsr](https://huggingface.co/nielsr).