---
license: apache-2.0
---
# Handwriting-Removal-DIS
My effort to improve handwriting removal using the new [DIS (Dichotomous Image Segmentation)](https://github.com/xuebinqin/DIS).

## Inference

1. Clone the DIS GitHub repository:

```cmd
git clone https://github.com/xuebinqin/DIS
```

2. Install the requirements via ```pip install -r requirements.txt```

3. Replace ```Inference.py``` in the cloned DIS folder with the ```Inference.py``` from this repository.

4. Change the paths in ```Inference.py``` to match your own setup.
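
Once inference has produced a mask, one possible way to actually erase the handwriting is to paint the masked pixels with a flat colour. The snippet below is only a minimal sketch and is not taken from ```Inference.py```: the function name `erase_handwriting`, the `threshold` and `fill` parameters, and the assumption that the mask is a float array in [0, 1] with handwriting as the high values are all illustrative.

```python
import numpy as np


def erase_handwriting(image, mask, threshold=0.5, fill=255):
    """Return a copy of `image` with handwriting pixels painted over.

    image: H x W x 3 uint8 array (the scanned page).
    mask:  H x W float array in [0, 1]; higher means "more likely handwriting"
           (an assumption about the model output, not a documented format).
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")

    cleaned = image.copy()            # leave the input untouched
    cleaned[mask > threshold] = fill  # a 2-D boolean mask selects whole pixels
    return cleaned
```

A flat fill works best on clean white paper. For textured or coloured backgrounds, an inpainting step would probably give better results.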

## Related Research
AndSonder has also done research and experimentation on the same subject, but using DeepLabv3+ to segment the handwriting.

This is a link to his repo: [https://github.com/AndSonder/HandWritingEraser-Pytorch](https://github.com/AndSonder/HandWritingEraser-Pytorch)

HUGE THANKS to them for providing the segmentation datasets, labeled with the background in blue, printed characters in green, and handwriting in red.

## Dataset
The original dataset is hosted on Baidu Web Storage and is a segmentation dataset rather than a background-removal dataset.

Therefore, after some processing, I generated a background-removal dataset. It is available on Hugging Face: [https://huggingface.co./datasets/Inoob/HandwritingSegmentationDataset](https://huggingface.co./datasets/Inoob/HandwritingSegmentationDataset).

The relevant contents of the repo are:

```
|- train.zip
|- val.zip
```

After unzipping train.zip and val.zip, the file tree should look like:

```
|-train
|    |-gt
|    |  |- dehw_train_00714.png
|    |  |- dehw_train_00715.png
|    |  ...
|    |-im
|    |  |- dehw_train_00714.jpg
|    |  |- dehw_train_00715.jpg
|-val
|    |-gt
|    |  |- dehw_train_00000.png
|    |  |- dehw_train_00001.png
|    |  ...
|    |-im
|    |  |- dehw_train_00000.png
|    |  |- dehw_train_00001.png
```

The ```gt``` folder contains the masks (a.k.a. ground truth data), with the background in black and the handwriting in white.

The ```im``` folder contains the original images of the handwriting dataset.
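
Because images and masks share a file stem but may differ in extension (for example `dehw_train_00714.jpg` and `dehw_train_00714.png`), it can help to check that every image has a matching mask before training. The helper below is a hypothetical sketch, not part of this repository; `pair_images_with_masks` is an illustrative name and it assumes the `gt`/`im` layout shown above.

```python
from pathlib import Path


def pair_images_with_masks(split_dir):
    """Return a sorted list of (image_path, mask_path) pairs matched by stem.

    split_dir: path to an unzipped split such as "train" or "val",
               containing the "im" and "gt" subfolders.
    """
    split_dir = Path(split_dir)

    # Index masks by file name without extension, e.g. "dehw_train_00714".
    masks = {p.stem: p for p in (split_dir / "gt").iterdir() if p.is_file()}

    pairs = []
    for image_path in sorted((split_dir / "im").iterdir()):
        mask_path = masks.get(image_path.stem)
        if mask_path is None:
            raise FileNotFoundError(f"no mask found for {image_path.name}")
        pairs.append((image_path, mask_path))
    return pairs
```

Calling `pair_images_with_masks("train")` after unzipping should then either return one pair per image or fail loudly on the first image that has no mask.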

The code used to generate the dataset in the Hugging Face repo is ```create_masks.py```.
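
For readers who only want the idea, the conversion from colour-coded labels to binary masks can be sketched roughly as follows. This is not the contents of ```create_masks.py```; it is an illustrative approximation that assumes the labels are RGB arrays in which handwriting is (close to) pure red, and the function name and the channel thresholds are my own choices.

```python
import numpy as np


def handwriting_mask_from_labels(label_rgb):
    """Convert an H x W x 3 colour-coded label image into a binary mask.

    Predominantly red pixels (handwriting) become 255 (white); everything
    else, i.e. blue background and green printed text, becomes 0 (black).
    """
    r = label_rgb[..., 0]
    g = label_rgb[..., 1]
    b = label_rgb[..., 2]

    # Thresholds rather than exact equality, so that slightly off-colour
    # pixels (e.g. from lossy compression) are still classified sensibly.
    is_handwriting = (r > 127) & (g < 128) & (b < 128)
    return np.where(is_handwriting, 255, 0).astype(np.uint8)
```

The resulting array can be saved as a single-channel PNG to match the files in the ```gt``` folder.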

## Training

I used the ```train_valid_inference_main.py``` from [DIS](https://github.com/xuebinqin/DIS) with my own dataset and training batch size.