---
license: apache-2.0
---
# Handwriting-Removal-DIS
My effort to improve handwriting removal through the new [DIS (Dichotomous Image Segmentation)](https://github.com/xuebinqin/DIS).
## Inference
1. Clone the DIS GitHub repository:
```cmd
git clone https://github.com/xuebinqin/DIS
```
2. Install the requirements via ```pip install -r requirements.txt```.
3. Replace ```Inference.py``` in the cloned DIS folder with the ```Inference.py``` from this repository.
4. Change the paths in ```Inference.py``` to match your own application.
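
Step 3 can also be scripted. The following is a minimal sketch, not part of this repository: the helper name `replace_inference_script` and the `.bak` backup are assumptions made for illustration. It searches the clone for `Inference.py` instead of hard-coding its location, keeps a copy of the upstream file, and then overwrites it.

```python
import shutil
from pathlib import Path


def replace_inference_script(dis_repo: Path, new_script: Path) -> Path:
    """Overwrite the Inference.py found inside the DIS clone with new_script.

    Hypothetical helper: the upstream file is kept next to the
    replaced one as Inference.py.bak.
    """
    # Search the clone instead of assuming where the script lives.
    matches = sorted(dis_repo.rglob("Inference.py"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one Inference.py under {dis_repo}, found {len(matches)}"
        )
    target = matches[0]
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)      # keep the upstream version
    shutil.copy2(new_script, target)  # drop in this repository's version
    return target
```

Calling `replace_inference_script(Path("DIS"), Path("Inference.py"))` from the directory that contains both the clone and this repository's script would perform the swap; both paths are placeholders for your own layout.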
## Related Research
AndSonder has also done research and experimentation on the same subject, but using DeepLabv3+ to segment the handwriting.
This is a link to their repo: [https://github.com/AndSonder/HandWritingEraser-Pytorch](https://github.com/AndSonder/HandWritingEraser-Pytorch)
HUGE THANKS to them for providing the segmentation datasets, labeled with the background in blue, printed characters in green, and handwriting in red.
## Dataset
The original dataset is hosted on Baidu Web Storage and is a segmentation dataset rather than a background-removal dataset.
Therefore, after some processing, I generated a background-removal dataset. It is available on Hugging Face: [https://huggingface.co./datasets/Inoob/HandwritingSegmentationDataset](https://huggingface.co./datasets/Inoob/HandwritingSegmentationDataset).
The relevant contents of the repo are listed below:
```
|- train.zip
|- val.zip
```
After unzipping train.zip and val.zip, the file tree should look like:
```
|-train
| |-gt
| | |- dehw_train_00714.png
| | |- dehw_train_00715.png
| | ...
| |-im
| | |- dehw_train_00714.jpg
| | |- dehw_train_00715.jpg
|-val
| |-gt
| | |- dehw_train_00000.png
| | |- dehw_train_00001.png
| | ...
| |-im
| | |- dehw_train_00000.png
| | |- dehw_train_00001.png
```
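
Since every image in ```im``` has a mask with the same file stem in ```gt```, a quick consistency check after unzipping can catch missing or misnamed files. This is a minimal sketch under that assumption; `pair_images_with_masks` is a hypothetical helper, not part of this repository or of DIS.

```python
from pathlib import Path


def pair_images_with_masks(split_dir: Path) -> list[tuple[Path, Path]]:
    """Return (image, mask) pairs from split_dir/im and split_dir/gt, matched by file stem."""
    images = {p.stem: p for p in (split_dir / "im").iterdir() if p.is_file()}
    masks = {p.stem: p for p in (split_dir / "gt").iterdir() if p.is_file()}
    # Stems that appear in only one of the two folders have no partner.
    unmatched = sorted(images.keys() ^ masks.keys())
    if unmatched:
        raise ValueError(
            f"{len(unmatched)} files without a partner, e.g. {unmatched[0]}"
        )
    return [(images[stem], masks[stem]) for stem in sorted(images)]
```

For example, `pair_images_with_masks(Path("train"))` would pair `im/dehw_train_00714.jpg` with `gt/dehw_train_00714.png`, regardless of the differing extensions.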
The ```gt``` folder contains the masks (a.k.a. the ground truth data), with the background masked in black and the handwriting masked in white.
The ```im``` folder contains the original images of the handwriting dataset.
The code that was used to generate the dataset in the Hugging Face repo is ```create_masks.py```.
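
To illustrate the idea behind the conversion, here is a minimal sketch of turning a color-coded label image (blue background, green printed characters, red handwriting) into a black-and-white handwriting mask. It is not the contents of ```create_masks.py```: the function name, the threshold, and the "red is the strongest channel" rule are all assumptions, since the exact label colors are not stated here. To stay dependency-free it works on rows of `(R, G, B)` tuples; in practice an image library would supply the pixels.

```python
Pixel = tuple[int, int, int]


def handwriting_mask(label_rows: list[list[Pixel]], threshold: int = 128) -> list[list[int]]:
    """Map a color-coded label image to a binary mask: 255 for handwriting, 0 elsewhere.

    Assumed rule: a pixel is handwriting when red is its strongest
    channel and is at least `threshold`.
    """
    return [
        [255 if r >= threshold and r > g and r > b else 0 for (r, g, b) in row]
        for row in label_rows
    ]
```

Printed characters (green) and background (blue) both map to 0 under this rule, which matches the mask layout described above: white handwriting on a black background.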
## Training
I used the ```train_valid_inference_main.py``` from [DIS](https://github.com/xuebinqin/DIS) with my own dataset and training batch size.