Commit 2a70f18 by jjmcarrascosa (1 parent: 6a3f653)

Update README.md

README.md CHANGED
@@ -10,9 +10,6 @@ model-index:
results: []
---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
# vit_tickers_binaryclf

This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the cord dataset.
@@ -22,18 +19,28 @@ It achieves the following results on the evaluation set:

## Model description

-

## Intended uses & limitations

-

## Training and evaluation data

-

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
@@ -10,9 +10,6 @@ model-index:
results: []
---

# vit_tickers_binaryclf

This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the cord dataset.
@@ -22,18 +19,28 @@ It achieves the following results on the evaluation set:

## Model description

+This model is a binary classifier, a fine-tuned version of ViT, that predicts whether an input image is a picture or scan of one or more tickets or something else.

## Intended uses & limitations

+Use this model to classify your images as tickets or non-tickets. For the images in the ticket group, you can then use multimodal information extraction, such as visual named entity recognition, to extract the ticket items, amounts, total, etc. Check the Cord dataset for more information.
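The ticket / non-ticket split described above could be sketched as follows. This is a minimal illustration, not the author's code: the repository id `jjmcarrascosa/vit_tickers_binaryclf`, the helper names `load_classifier` and `split_tickets`, and the assumption that the model emits the labels `ticket` and `no_ticket` are all hypothetical and should be checked against the published model configuration.

```python
from typing import Callable, Dict, Iterable, List

# Assumed label name, taken from the class names used in this model card.
TICKET_LABEL = "ticket"

Prediction = List[Dict[str, object]]


def load_classifier(model_id: str = "jjmcarrascosa/vit_tickers_binaryclf"):
    """Build an image-classification pipeline. The model_id is an assumption."""
    # Imported lazily so the routing logic below works without transformers installed.
    from transformers import pipeline
    return pipeline("image-classification", model=model_id)


def split_tickets(
    images: Iterable[str], classify: Callable[[str], Prediction]
) -> Dict[str, List[str]]:
    """Route each image to 'ticket' or 'no_ticket' using its top-scoring label."""
    groups: Dict[str, List[str]] = {"ticket": [], "no_ticket": []}
    for image in images:
        # Each prediction is a dict with a "label" and a "score".
        top = max(classify(image), key=lambda p: p["score"])
        key = "ticket" if top["label"] == TICKET_LABEL else "no_ticket"
        groups[key].append(image)
    return groups
```

With the real model, `split_tickets(paths, load_classifier())` would return the two groups; only the `ticket` group would then be passed on to an information extraction step.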

## Training and evaluation data

+Two datasets were used for the positive class (`ticket`):
+- `cord`
+- `https://expressexpense.com/blog/free-receipt-images-ocr-machine-learning-dataset/`
+
+For the negative class (`no_ticket`), the following datasets were used:
+- A subset of `RVL-CDIP`
+- A subset of `visual-genome`

## Training procedure

+Datasets were loaded with different distributions of data for the positive and negative classes. The images were then normalized and resized to match the input that ViT expects.
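The resizing and normalization step could look roughly like the sketch below. It assumes the usual settings for `google/vit-base-patch16-224-in21k` (224x224 input, per-channel mean and standard deviation of 0.5) and uses a nearest-neighbor resize in plain NumPy as a stand-in for the interpolation the actual image processor applies; the function name `preprocess` is hypothetical.

```python
import numpy as np

IMAGE_SIZE = 224  # input resolution of vit-base-patch16-224
MEAN = 0.5        # assumed per-channel mean
STD = 0.5         # assumed per-channel standard deviation


def preprocess(image: np.ndarray) -> np.ndarray:
    """Resize an HxWx3 uint8 image to 224x224 and normalize it, channels first."""
    height, width = image.shape[:2]
    # Nearest-neighbor resize: choose the source row/column for each target pixel.
    rows = np.arange(IMAGE_SIZE) * height // IMAGE_SIZE
    cols = np.arange(IMAGE_SIZE) * width // IMAGE_SIZE
    resized = image[rows][:, cols]
    # Rescale to [0, 1], then shift and scale to the range [-1, 1].
    pixels = resized.astype(np.float32) / 255.0
    normalized = (pixels - MEAN) / STD
    # ViT expects (channels, height, width).
    return normalized.transpose(2, 0, 1)
```

In practice the image processor shipped with the checkpoint would normally do this work; the sketch only makes the two operations explicit.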
+
+Different runs were carried out, varying the data distribution and the hyperparameters, to maximize F1.
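Selecting the run with the highest F1 could be sketched as below. The run names and prediction lists are hypothetical, and the sketch assumes `ticket` is encoded as 1 and `no_ticket` as 0; the source does not say how the runs were actually compared.

```python
from typing import Dict, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1), computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_run(y_true: Sequence[int], runs: Dict[str, Sequence[int]]) -> str:
    """Return the name of the run whose predictions give the highest F1."""
    return max(runs, key=lambda name: f1_score(y_true, runs[name]))
```

F1 balances precision and recall, which matters here because the runs differ in how many positive and negative examples they contain.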
+
### Training hyperparameters

The following hyperparameters were used during training: