---
license_name: server-side-public-license
license_link: https://www.mongodb.com/licensing/server-side-public-license
tags:
- fashion
- cloth-retrieval
- e-commerce
- segmentation
datasets:
- rizavelioglu/fashionfail
- detection-datasets/fashionpedia
pipeline_tag: object-detection
---

## Facere*

The models proposed in the paper _"FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation"_
[[paper]](https://arxiv.org/abs/2404.08582) [[project page]](https://rizavelioglu.github.io/fashionfail/):
- `facere_base.onnx`: A pre-trained Mask R-CNN fine-tuned on `Fashionpedia-train`.
- `facere_plus.onnx`: `facere_base` model further fine-tuned on `FashionFail-train`.

_* Facere (fa:chere) is a Latin word for 'to make', from which the word fashion is derived.[[source]](https://en.wikipedia.org/wiki/Fashion#:~:text=The%20term,to%20make)_

## Usage

```python
import onnxruntime
from torchvision.io import read_image
from torchvision.models.detection import MaskRCNN_ResNet50_FPN_Weights
from huggingface_hub import hf_hub_download

path_onnx = hf_hub_download(
    repo_id="rizavelioglu/fashionfail",
    filename="facere_base.onnx",  # or "facere_plus.onnx"
)

# Load pre-trained model transformations.
weights = MaskRCNN_ResNet50_FPN_Weights.DEFAULT
transforms = weights.transforms()

# Load image and apply original transformation to the image.
img = read_image("path/to/image")
img_transformed = transforms(img)

# Create an inference session.
ort_session = onnxruntime.InferenceSession(
    path_onnx, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# Run inference on the input.
ort_inputs = {
    ort_session.get_inputs()[0].name: img_transformed.unsqueeze(dim=0).numpy()
}
ort_outs = ort_session.run(None, ort_inputs)

# Parse the model output.
boxes, labels, scores, masks = ort_outs
```
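
The raw outputs usually need a confidence filter before they are useful. The snippet below is a minimal sketch of that step, not the authors' post-processing. It assumes the outputs follow the usual torchvision Mask R-CNN layout (`boxes` as `(N, 4)`, `labels` and `scores` as `(N,)`, `masks` as `(N, 1, H, W)` soft masks in `[0, 1]`). The helper name `filter_detections` and both thresholds are hypothetical, and synthetic arrays stand in for `ort_outs` so the example runs without the model.

```python
import numpy as np


def filter_detections(boxes, labels, scores, masks, score_thr=0.5, mask_thr=0.5):
    """Keep detections scoring at least `score_thr` and binarize their masks.

    Hypothetical helper; the thresholds are illustrative defaults, not values
    taken from the paper.
    """
    keep = scores >= score_thr
    # Assumed mask layout: (N, 1, H, W) soft masks; drop the channel axis.
    binary_masks = masks[keep][:, 0] >= mask_thr
    return boxes[keep], labels[keep], scores[keep], binary_masks


# Synthetic stand-ins for `boxes, labels, scores, masks = ort_outs`.
rng = np.random.default_rng(0)
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]], dtype=np.float32)
labels = np.array([1, 23, 7], dtype=np.int64)
scores = np.array([0.9, 0.3, 0.6], dtype=np.float32)
masks = rng.random((3, 1, 8, 8), dtype=np.float32)

kept_boxes, kept_labels, kept_scores, kept_masks = filter_detections(
    boxes, labels, scores, masks
)
print(kept_labels, kept_masks.shape)
```

With the real model, pass the four arrays unpacked from `ort_outs` instead of the synthetic ones, and check the actual output shapes first, since the layout above is an assumption.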

> Check out the demo code on [HuggingFace Spaces][ff-hf_spaces] for visualizing the output.

> Also, check out [FashionFail's GitHub repository](https://github.com/rizavelioglu/fashionfail) to get more information on
> training, inference, and evaluation.

### License
TL;DR: Not available for commercial use, unless the FULL source code is shared! \
This project is intended solely for academic research. No commercial benefits are derived from it.
The models are licensed under the [Server Side Public License (SSPL)](https://www.mongodb.com/legal/licensing/server-side-public-license).

### Citation
If you find this repository useful in your research, please consider giving a star ⭐ and a citation:
```
@inproceedings{velioglu2024fashionfail,
  author    = {Velioglu, Riza and Chan, Robin and Hammer, Barbara},
  title     = {FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation},
  booktitle = {IJCNN},
  eprint    = {2404.08582},
  year      = {2024},
}
```

[ff-hf_spaces]: https://huggingface.co/spaces/rizavelioglu/fashionfail