---
library_name: transformers
license: mit
language:
- en
pipeline_tag: object-detection
base_model:
- hustvl/yolos-tiny
tags:
- object-detection
- fashion
- search
---
This model is a fine-tuned version of hustvl/yolos-tiny.

You can find details of the model in this GitHub repo -> [fashion-visual-search](https://github.com/yainage90/fashion-visual-search).

You can also find the fashion image feature extractor model -> [yainage90/fashion-image-feature-extractor](https://huggingface.co/yainage90/fashion-image-feature-extractor).

This model was trained using a combination of two datasets: [modanet](https://github.com/eBay/modanet) and [fashionpedia](https://fashionpedia.github.io/home/).

The labels are `['bag', 'bottom', 'dress', 'hat', 'shoes', 'outer', 'top']`.
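
If you just want to check the label mapping without running a detection, it can be read straight from the checkpoint config (a minimal sketch; the exact id-to-label order shown in the comment is illustrative):

```python
from transformers import YolosForObjectDetection

ckpt = 'yainage90/fashion-object-detection-yolos-tiny'
model = YolosForObjectDetection.from_pretrained(ckpt)

# The mapping is stored in the config, e.g. {0: 'bag', 1: 'bottom', ...}.
print(model.config.id2label)
```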

The best score, mAP 0.6974, was achieved at epoch 96 of 100.

```python
from PIL import Image
import torch
from transformers import YolosImageProcessor, YolosForObjectDetection

# Pick the best available device: CUDA GPU, Apple Silicon (MPS), or CPU.
device = 'cpu'
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')

ckpt = 'yainage90/fashion-object-detection-yolos-tiny'
image_processor = YolosImageProcessor.from_pretrained(ckpt)
model = YolosForObjectDetection.from_pretrained(ckpt).to(device)

image = Image.open('<path/to/image>').convert('RGB')

with torch.no_grad():
    inputs = image_processor(images=[image], return_tensors="pt")
    outputs = model(**inputs.to(device))
    # Rescale predicted boxes back to the original image size (height, width).
    target_sizes = torch.tensor([[image.size[1], image.size[0]]])
    results = image_processor.post_process_object_detection(outputs, threshold=0.85, target_sizes=target_sizes)[0]

    items = []
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        score = score.item()
        label = label.item()
        box = [i.item() for i in box]  # [x_min, y_min, x_max, y_max]
        print(f"{model.config.id2label[label]}: {round(score, 3)} at {box}")
        items.append((score, label, box))
```
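
Continuing from the snippet above, the detected items can be cropped out and handed to the feature extractor linked earlier for visual search (a minimal sketch; the output file names are illustrative):

```python
# Crop each detected item from the original image. The crops can then be
# embedded with yainage90/fashion-image-feature-extractor for visual search.
for i, (score, label, box) in enumerate(items):
    left, top, right, bottom = map(int, box)  # PIL expects (left, upper, right, lower)
    crop = image.crop((left, top, right, bottom))
    crop.save(f"item_{i}_{model.config.id2label[label]}.png")
```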

![sample_image](sample_image.png)