davidhajdu committed on
Commit
c3d37b5
·
verified ·
1 Parent(s): a382eaa

Update README.md

Files changed (1)
  1. README.md +54 -48
README.md CHANGED
@@ -11,73 +11,79 @@ pipeline_tag: object-detection
 
  # Model Card for Model ID
 
- <!-- Provide a quick summary of what the model is/does. -->
 
 
  ## Model Details
 
  ### Model Description
 
  <!-- Provide a longer summary of what this model is. -->
 
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
 
  <!-- Provide the basic links for the model. -->
 
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
 
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
- [More Information Needed]
 
- ## Bias, Risks, and Limitations
 
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
- [More Information Needed]
 
- ### Recommendations
 
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
 
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
 
- [More Information Needed]
 
  ## Training Details
 
@@ -85,15 +91,15 @@ Use the code below to get started with the model.
 
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
- [More Information Needed]
 
  ### Training Procedure
 
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
- #### Preprocessing [optional]
 
- [More Information Needed]
 
 
  #### Training Hyperparameters
@@ -174,7 +180,7 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
  [More Information Needed]
 
- ## Citation [optional]
 
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
@@ -192,7 +198,7 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  ```
 
 
- ## Model Card Authors [optional]
 
  [David Hajdu](https://huggingface.co/davidhajdu)
 
 
 
  # Model Card for Model ID
 
+ ## Table of Contents
+
+ 1. [Model Details](#model-details)
+ 2. [Model Sources](#model-sources)
+ 3. [How to Get Started with the Model](#how-to-get-started-with-the-model)
+ 4. [Training Details](#training-details)
+ 5. [Evaluation](#evaluation)
+ 6. [Model Architecture and Objective](#model-architecture-and-objective)
+ 7. [Citation](#citation)
 
  ## Model Details
 
+ We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer by layer. Using box coordinates not only helps exploit explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer by layer in a cascade manner. As a result, it achieves the best performance on the MS-COCO benchmark among DETR-like detection models under the same setting, e.g., 45.7% AP using ResNet50-DC5 as the backbone trained for 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods.
+
  ### Model Description
 
  <!-- Provide a longer summary of what this model is. -->
 
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
+ - **Developed by:** Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang
+ - **Funded by:** IDEA-Research
+ - **Shared by:** David Hajdu
+ - **Model type:** DAB-DETR
+ - **License:** Apache-2.0
+
+ ### Model Sources
 
  <!-- Provide the basic links for the model. -->
 
+ - **Repository:** https://github.com/IDEA-Research/DAB-DETR
+ - **Paper:** https://arxiv.org/abs/2201.12329
 
+ ## How to Get Started with the Model
 
+ Use the code below to get started with the model.
 
+ ```python
+ import torch
+ import requests
+
+ from PIL import Image
+ from transformers import AutoModelForObjectDetection, AutoImageProcessor
+
+ url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ image_processor = AutoImageProcessor.from_pretrained("davidhajdu/dab-detr-resnet-50")
+ model = AutoModelForObjectDetection.from_pretrained("davidhajdu/dab-detr-resnet-50")
+
+ inputs = image_processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3)
+
+ for result in results:
+     for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
+         score, label = score.item(), label_id.item()
+         box = [round(i, 2) for i in box.tolist()]
+         print(f"{model.config.id2label[label]}: {score:.2f} {box}")
+ ```
+ This should output:
+ ```
+ cat: 0.87 [14.7, 49.39, 320.52, 469.28]
+ remote: 0.86 [41.08, 72.37, 173.39, 117.2]
+ cat: 0.86 [344.45, 19.43, 639.85, 367.86]
+ remote: 0.61 [334.27, 75.93, 367.92, 188.81]
+ couch: 0.59 [-0.04, 1.34, 639.9, 477.09]
+ ```
 
  ## Training Details
 
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
+ The DAB-DETR model was trained on [COCO 2017 object detection](https://cocodataset.org/#download), a dataset consisting of 118k/5k annotated images for training/validation, respectively.
 
  ### Training Procedure
 
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
+ #### Preprocessing
 
+ Images are resized/rescaled so that the shortest side is at least 480 and at most 800 pixels and the longest side is at most 1333 pixels, then normalized across the RGB channels with the ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225).
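
The resize-and-normalize rule described above can be sketched in a few lines. This is a minimal illustration only: the helper names `resize_shortest_edge` and `preprocess` and the 800/1333 defaults are assumptions matching the inference-time convention, and the Hub image processor performs all of this internally.

```python
import numpy as np
from PIL import Image

# ImageNet statistics quoted above.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def resize_shortest_edge(width, height, shortest=800, longest=1333):
    # Scale so the shortest side reaches `shortest`, then shrink the
    # scale if the longest side would exceed `longest`.
    scale = shortest / min(width, height)
    if max(width, height) * scale > longest:
        scale = longest / max(width, height)
    return round(width * scale), round(height * scale)

def preprocess(image):
    # Resize, map pixel values to [0, 1], then normalize per channel.
    w, h = resize_shortest_edge(*image.size)
    arr = np.asarray(image.convert("RGB").resize((w, h))) / 255.0
    return (arr - IMAGENET_MEAN) / IMAGENET_STD  # HWC float array
```

At training time the shortest side is instead sampled from the [480, 800] range, which is why the text above gives a range rather than a single value.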
 
 
  #### Training Hyperparameters
 
 
  [More Information Needed]
 
+ ## Citation
 
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
  ```
 
 
+ ## Model Card Authors
 
  [David Hajdu](https://huggingface.co/davidhajdu)