---
library_name: transformers
license: apache-2.0
language:
- en
pipeline_tag: object-detection
tags:
- object-detection
- vision
datasets:
- coco
widget:
  - src: >-
      https://huggingface.co./datasets/mishig/sample_images/resolve/main/savanna.jpg
    example_title: Savanna
  - src: >-
      https://huggingface.co./datasets/mishig/sample_images/resolve/main/football-match.jpg
    example_title: Football Match
  - src: >-
      https://huggingface.co./datasets/mishig/sample_images/resolve/main/airport.jpg
    example_title: Airport
---
## RT-DETRv2
### **Overview**
The RT-DETRv2 model was proposed in [RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer](https://arxiv.org/abs/2407.17140) by Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, and Yi Liu. RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction, a discrete sampling operator for broader deployment compatibility, and improved training strategies such as dynamic data augmentation and scale-adaptive hyperparameters.
These changes enhance flexibility and practicality while maintaining real-time performance.
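
To make the "discrete sampling operator" idea more concrete, here is a minimal sketch under a stated assumption: that discrete sampling means rounding a fractional sampling location to the nearest feature cell instead of interpolating between its four neighbours. The functions `bilinear_sample` and `discrete_sample` are hypothetical helpers written for illustration on a plain 2D list. They are not the model's actual implementation.

```python
import math


def bilinear_sample(feature, x, y):
    """Sample a 2D grid at a fractional (x, y) using bilinear interpolation.

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    h, w = len(feature), len(feature[0])
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def discrete_sample(feature, x, y):
    """Sample by rounding to the nearest cell (assumed reading of 'discrete')."""
    h, w = len(feature), len(feature[0])
    xi = min(max(int(round(x)), 0), w - 1)
    yi = min(max(int(round(y)), 0), h - 1)
    # A single lookup, with no interpolation arithmetic.
    return feature[yi][xi]


feature_map = [[0.0, 1.0],
               [2.0, 3.0]]
print(bilinear_sample(feature_map, 0.75, 0.25))  # blends four cells
print(discrete_sample(feature_map, 0.75, 0.25))  # reads one cell
```

The discrete variant only needs an index lookup, which is the kind of operation that tends to be supported on more deployment backends than an interpolating sampler.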
This model was contributed by [@jadechoghari](https://x.com/jadechoghari) with the help of [@cyrilvallez](https://huggingface.co./cyrilvallez) and [@qubvel-hf](https://huggingface.co./qubvel-hf).
### **Performance**
RT-DETRv2 consistently outperforms its predecessor across all model sizes while maintaining the same real-time speeds.
![rt-detr-v2-graph.png](https://huggingface.co./datasets/jadechoghari/images/resolve/main/rt-detr-v2-graph.png)
### **How to use**
```python
import torch
import requests
from PIL import Image
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

# Download a sample image from COCO val2017
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the pretrained detector
image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r101vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r101vd")

# Preprocess the image and run inference without tracking gradients
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale boxes to the original image size and keep detections scoring above 0.5
results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.5)

# Print each detection as "label: score [box]"
for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```
```
cat: 0.97 [341.14, 25.11, 639.98, 372.89]
cat: 0.96 [12.78, 56.35, 317.67, 471.34]
remote: 0.95 [39.96, 73.12, 175.65, 117.44]
sofa: 0.86 [-0.11, 2.97, 639.89, 473.62]
sofa: 0.82 [-0.12, 1.78, 639.87, 473.52]
remote: 0.79 [333.65, 76.38, 370.69, 187.48]
```
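
Some of the predicted boxes above extend slightly past the image border (for example, an x coordinate of `-0.11`). If the boxes are used for cropping or drawing, it can help to clip them first. The sketch below assumes each box is `[x_min, y_min, x_max, y_max]` in pixels, which is what the printed output suggests. `clip_box` is a hypothetical helper, not part of `transformers`.

```python
def clip_box(box, width, height):
    """Clamp an [x_min, y_min, x_max, y_max] box to the image bounds."""
    x_min, y_min, x_max, y_max = box
    return [
        min(max(x_min, 0.0), float(width)),
        min(max(y_min, 0.0), float(height)),
        min(max(x_max, 0.0), float(width)),
        min(max(y_max, 0.0), float(height)),
    ]


# One of the "sofa" boxes printed above; the 640x480 image size is passed in
# explicitly here and would normally come from image.width and image.height.
print(clip_box([-0.11, 2.97, 639.89, 473.62], width=640, height=480))
```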
### **Training**
RT-DETRv2 is trained on the COCO (Lin et al. [2014]) train2017 split and validated on the COCO val2017 split. We report the standard AP metric (averaged over uniformly sampled IoU thresholds from 0.50 to 0.95 with a step size of 0.05) and $AP^{val}_{50}$ (AP at a single IoU threshold of 0.50), which is commonly used in real scenarios.
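
As a rough illustration of the IoU thresholds mentioned above, the following sketch computes the IoU of two boxes and lists the thresholds from 0.50 to 0.95 at which a prediction would still count as a match. It is not the official COCO evaluation code, which also handles score ranking, per-class matching, and precision-recall integration. The helper names and the `[x_min, y_min, x_max, y_max]` box format are assumptions made for this example.

```python
# The ten IoU thresholds 0.50, 0.55, ..., 0.95 used for averaging AP.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def matched_thresholds(iou_value):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou_value >= t]


predicted = [0.0, 0.0, 2.0, 2.0]
ground_truth = [1.0, 1.0, 3.0, 3.0]
overlap = iou(predicted, ground_truth)
print(f"IoU = {overlap:.3f}, matches at thresholds: {matched_thresholds(overlap)}")
```

A detection with a high IoU counts at most or all thresholds, while a loosely localized one counts only at the lower ones, which is why the averaged AP rewards tight boxes more than the single-threshold metric does.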
### **Applications**
RT-DETRv2 is well suited to real-time object detection in diverse applications such as **autonomous driving**, **surveillance systems**, **robotics**, and **retail analytics**. Its enhanced flexibility and deployment-friendly design make it suitable for both edge devices and large-scale systems, and it combines high accuracy with real-time speed in dynamic, real-world environments.