Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement
Abstract
We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-based models and a large-scale study of distillation with state-of-the-art models under various data augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks (e.g., segmentation and detection). As an example, the accuracy of ResNet-50 improves by 1.7% on the ImageNet validation set, 3.5% on ImageNetV2, and 10.0% on ImageNet-R. Expected Calibration Error (ECE) on the ImageNet validation set is also reduced by 9.9%. Using this backbone with Mask R-CNN for object detection on MS-COCO, the mean average precision improves by 0.8%. We reach similar gains for MobileNets, ViTs, and Swin-Transformers. For MobileNetV3 and Swin-Tiny, we observe significant improvements on ImageNet-R/A/C of up to 10% improved robustness. Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+ reach up to 3.4% improved accuracy.
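The abstract describes reinforcing a dataset once with data augmentation and knowledge distillation so that any model trained on it improves. Below is a minimal sketch of that general idea, not the paper's released code: it assumes a pretrained PyTorch teacher and stores teacher soft labels for augmented views of each image; the names `reinforce_dataset` and `distillation_loss` are illustrative.

```python
# Hypothetical sketch of dataset reinforcement via knowledge distillation.
# Assumptions (not from the paper's code): a torchvision teacher model,
# standard augmentations, and soft labels stored alongside each sample.

import torch
import torch.nn.functional as F
from torchvision import models, transforms

# Augmentation applied once, offline, while reinforcing the dataset.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

@torch.no_grad()
def reinforce_dataset(teacher, dataset, num_views=1, device="cpu"):
    """Store teacher soft labels for augmented views of each training image."""
    teacher.eval().to(device)
    reinforced = []
    for image, hard_label in dataset:          # dataset yields (PIL image, int)
        for _ in range(num_views):
            view = augment(image).unsqueeze(0).to(device)
            soft_label = teacher(view).softmax(dim=-1).squeeze(0).cpu()
            reinforced.append((view.squeeze(0).cpu(), hard_label, soft_label))
    return reinforced

def distillation_loss(student_logits, soft_label, hard_label, alpha=0.9):
    """Blend KL divergence to the stored teacher labels with the usual CE loss."""
    kd = F.kl_div(student_logits.log_softmax(dim=-1), soft_label,
                  reduction="batchmean")
    ce = F.cross_entropy(student_logits, hard_label)
    return alpha * kd + (1.0 - alpha) * ce

# Example usage (hypothetical): reinforce once with a strong teacher, then
# train any student architecture on the stored (image, hard, soft) triples.
# teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
# reinforced = reinforce_dataset(teacher, my_training_set)
```

Because the teacher's predictions are precomputed and stored, the reinforcement cost is paid once per dataset rather than once per student training run, which is the "no additional training cost for users" property the abstract highlights.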