Generate segmentation masks from images and bounding boxes
Visualise outputs of VideoMAE
Image retrieval on the Food101 dataset