VideoMAE finetuned for shot scale classification
The videomae-base-finetuned-kinetics model, finetuned to classify shot scale into five classes: ECS (extreme close-up shot), CS (close-up shot), MS (medium shot), FS (full shot), and LS (long shot).
The model was finetuned on the MovieNet dataset for 5 epochs. The file v1_split_trailer.json provides the training, validation, and test data splits.
Evaluation
The model achieves an accuracy of 88.93% and a macro-F1 of 89.19%.
Class-wise accuracies: ECS - 91.16%, CS - 83.65%, MS - 86.2%, FS - 90.74%, LS - 94.55%
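For readers who want to compute the same kinds of numbers on their own predictions, here is a minimal sketch in plain Python. It is not the evaluation script used for this model. The function name `shot_scale_metrics` is hypothetical, and the sketch assumes that "class-wise accuracy" means the fraction of samples of each class that are predicted correctly (per-class recall).

```python
from collections import Counter

# Label order is an assumption for illustration; the model's own mapping is in model.config.id2label.
LABELS = ["ECS", "CS", "MS", "FS", "LS"]


def shot_scale_metrics(y_true, y_pred, labels=LABELS):
    """Return (overall accuracy, per-class accuracy, macro-F1) for two label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    true_count = Counter(y_true)   # samples per ground-truth class
    pred_count = Counter(y_pred)   # samples per predicted class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / len(y_true)

    per_class_acc = {}
    f1_scores = []
    for label in labels:
        # Per-class accuracy is taken here to be recall (an assumption, see above).
        recall = correct[label] / true_count[label] if true_count[label] else 0.0
        precision = correct[label] / pred_count[label] if pred_count[label] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        per_class_acc[label] = recall

    # Macro-F1 is the unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f1_scores) / len(labels)
    return accuracy, per_class_acc, macro_f1
```

Calling `shot_scale_metrics(true_labels, predicted_labels)` with two equally long lists of class names returns the three values in the order reported above.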
How to use
The following code tests the model on a shot or clip from a video. The same code is used to process, transform, and evaluate on the MovieNet test set.
import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification
from pytorchvideo.transforms import ApplyTransformToKey
from torchvision.transforms import v2
from decord import VideoReader, cpu

## Preprocessor and Model loading
model_name = "gullalc/videomae-base-finetuned-kinetics-movieshots-scale"
image_processor = VideoMAEImageProcessor.from_pretrained(model_name)
model = VideoMAEForVideoClassification.from_pretrained(model_name)

img_mean = image_processor.image_mean
img_std = image_processor.image_std
height = width = image_processor.size["shortest_edge"]
resize_to = (height, width)

## Evaluation Transform (built after loading, because it needs resize_to, img_mean and img_std)
transform = v2.Compose(
    [
        ApplyTransformToKey(
            key="video",
            transform=v2.Compose(
                [
                    v2.Lambda(lambda x: x.permute(0, 3, 1, 2)),  # T, H, W, C -> T, C, H, W
                    v2.UniformTemporalSubsample(16),  # keep 16 evenly spaced frames
                    v2.Resize(resize_to),
                    v2.Lambda(lambda x: x / 255.0),  # uint8 [0, 255] -> float [0, 1]
                    v2.Normalize(img_mean, img_std),
                ]
            ),
        ),
    ]
)

## Load video/clip and predict
video_path = "random_clip.mp4"
vr = VideoReader(video_path, width=480, height=270, ctx=cpu(0))
frames_tensor = torch.stack([torch.tensor(vr[i].asnumpy()) for i in range(len(vr))])  ## Shape: (T, H, W, C)
frames_tensor = transform({"video": frames_tensor})["video"]  ## Shape: (T, C, H, W)

with torch.no_grad():
    ## The model expects a batch dimension: (1, T, C, H, W)
    outputs = model(pixel_values=frames_tensor.unsqueeze(0))
pred = torch.argmax(outputs.logits, dim=1).cpu().numpy()
print(model.config.id2label[pred[0]])
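The code above decodes every frame of the clip before the transform keeps only 16 of them. For long clips, an optional variation is to choose the 16 frame indices first and decode only those. The sketch below is one way to do that, not part of the original pipeline. The helper name `uniform_frame_indices` is hypothetical, and its integer arithmetic is meant to produce evenly spaced indices in the same spirit as uniform temporal subsampling, not to reproduce the torchvision implementation exactly.

```python
def uniform_frame_indices(num_frames, num_samples=16):
    """Return num_samples evenly spaced frame indices in [0, num_frames - 1].

    Indices repeat when the clip has fewer frames than num_samples.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    # Integer floor of i * last / (num_samples - 1), from first frame to last frame.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


# Possible use with decord (assumes `vr` is the VideoReader from the code above):
#   indices = uniform_frame_indices(len(vr), 16)
#   frames_tensor = torch.tensor(vr.get_batch(indices).asnumpy())  # (16, H, W, C)
```

With exactly 16 frames as input, the `v2.UniformTemporalSubsample(16)` step in the transform keeps all of them, so the rest of the pipeline can stay unchanged.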