The Sora Video Generation Aligned Words dataset is a collection of word segments for text-to-video and other multimodal research. It is intended to help researchers and engineers explore fine-grained prompts, including those where certain words are not aligned with the video.
We hope this dataset will support your work in prompt understanding and advance progress in multimodal projects.
Runway Gen-3 Alpha: The Style and Coherence Champion
Runway's latest video generation model, Gen-3 Alpha, is something special. It ranks #3 overall on our text-to-video human preference benchmark, but in terms of style and coherence, it outperforms even OpenAI Sora.
However, it struggles with alignment, making it less predictable for controlled outputs.
We've released a new dataset with human evaluations of Runway Gen-3 Alpha: Rapidata's text-2-video human preferences dataset. If you're working on video generation and want to see how your model compares to the biggest players, we can benchmark it for you.
We benchmarked @xai-org's Aurora model; as far as we know, this is the first public evaluation of the model at scale.
We collected 401k human annotations over the past ~2 days for this. We have uploaded all of the annotation data here on Hugging Face with a fully permissive license: Rapidata/xAI_Aurora_t2i_human_preferences