---
license: gpl-3.0
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
base_model:
- VideoCrafter/VideoCrafter2
pipeline_tag: text-to-video
---
# Advanced text-to-video Diffusion Models

⚡️ This repository provides training recipes for the AMD efficient text-to-video models, which are designed for high performance and efficiency. The training process includes two key steps:

* Distillation and Pruning: We distill and prune the popular text-to-video model [VideoCrafter2](https://github.com/AILab-CVC/VideoCrafter), reducing the parameters to a compact 945M while maintaining competitive performance.

* Optimization with T2V-Turbo: We apply the [T2V-Turbo](https://github.com/Ji4chenLi/t2v-turbo) method on the distilled model to reduce inference steps and further enhance model quality.

This implementation is released to promote further research and innovation in the field of efficient text-to-video generation, optimized for AMD Instinct accelerators.

![pic](GIFs/vbench.png "Vbench performance")

**8-Steps Results**

| | | | |
|---|---|---|---|
| A cute happy Corgi playing in park, sunset, pixel. | A cute happy Corgi playing in park, sunset, animated style. | A cute raccoon playing guitar in the beach. | A cute raccoon playing guitar in the forest. |
| A quiet beach at dawn and the waves gently lapping. | A cute teddy bear, dressed in a red silk outfit, stands in a vibrant street, Chinese New Year. | A sandcastle being eroded by the incoming tide. | An astronaut flying in space, in cyberpunk style. |
| A cat DJ at a party. | A 3D model of a 1800s victorian house. | A drone flying over a snowy forest. | A ghost ship navigating through a sea under a moon. |