Generate images from text prompts
Generate text from prompts
Co-Speech Gesture Video Generation