A unified multimodal understanding and generation model.
https://huggingface.co/papers/2501.03006
Generate realistic voice audio from text and audio prompts
Optical illusions and style transfer with FLUX