Detect whether text is human-written or machine-generated
Generate a sound effect that matches a video shot
Generate realistic audio from text