Generate text based on images and prompts
Hugging Face Space for JanusFlow-1.3B
Generate a depth map from an image