What if we segment the audio first and then transcribe? It's some extra compute to throw in, but IMO it would give better results!
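Rough sketch of what I mean, assuming plain energy-based silence detection for the segmentation step. The names `segment_by_silence` and `transcribe_segmented` and the `transcribe` callback are made up for illustration and don't come from any particular library, and the frame size and threshold values are placeholders you'd have to tune.

```python
from typing import Callable, List, Sequence, Tuple


def segment_by_silence(
    samples: Sequence[float],
    frame_len: int = 160,         # assumed frame size in samples (illustrative)
    threshold: float = 0.01,      # assumed mean-square energy cutoff (illustrative)
    min_silence_frames: int = 3,  # silent frames needed to close a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of the non-silent regions."""
    segments: List[Tuple[int, int]] = []
    start = None          # start index of the segment currently being built
    last_voiced_end = 0   # end index of the most recent voiced frame
    silent_run = 0        # consecutive silent frames seen inside a segment

    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = pos
            last_voiced_end = pos + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Enough silence: close the segment at the last voiced frame.
                segments.append((start, last_voiced_end))
                start = None
                silent_run = 0

    if start is not None:
        segments.append((start, last_voiced_end))
    return segments


def transcribe_segmented(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # hypothetical ASR callback
    **segment_kwargs,
) -> List[str]:
    """Segment first, then run the transcriber once per segment."""
    return [
        transcribe(samples[s:e])
        for s, e in segment_by_silence(samples, **segment_kwargs)
    ]
```

The extra compute is just the one pass over the samples to find the segments. After that the transcriber only sees the voiced chunks, and you can plug any model in as `transcribe`.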