We’ve just released the results from a large-scale human evaluation (400k annotations) of OpenGVLab’s newest text-to-image model, Lumina. Surprisingly, Lumina outperforms OpenAI’s DALL-E 3 in terms of alignment, although it ranks #6 in our overall human preference benchmark.
To support further development in text-to-image models, we’re making our entire human-annotated dataset publicly available. If you’re working on model improvements and need high-quality data, feel free to explore it (a quick loading sketch follows below).
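Here's a minimal sketch of pulling the data with the `datasets` library, assuming the release is hosted on the Hugging Face Hub; the repository id and record fields below are placeholders, not the actual dataset name or schema.

```python
# Minimal sketch: load the released human annotations from the Hugging Face Hub.
# NOTE: "your-org/t2i-human-annotations" is a placeholder repo id, and the record
# contents are assumptions -- check the dataset card for the real schema.
from datasets import load_dataset

annotations = load_dataset("your-org/t2i-human-annotations", split="train")

# Inspect a single annotation record (e.g. prompt, generated image reference, human label).
print(annotations[0])
```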
We welcome your feedback and look forward to any insights you might share!
This week we’re releasing the first framework unit of the course, and it’s on smolagents. Here’s what the unit covers:
- why should you use smolagents vs. another library?
- how to build agents that use code (see the short sketch after this list)
- how to build multi-agent systems
- how to use vision language models for browser use
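To give a flavor of the code-agent part, here is a minimal sketch based on the smolagents quickstart; exact class names may differ slightly between versions, and the search tool and query are just illustrative choices.

```python
# Minimal smolagents sketch: a CodeAgent that writes and runs Python to answer a query.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# HfApiModel calls a hosted model via the Hugging Face Inference API by default.
model = HfApiModel()

# CodeAgent expresses its plan as executable Python; the search tool lets it look things up.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("How many seconds would it take a leopard at full speed to run the length of Pont des Arts?")
```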
The team has been working flat out on this for a few weeks, led by @sergiopaniego and supported by smolagents author @m-ric.