谢集

sanaka87

AI & ML interests

Image Generation

Recent Activity

Organizations

None yet

sanaka87's activity

reacted to m-ric's post with 👀 about 1 month ago
MiniMax's new MoE LLM reaches Claude-Sonnet level with 4M tokens context length 💥

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

Key insights:

🏗️ MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines lightning attention (linear complexity) on most layers with traditional softmax attention every 8th layer (a toy sketch of both pieces follows this list)

🏆 Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding, and other optimizations achieve 75% GPU utilization (very high; utilization is typically around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!
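
Below is a minimal toy sketch, in PyTorch, of the two architectural ideas above: top-k MoE routing (only a few experts, hence a fraction of total parameters, run per token) and a hybrid stack that uses linear-complexity attention on most layers with softmax attention every 8th. All sizes and module names here are illustrative assumptions, not MiniMax's code, and the generic kernelized linear attention only stands in for lightning attention, which is more sophisticated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k MoE feed-forward: only k of n_experts run per token,
    so activated parameters are a small fraction of the total."""
    def __init__(self, d_model, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):              # dense loops for clarity;
            for e, expert in enumerate(self.experts):  # real systems dispatch sparsely
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot, None] * expert(x[sel])
        return out

class LinearAttention(nn.Module):
    """Generic kernelized linear attention, O(n) in sequence length
    (non-causal, single head; purely a stand-in for lightning attention)."""
    def __init__(self, d_model):
        super().__init__()
        self.wq = nn.Linear(d_model, d_model)
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        q = F.elu(self.wq(x)) + 1               # positive feature map
        k = F.elu(self.wk(x)) + 1
        v = self.wv(x)
        kv = torch.einsum("bnd,bne->bde", k, v) # O(n) running summary
        z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)

class HybridBlock(nn.Module):
    """One transformer block: (linear or softmax) attention + MoE FFN."""
    def __init__(self, d_model, use_softmax):
        super().__init__()
        self.use_softmax = use_softmax
        self.attn = (nn.MultiheadAttention(d_model, 4, batch_first=True)
                     if use_softmax else LinearAttention(d_model))
        self.moe = TopKMoE(d_model)
        self.n1, self.n2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.n1(x)
        x = x + (self.attn(h, h, h)[0] if self.use_softmax else self.attn(h))
        b, n, d = x.shape
        return x + self.moe(self.n2(x).reshape(-1, d)).reshape(b, n, d)

# Softmax attention every 8th layer, linear attention everywhere else.
stack = nn.ModuleList(HybridBlock(64, use_softmax=((i + 1) % 8 == 0))
                      for i in range(16))
x = torch.randn(2, 128, 64)                     # (batch, seq, d_model)
for block in stack:
    x = block(x)
```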

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights, including a great comparison showing how a 2B-activated MoE (24B total parameters) far outperforms a dense 7B model for the same amount of FLOPs.

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here; the license allows commercial use under 100M monthly active users 👉 MiniMaxAI/MiniMax-Text-01
posted an update about 1 month ago
🚀 Excited to Share Our Latest Work: 3DIS & 3DIS-FLUX for Multi-Instance Layout-to-Image Generation! ❤️❤️❤️

🎨 Daily Paper: 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering (2501.05131)
🔓 Code is now open source!
🌐 Project Website: https://limuloo.github.io/3DIS/
🏠 GitHub Repository: https://github.com/limuloo/3DIS
📄 3DIS Paper: https://arxiv.org/abs/2410.12669
📄 3DIS-FLUX Tech Report: https://arxiv.org/abs/2501.05131

🔥 Why 3DIS & 3DIS-FLUX?
Current SOTA multi-instance generation methods are typically adapter-based, requiring additional control modules trained on pre-trained models for layout and instance attribute control. However, with the emergence of more powerful models like FLUX and SD3.5, these methods demand constant retraining and extensive resources.

✨ Our Solution: 3DIS
We introduce a decoupled approach that only requires training a low-resolution Layout-to-Depth model to convert layouts into coarse-grained scene depth maps. Leveraging community and company pre-trained models like ControlNet + SAM2, we enable training-free controllable image generation on high-resolution models such as SDXL and FLUX. A toy sketch of this two-stage pipeline follows the list below.

🌟 Benefits of Our Decoupled Multi-Instance Generation:
1. Enhanced Control: By constructing scenes using depth maps in the first stage, the model focuses on coarse-grained scene layout, improving control over instance placement.
2. Flexibility & Preservation: The second stage employs training-free rendering methods, allowing seamless integration with various models (e.g., fine-tuned weights, LoRA) while maintaining the generative capabilities of pre-trained models.
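
A toy sketch of the two-stage pipeline, assuming diffusers with an off-the-shelf SDXL depth ControlNet for stage 2. Here `toy_layout_to_depth` is a deliberately crude hypothetical stand-in for the trained Layout-to-Depth model (it just paints boxes as flat gray levels), and the per-instance fine-detail rendering from the papers is omitted; see the repo for the real implementation.

```python
import torch
from PIL import Image, ImageDraw
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

def toy_layout_to_depth(layout, size=(1024, 1024)):
    """Crude stand-in for the trained Layout-to-Depth model: paint each
    instance box with a distinct gray level over a dark background."""
    depth = Image.new("L", size, color=30)
    draw = ImageDraw.Draw(depth)
    for i, inst in enumerate(layout):
        x0, y0, x1, y1 = inst["box"]            # normalized (x0, y0, x1, y1)
        w, h = size
        draw.rectangle((x0 * w, y0 * h, x1 * w, y1 * h), fill=120 + 40 * i)
    return depth.convert("RGB")

# Stage 1: layout -> coarse scene depth map.
layout = [
    {"caption": "a red sports car", "box": (0.05, 0.40, 0.55, 0.95)},
    {"caption": "a golden retriever", "box": (0.60, 0.50, 0.95, 0.95)},
]
depth_map = toy_layout_to_depth(layout)

# Stage 2: training-free rendering on a high-resolution model (SDXL here),
# conditioned on the stage-1 depth map via a pre-trained depth ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

image = pipe(prompt="a red sports car and a golden retriever on a beach",
             image=depth_map, num_inference_steps=30).images[0]
image.save("3dis_sketch.png")
# 3DIS additionally renders each instance's fine-grained attributes
# training-free during sampling; see the papers and repo above.
```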

Join us in advancing Layout-to-Image Generation! Follow and star our repository to stay updated! ⭐
replied to Flowerfan's post 12 months ago
reacted to Flowerfan's post with 🤝❤️🤗👍 12 months ago
Multi-Instance Generation Controller (MIGC): enjoy complete control over instance position, attributes, and count!

code link: https://github.com/limuloo/MIGC
project page: https://migcproject.github.io/

MIGC decouples multi-instance generation into individual single-instance generation subtasks within the cross-attention layer of Stable Diffusion.
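
A toy sketch of this decoupling with plain tensors, not MIGC's actual code: each instance gets its own cross-attention pass against its own text embedding, the result is restricted to the instance's box by a spatial mask, and the masked results are merged. All names and shapes are illustrative; MIGC's real shading and aggregation mechanism is more involved (see the repo).

```python
import torch
import torch.nn.functional as F

def single_instance_cross_attn(q, text_emb, w_k, w_v):
    """Plain cross-attention of image queries against one instance's text."""
    k, v = text_emb @ w_k, text_emb @ w_v             # (seq, d)
    attn = F.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v                                   # (h*w, d)

def multi_instance_cross_attn(q, instances, w_k, w_v, h, w):
    """Decompose multi-instance generation into single-instance subtasks,
    then merge the results with per-instance spatial box masks."""
    out = torch.zeros_like(q)
    for inst in instances:
        feat = single_instance_cross_attn(q, inst["text_emb"], w_k, w_v)
        mask = torch.zeros(h, w)
        x0, y0, x1, y1 = inst["box"]                  # normalized box
        mask[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
        out += feat * mask.flatten().unsqueeze(-1)    # restrict to the box
    return out

# Purely illustrative usage with random tensors:
d, h, w = 64, 16, 16
q = torch.randn(h * w, d)                             # image-latent queries
w_k, w_v = torch.randn(d, d), torch.randn(d, d)
instances = [
    {"text_emb": torch.randn(8, d), "box": (0.0, 0.0, 0.5, 1.0)},
    {"text_emb": torch.randn(8, d), "box": (0.5, 0.0, 1.0, 1.0)},
]
out = multi_instance_cross_attn(q, instances, w_k, w_v, h, w)
```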

Welcome to follow our project and use the code to create anything you imagine!

Please let us know if you have any suggestions!

updated a Space over 1 year ago