Optimizing Pretraining Data Mixes with LLM-Estimated Utility
By WillHeld • Jan 22