๐ New smolagents update: Safer Local Python Execution! ๐ฆพ๐
With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. ๐
Here's why this matters & what you need to know! ๐งต๐
1๏ธโฃ Why is local execution risky? โ ๏ธ AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.
2๏ธโฃ New Safety Layer in smolagents ๐ก๏ธ We now inspect every return value during execution: โ Allowed: Safe built-in types (e.g., numbers, strings, lists) โ Blocked: Dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)
4๏ธโฃ Security Disclaimer โ ๏ธ ๐จ Despite these improvements, local Python execution is NEVER 100% safe. ๐จ If you need true isolation, use a remote sandboxed executor like Docker or E2B.
5๏ธโฃ The Best Practice: Use Sandboxed Execution ๐ For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.
6๏ธโฃ Upgrade Now & Stay Safe! ๐ Check out the latest smolagents release and start building safer AI agents today.
I was puzzled by the scope of ๐DeepSeek๐ projects, i.e. why they built (then open sourced) so many pieces which are all over their technology stack. Good engineers are minimalists. They build only when they have to.
Then I realized that FP8 should be the main driving force here. So your raw inter-GPU bandwidth is cut in half (H800). But if you compress your data presentation from 16 bits to 8 bits, then the effective throughput of your workload stays unchanged!
The idea is simple but lots of work had to be done. Their v3 technical report will give you a wholistic view (better than reading the code). To summarize, data structure is the foundation to any software. Since FP8 was new and untried, the ecosystem wasn't there. So DeepSeek became the trailblazer. Before cooking your meals, you need to till the land, grow crops, and grind the flour ๐
โ Hosting our own inference was not enough: now the Hub 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.
Check model cards on the Hub: you can now, in 1 click, use inference from various providers (cf video demo)
Their inference can also be used through our Inference API client. There, you can use either your custom provider key, or your HF token, then billing will be handled directly on your HF account, as a way to centralize all expenses.
๐ธ Also, PRO users get 2$ inference credits per month!
Multimodal ๐ฌ - We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with it's retrieval models ColSmol for multimodal RAG ๐ - UI-TARS are new models by ByteDance to unlock agentic GUI control ๐คฏ in 2B, 7B and 72B - Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B - MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context - Dataset: Yale released a new benchmark called MMVU - Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark
LLMs ๐ - DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! ๐คฏ - Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B - NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)
Audio ๐ฃ๏ธ - Llasa is a new speech synthesis model based on Llama that comes in 1B,3B, and 8B - TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation โฏ๏ธ - Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux - tencent released Hunyuan3D-2, new 3D asset generation from images
๐ฅ ๐๐ผ๐ผ๐ด๐น๐ฒ ๐ฟ๐ฒ๐น๐ฒ๐ฎ๐๐ฒ๐ ๐๐ฒ๐บ๐ถ๐ป๐ถ ๐ฎ.๐ฌ, ๐๐๐ฎ๐ฟ๐๐ถ๐ป๐ด ๐๐ถ๐๐ต ๐ฎ ๐๐น๐ฎ๐๐ต ๐บ๐ผ๐ฑ๐ฒ๐น ๐๐ต๐ฎ๐ ๐๐๐ฒ๐ฎ๐บ๐ฟ๐ผ๐น๐น๐ ๐๐ฃ๐ง-๐ฐ๐ผ ๐ฎ๐ป๐ฑ ๐๐น๐ฎ๐๐ฑ๐ฒ-๐ฏ.๐ฒ ๐ฆ๐ผ๐ป๐ป๐ฒ๐! And they start a huge effort on agentic capabilities.
๐ The performance improvements are crazy for such a fast model: โฃ Gemini 2.0 Flash outperforms the previous 1.5 Pro model at twice the speed โฃ Now supports both input AND output of images, video, audio and text โฃ Can natively use tools like Google Search and execute code
โก๏ธ If the price is on par with previous Flash iteration ($0.30 / M tokens, to compare with GPT-4o's $1.25) the competition will have a big problem with this 4x cheaper model that gets better benchmarks ๐คฏ
๐ค What about the agentic capabilities?
โฃ Project Astra: A universal AI assistant that can use Google Search, Lens and Maps โฃ Project Mariner: A Chrome extension that can complete complex web tasks (83.5% success rate on WebVoyager benchmark, this is really impressive!) โฃ Jules: An AI coding agent that integrates with GitHub workflows
I'll be eagerly awaiting further news from Google!
Multimodal ๐ผ๏ธ > Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants ๐ > OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license โจ > Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts
LLMs ๐ฌ > Meta released a new iteration of Llama 70B, Llama3.2-70B trained further > EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license ๐ฅ > Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license > Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models > Dataset: FineWeb2 just landed with multilinguality update! ๐ฅ nearly 8TB pretraining data in many languages!
Image/Video Generation ๐ผ๏ธ > Tencent released HunyuanVideo, a new photorealistic video generation model > OminiControl is a new editing/control framework for image generation models like Flux
Audio ๐ > Indic-Parler-TTS is a new text2speech model made by community
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! ๐ฅ High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. ๐ค Try it out yourself!