This year, we started our “AI Agents and Agentic Workflows” series (https://www.turingpost.com/t/AI-Agents) to explore AI agents step by step: the vocabulary, how they work, and how to build them. The strong interest in the series and the large number of studies published on agents show that this was one of the most popular and important themes of the year. Agents will most likely reach new highs in 2025, and we will be covering that for you. Now, let’s review the agentic systems that have emerged this year.
Here is a list of 15 agentic systems and frameworks of 2024:
Can we please do something about this? It makes everything I do so much harder, and because my local machine is so terrible, I am forced to test in production. This makes debugging so difficult. nroggendorff/system-exit
Introducing 📐𝐅𝐢𝐧𝐞𝐌𝐚𝐭𝐡: the best public math pre-training dataset with 50B+ tokens! HuggingFaceTB/finemath
Math remains challenging for LLMs, and training on FineMath yields considerable gains over other math datasets, especially on GSM8K and MATH.
We build the dataset by: 🛠️ carefully extracting math data from Common Crawl; 🔎 iteratively filtering and recalling high-quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
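To make the filtering step a bit more concrete, here is a minimal sketch of score-based page filtering. It only illustrates the thresholding idea: `toy_math_score` is a hypothetical keyword heuristic standing in for the learned classifier, and the 0 to 5 score range and the threshold of 3 are assumptions for illustration, not details from the FineMath pipeline.

```python
import re

# Hypothetical stand-in for a learned classifier: counts crude "math hints".
MATH_HINTS = re.compile(r"(\\frac|\\sum|=|\btheorem\b|\bproof\b|\bsolve\b)", re.IGNORECASE)


def toy_math_score(text: str) -> int:
    """Return an assumed 0-5 quality score based on how many math hints appear."""
    return min(5, len(MATH_HINTS.findall(text)))


def filter_pages(pages, threshold=3):
    """Keep only the pages whose score reaches the (assumed) threshold."""
    return [page for page in pages if toy_math_score(page["text"]) >= threshold]


pages = [
    {"url": "a", "text": "Solve x + 2 = 5. Proof: x = 3, so the theorem holds."},
    {"url": "b", "text": "Top 10 holiday recipes"},
]
kept = filter_pages(pages)
print([page["url"] for page in kept])
```

In a real pipeline the scorer would be a trained model, and the filtering would be repeated as the classifier and the candidate pool are refined.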
We ran a series of ablations on Llama-3.2-3B-Base with continued pre-training on FineMath and observed notable gains over both the baseline model and the same model trained on other public math datasets.
We hope this helps advance the performance of LLMs on math and reasoning! 🚀 We’re also releasing all the ablation models as well as the evaluation code.
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗
We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformer architecture", but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.
The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).
It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.
Many applications use BERT-family models - the top models in this category accumulate millions of downloads on the Hub.
➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.
𝗧𝗟;𝗗𝗥:
🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
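To see why sequence packing removes padding overhead, here is a minimal sketch in plain Python. It is not ModernBERT's implementation: `pad_batch`, `pack_batch` and the `cu_seqlens` offsets list are illustrative names, and the token ids are made up.

```python
def pad_batch(seqs, pad_id=0):
    """Classic batching: pad every sequence up to the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


def pack_batch(seqs):
    """Packing: concatenate all tokens and remember where each sequence starts.

    cu_seqlens[i] and cu_seqlens[i + 1] delimit sequence i in the flat list,
    so attention can be restricted to each sequence without any padding tokens.
    """
    tokens = []
    cu_seqlens = [0]
    for s in seqs:
        tokens.extend(s)
        cu_seqlens.append(cu_seqlens[-1] + len(s))
    return tokens, cu_seqlens


seqs = [[5, 6, 7], [8], [9, 10]]
padded = pad_batch(seqs)
tokens, cu_seqlens = pack_batch(seqs)

padded_slots = sum(len(row) for row in padded)
print(f"padded slots: {padded_slots}, packed slots: {len(tokens)}")
print("sequence boundaries:", cu_seqlens)
```

The padded batch spends compute on filler tokens, while the packed batch processes only real tokens; the boundary offsets are what let an attention kernel keep the sequences separate.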
🥇 As a result, the model tops the game of encoder models: it beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!
a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process makes dataset creation a non-technical breeze, so anyone can create datasets and models in minutes without writing any code.
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
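For intuition on combining a step-wise reward model with tree search, here is a toy beam search sketch. It is not the released recipe: `propose_steps` stands in for an LLM sampling candidate next steps, `toy_prm` stands in for a process reward model scoring partial solutions, and the "problem" (reach a target sum of 10 with steps of 1, 2 or 3) is invented purely for illustration.

```python
def propose_steps(partial):
    """Hypothetical generator: extend a partial solution with each possible next step."""
    return [partial + [step] for step in (1, 2, 3)]


def toy_prm(partial, target=10):
    """Hypothetical step-wise reward: higher is better, 0 means the target is reached."""
    return -abs(target - sum(partial))


def beam_search(beam_width=2, max_steps=5):
    """Keep the `beam_width` best partial solutions after every step."""
    beams = [[]]
    for _ in range(max_steps):
        candidates = [child for beam in beams for child in propose_steps(beam)]
        candidates.sort(key=toy_prm, reverse=True)  # rank by reward model score
        beams = candidates[:beam_width]
        if toy_prm(beams[0]) == 0:  # best beam already solves the toy problem
            break
    return beams[0]


solution = beam_search()
print(solution, "sum =", sum(solution))
```

Spending more test-time compute here means a wider beam or more steps; the reward model decides which partial solutions deserve that extra compute.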
In our blog post we cover:
📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.
🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
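As a rough illustration of the DVTS idea, here is a toy sketch. It assumes that DVTS splits the search budget into independent subtrees and expands each one greedily under the verifier; the post itself does not spell out the algorithm, so treat that reading as an assumption. `propose_steps` (a seeded random sampler) and `toy_prm` are hypothetical stand-ins for an LLM and a process reward model.

```python
import random


def propose_steps(partial, rng, width):
    """Hypothetical sampler: draw `width` random candidate next steps."""
    return [partial + [rng.randint(1, 3)] for _ in range(width)]


def toy_prm(partial, target=10):
    """Hypothetical verifier score: higher is better, 0 means the target sum is hit."""
    return -abs(target - sum(partial))


def dvts(n_beams=4, beam_width=2, max_steps=6, seed=0):
    """Run n_beams // beam_width independent subtrees, each expanded greedily."""
    finals = []
    for tree in range(n_beams // beam_width):
        rng = random.Random(seed + tree)  # separate randomness keeps subtrees diverse
        node = []
        for _ in range(max_steps):
            children = propose_steps(node, rng, beam_width)
            node = max(children, key=toy_prm)  # greedy: keep only the best child
            if toy_prm(node) == 0:
                break
        finals.append(node)
    best = max(finals, key=toy_prm)
    return best, finals


best, finals = dvts()
print("subtree results:", finals)
print("best:", best, "score:", toy_prm(best))
```

Because the subtrees never compete with each other during the search, one high-scoring prefix cannot crowd out all the alternatives, which is one plausible way to get the diversity the post describes.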
We enable large language models to generate and understand 3D meshes by representing them as text and fine-tuning. This unifies the 3D and text modalities in a single model and preserves language abilities, unlocking conversational 3D creation with mesh understanding.
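To illustrate what "representing a mesh as text" can look like, here is a minimal sketch that serializes vertices and faces into OBJ-style lines (`v x y z` and `f i j k` with 1-based indices). The choice of OBJ-style text and the quantization of coordinates into 64 integer bins are assumptions for illustration; `quantize` and `mesh_to_text` are hypothetical helper names.

```python
def quantize(value, lo, hi, bins=64):
    """Map a coordinate in [lo, hi] to an integer bin in [0, bins - 1]."""
    if hi == lo:
        return 0
    return round((value - lo) / (hi - lo) * (bins - 1))


def mesh_to_text(vertices, faces, bins=64):
    """Serialize a triangle mesh as plain text, one vertex or face per line."""
    coords = [c for vertex in vertices for c in vertex]
    lo, hi = min(coords), max(coords)
    lines = []
    for vertex in vertices:
        lines.append("v " + " ".join(str(quantize(c, lo, hi, bins)) for c in vertex))
    for face in faces:
        # OBJ face indices are 1-based, so shift the 0-based input indices.
        lines.append("f " + " ".join(str(index + 1) for index in face))
    return "\n".join(lines)


# A tetrahedron: 4 vertices, 4 triangular faces (0-based vertex indices).
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
text = mesh_to_text(vertices, faces)
print(text)
```

Integer coordinates keep the text short, which matters when the mesh has to fit in a language model's context alongside the conversation.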
AutoRound has demonstrated strong results even at 2-bit precision for VLMs like Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
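For a sense of how little room 2-bit precision leaves, here is a minimal sketch of symmetric 2-bit quantization using plain round-to-nearest. This is not AutoRound's method, which tunes the rounding rather than rounding naively; the helper names, the single per-tensor scale, and the sample weights are all assumptions for illustration.

```python
def quantize_sym(weights, bits=2):
    """Symmetric round-to-nearest quantization with one scale for all weights."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # 2 bits -> [-2, 1]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integer codes back to approximate floating-point weights."""
    return [q * scale for q in quantized]


weights = [0.5, -0.2, 0.1, -0.5]
codes, scale = quantize_sym(weights)
restored = dequantize(codes, scale)
print("codes:", codes, "scale:", scale)
print("restored:", restored)
```

With only four representable levels, naive rounding collapses the small weights to zero, which is why methods that choose the rounding more carefully matter at this precision.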