course-demos's activity

lewtun
posted an update about 13 hours ago
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!

🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.

🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.

Follow along: https://github.com/huggingface/open-r1
lewtun
posted an update 20 days ago
I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!

https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (2412.06769)
[2] https://huggingface.co/blog/ganqu/prime
lewtun
posted an update 27 days ago
This paper (HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify the correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases).
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc. that one sees in o1.
* Use the resulting data for SFT & RL.
* Use sparse rewards from GPT-4o to guide RL training. They find that RL gives an average boost of ~3 points across medical benchmarks, and that SFT on this data alone already gives a strong improvement.

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
lewtun
posted an update about 1 month ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
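
To make the idea of combining step-wise scoring with tree search a bit more concrete, here is a minimal toy sketch of verifier-guided beam search. It is not the search-and-learn API: `beam_search`, `toy_expand`, `toy_score` and `apply_steps` are hypothetical names, and the toy functions merely stand in for an LLM that proposes next steps and a reward model that scores partial solutions.

```python
from typing import Callable, List

Path = List[str]  # a partial solution: the reasoning steps taken so far


def beam_search(
    expand: Callable[[Path], List[str]],
    score: Callable[[Path], float],
    beam_width: int,
    max_steps: int,
) -> Path:
    """Grow solutions step by step, keeping only the best-scored partial paths."""
    beams: List[Path] = [[]]
    for _ in range(max_steps):
        # Propose next steps for every surviving partial solution.
        candidates = [path + [step] for path in beams for step in expand(path)]
        if not candidates:
            break
        # Rank by the step-wise score and prune to the beam width.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


# Toy stand-ins (hypothetical): start from 1 and try to reach TARGET.
TARGET = 8


def apply_steps(path: Path) -> int:
    value = 1
    for step in path:
        value = value + 1 if step == "+1" else value * 2
    return value


def toy_expand(path: Path) -> List[str]:
    return ["+1", "*2"]  # a real system would sample steps from an LLM


def toy_score(path: Path) -> float:
    return -abs(TARGET - apply_steps(path))  # a real system would call a reward model


best = beam_search(toy_expand, toy_score, beam_width=2, max_steps=3)
print(best, apply_steps(best))
```

A larger `beam_width` spends more test-time compute per step in exchange for exploring more candidate paths.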
abidlabs
posted an update 4 months ago
👋 Hi Gradio community,

I'm excited to share that Gradio 5 will launch in October with improvements across security, performance, SEO, design (see the screenshot for Gradio 4 vs. Gradio 5), and user experience, making Gradio a mature framework for web-based ML applications.

Gradio 5 is currently in beta, so if you'd like to try it out early, please refer to the instructions below:

---------- Installation -------------

Gradio 5 depends on Python 3.10 or higher, so if you are running Gradio locally, please ensure that you have Python 3.10 or higher, or download it here: https://www.python.org/downloads/

* Locally: If you are running Gradio locally, simply install the release candidate with pip install gradio --pre.
* Spaces: If you would like to update an existing Gradio Space to use Gradio 5, simply update the sdk_version to 5.0.0b3 in the README.md file on Spaces.

In most cases, that's all you have to do to run Gradio 5.0. If you start your Gradio application, you should see your Gradio app running, with a fresh new UI.

-----------------------------

For more information, please see: https://github.com/gradio-app/gradio/issues/9463
abidlabs
posted an update 8 months ago
Prototyping holds an important place in machine learning. But it has traditionally been quite difficult to go from prototype code to production-ready APIs.

We're working on making that a lot easier with Gradio and will unveil something new on June 6th: https://www.youtube.com/watch?v=44vi31hehw4&ab_channel=HuggingFace
abidlabs
posted an update 9 months ago
Open Models vs. Closed APIs for Software Engineers
-----------------------------------------------------------------------

If you're an ML researcher / scientist, you probably don't need much convincing to use open models instead of closed APIs -- open models give you reproducibility and let you deeply investigate the model's behavior.

But what if you are a software engineer building products on top of LLMs? I'd argue that open models are a much better option even if you are using them as APIs. For at least 3 reasons:

1) The most obvious reason is the reliability of your product. Relying on a closed API means that your product has a single point of failure. On the other hand, there are already at least 7 different API providers that offer Llama3 70B, as well as libraries that abstract over these providers so that a single request can be routed to different API providers depending on availability / latency.

2) Another benefit is consistency if you eventually go local. If your product takes off, it will be more economical and lower latency to have a dedicated inference endpoint running in your VPC than to call external APIs. If you've started with an open-source model, you can always deploy the same model locally. You don't need to modify prompts or change any surrounding logic to get consistent behavior. Minimize your technical debt from the beginning.

3) Finally, open models give you much more flexibility. Even if you keep using APIs, you might want to trade off latency vs. cost, or use APIs that support batches of inputs, etc. Because different API providers have different infrastructure, you can use the API provider that makes the most sense for your product -- or you can even use multiple API providers for different users (free vs. paid) or different parts of your product (priority features vs. nice-to-haves).
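
As a rough illustration of the reliability point, here is a minimal sketch of routing one request across several providers that serve the same open model. Everything in it is hypothetical (`complete_with_fallback`, `flaky_provider`, `healthy_provider`); in practice each callable would wrap a real provider's client, and you might order providers by latency or cost.

```python
from typing import Callable, Dict, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider errored out."""


def complete_with_fallback(
    prompt: str, providers: Dict[str, Callable[[str], str]]
) -> Tuple[str, str]:
    """Try providers in insertion order; return (provider_name, completion)."""
    errors: Dict[str, Exception] = {}
    for name, call in providers.items():
        try:
            return name, call(prompt)
        except Exception as exc:  # in practice, catch the client's specific errors
            errors[name] = exc
    raise AllProvidersFailed(f"all providers failed: {errors}")


# Hypothetical stand-ins for real API clients serving the same open model.
def flaky_provider(prompt: str) -> str:
    raise ConnectionError("provider A is down")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = complete_with_fallback(
    "hi", {"provider_a": flaky_provider, "provider_b": healthy_provider}
)
print(used, text)
```

Because every provider serves the same model, the prompt and surrounding logic stay unchanged when the request falls through to the next one.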
lewtun
posted an update 10 months ago
Introducing Zephyr 141B-A35B 🪁:

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Yesterday, Mistral released their latest base model (via magnet link of course 😅) and the community quickly converted it to transformers format and pushed it to the Hub: mistral-community/Mixtral-8x22B-v0.1

Early evals of this model looked extremely strong, so we teamed up with Argilla and KAIST AI to cook up a Zephyr recipe with a few new alignment techniques that came out recently:

🧑‍🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm, developed by @JW17, @nlee-208, and @j6mes, does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO.

🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences that has been developed by our friends at Argilla. To create this dataset, they took the excellent Capybara SFT dataset from @LDJnr (LDJnr/Capybara) and converted it into a preference dataset by augmenting the final turn with responses from new LLMs that were then ranked by GPT-4.

What we find especially neat about this approach is that training on 7k samples only takes ~1.3h on 4 H100 nodes, yet produces a model that is very strong on chat benchmarks like IFEval and BBH.

Kudos to @alvarobartt @JW17 and @nlee-208 for this very nice and fast-paced collab!

For more details on the paper and dataset, check out our collection: HuggingFaceH4/zephyr-orpo-6617eba2c5c0e2cc3c151524
abidlabs
posted an update 10 months ago
Introducing the Gradio API Recorder 🪄

Every Gradio app now includes an API recorder that lets you reconstruct your interaction in a Gradio app as code using the Python or JS clients! Our goal is to make Gradio the easiest way to build ML APIs, not just UIs 🔥

lewtun
posted an update 11 months ago
Can we align code generation models to be good at chat without compromising their base capabilities 🤔?

This was the question the H4 team asked itself when BigCode released StarCoder2 a bit over a week ago. We knew that code models like deepseek-ai/deepseek-coder-6.7b-instruct and m-a-p/OpenCodeInterpreter-DS-33B get impressive scores on code benchmarks like HumanEval, but they tend to score poorly on chat benchmarks like MT Bench and IFEval. We also knew that the Zephyr recipe we applied to Mistral 7B produced a strong chat model, so we wondered -- could it be tweaked to produce a strong coding assistant?

It turns out the answer is yes and I'm happy to share StarChat2, a DPO fine-tune of StarCoder2 15B that scores highly on both HumanEval and MT Bench / IFEval 🌟!

The most interesting lesson for me was that you get better models by blending in more code/math data than chat during the SFT step - in terms of tokens, we found a ratio of 3:1 worked best.

Anyway, here's a demo of the model, along with all the code and datasets we used to train it:

* Demo: HuggingFaceH4/starchat2-playground
* Collection: HuggingFaceH4/starchat2-15b-65f068417b330fafad751fce
* Recipe: https://github.com/huggingface/alignment-handbook

Hope it's useful to others!
abidlabs
posted an update 12 months ago
Necessity is the mother of invention, and of Gradio components.

Sometimes we realize that we need a Gradio component to build a cool application and demo, so we just build it. For example, we just added a new gr.ParamViewer component because we needed it to display information about Python & JavaScript functions in our documentation.

Of course, our users should be able to do the same thing for their machine learning applications, so that's why Gradio lets you build custom components and publish them to the world 🔥
abidlabs
posted an update 12 months ago
Lots of cool Gradio custom components, but this is the most generally useful one I've seen so far: insert a Modal into any Gradio app by using the modal component!

import gradio as gr
from gradio_modal import Modal

with gr.Blocks() as demo:
    gr.Markdown("### Main Page")
    gr.Textbox("lorem ipsum " * 1000, lines=10)

    # Overlay shown on top of the main page as soon as the app loads
    with Modal(visible=True) as modal:
        gr.Markdown("# License Agreement")

demo.launch()
abidlabs
posted an update 12 months ago
Just out: new custom Gradio component specifically designed for code completion models 🔥
abidlabs
posted an update 12 months ago
The next version of Gradio will be significantly more efficient (as well as a bit faster) for anyone who uses Gradio's streaming features. Looking at you chatbot developers @oobabooga @pseudotensor :)

The major change concerns how data is streamed. Previously, Gradio sent the entire payload at each token, which is generally the most robust way to ensure all the data is correctly transmitted. We've now switched to sending "diffs": at each time step, we automatically compute the diff between the two most recent updates and send only the latest token (or whatever the diff may be). Coupled with the fact that we are now using SSE, which is a more robust communication protocol than WS (SSE will resend packets if there are any drops), we should have the best of both worlds: efficient *and* robust streaming.
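
To illustrate why sending diffs helps, here is a small sketch using a deliberately simplified append-or-replace scheme for text. This is an assumption for illustration only, not Gradio's actual diff format (see the PR for that); `compute_diff` and `apply_diff` are hypothetical helpers.

```python
from typing import Dict, List


def compute_diff(previous: str, current: str) -> Dict[str, str]:
    """Describe how to get from `previous` to `current` with as little data as possible."""
    if current.startswith(previous):
        # Common streaming case: only new text was appended.
        return {"op": "append", "text": current[len(previous):]}
    # Fallback: the text changed in some other way, so resend everything.
    return {"op": "replace", "text": current}


def apply_diff(previous: str, diff: Dict[str, str]) -> str:
    """Client side: rebuild the latest state from the previous one plus a diff."""
    if diff["op"] == "append":
        return previous + diff["text"]
    return diff["text"]


# Successive states of a streamed chatbot message.
states: List[str] = ["Hel", "Hello", "Hello wor", "Hello world"]

server_prev, client_view = "", ""
full_chars = diff_chars = 0
for state in states:
    diff = compute_diff(server_prev, state)
    server_prev = state
    client_view = apply_diff(client_view, diff)
    full_chars += len(state)         # cost of resending the whole payload
    diff_chars += len(diff["text"])  # cost of sending only the diff

print(client_view, full_chars, diff_chars)
```

Resending the full payload grows roughly quadratically with the length of the message, while the diff-based stream grows linearly.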

Very cool stuff @aliabid94 ! PR: https://github.com/gradio-app/gradio/pull/7102
abidlabs
posted an update 12 months ago
Gradio 4.16 introduces a new flow: you can hide/show Tabs or make them interactive/non-interactive.

Really nice for multi-step machine learning demos ⚡️
abidlabs
posted an update about 1 year ago
✨ Excited to release gradio 4.16. New features include:

🐻‍❄️ Native support for Polars Dataframe
🖼️ Gallery component can be used as an input
⚡ Much faster streaming for low-latency chatbots
📄 Auto generated docs for custom components

... and much more! This is a HUGE release, so check out everything else in our changelog: https://github.com/gradio-app/gradio/blob/main/CHANGELOG.md
abidlabs
posted an update about 1 year ago
How we made Gradio faster by... slowing it down!

About a month ago, @oobabooga (who built the popular text generation webui) reported an interesting issue to the Gradio team. After upgrading to Gradio 4, @oobabooga noticed that chatbots that streamed very quickly had a lag before their text would show up in the Gradio app.

After some investigation, we determined that the Gradio frontend would receive the updates from the backend immediately, but the browser would lag before rendering the changes on the screen. The main difference between Gradio 3 and Gradio 4 was that we migrated the communication protocol between the backend and frontend from Websockets (WS) to Server-Sent Events (SSE), but we couldn't figure out why this would affect the browser's ability to render the streaming updates it was receiving.

After diving deep into browser events, @aliabid94 and @pngwn made a realization: most browsers treat WS events (specifically the WebSocket.onmessage function) with a lower priority than SSE events (EventSource.onmessage function), which allowed the browser to repaint the window between WS messages. With SSE, the streaming updates would stack up in the browser's event stack and be prioritized over any browser repaint. The browser would eventually clear the stack but it would take some time to go through each update, which produced a lag.

We debated different options, but the solution that we implemented was to introduce throttling: we capped how frequently we push updates to the browser event stack at a maximum rate of 20/sec. Although this seemingly “slowed down” Gradio streaming, it actually allows browsers to process updates in real time and provides a much better experience to end users of Gradio apps.
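
As a rough sketch of what throttling to 20 updates/sec can look like, the snippet below forwards an update only if at least 50 ms have passed since the last forwarded one, and flushes the final skipped update so the last state is never lost. It assumes each update carries the full latest state (so intermediate ones can safely be skipped) and uses explicit timestamps to stay deterministic. `throttle` is a hypothetical helper, not the code from the PR.

```python
from typing import Any, List, Optional, Tuple

Event = Tuple[int, Any]  # (timestamp in milliseconds, full latest state)


def throttle(events: List[Event], max_per_second: int = 20) -> List[Event]:
    """Return the subset of events that would actually be pushed to the browser."""
    min_interval_ms = 1000 // max_per_second
    forwarded: List[Event] = []
    last_sent_ms: Optional[int] = None
    pending: Optional[Event] = None
    for ts_ms, payload in events:
        if last_sent_ms is None or ts_ms - last_sent_ms >= min_interval_ms:
            forwarded.append((ts_ms, payload))
            last_sent_ms = ts_ms
            pending = None
        else:
            # Too soon: remember only the newest skipped update.
            pending = (ts_ms, payload)
    if pending is not None:
        # Flush the last skipped update so the final state reaches the browser
        # (this one may arrive sooner than the minimum interval).
        forwarded.append(pending)
    return forwarded


# A backend emitting an update every 10 ms (100 updates/sec) for 200 ms.
events = [(t, f"text after {t} ms") for t in range(0, 200, 10)]
sent = throttle(events)
print([t for t, _ in sent])
```

Fewer, larger updates leave the browser room to repaint between messages, which is the effect described above.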

See the PR here: https://github.com/gradio-app/gradio/pull/7084

Kudos to @aliabid94 and @pngwn for the fix, and to @oobabooga and @pseudotensor for helping us test it out!
abidlabs
posted an update about 1 year ago
There's a lot of interest in machine learning models that generate 3D objects, so Gradio now supports previewing STL files natively in the Model3D component. Huge thanks to Monius for the contribution 🔥🔥
sanchit-gandhi
updated a Space about 1 year ago