Hugging Face

Enterprise
company
Verified

AI & ML interests

The AI community building the future.

Recent Activity

nielsr updated a dataset about 17 hours ago
huggingface/community-science-merged
lysandre updated a dataset 1 day ago
huggingface/transformers-metadata
alozowski updated a dataset 1 day ago
huggingface/documentation-images

huggingface's activity

MoritzLaurer posted an update about 12 hours ago
FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!

📝 The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.

🤖 Fact-checking is automated by an ensemble of LLM judges that verify if a response is fully grounded in a factual reference document.

🧪 The authors tested different prompt templates on held-out data to ensure their generalization.

📚 It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.

💾 You can now download and reuse these prompt templates via the prompt-templates library!

🔄 The library simplifies sharing prompt templates on the HF hub or locally via standardized YAML files. Let's make LLM work more transparent and reproducible by sharing more templates like this!

Links 👇
- prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
- all templates on the HF Hub: MoritzLaurer/facts-grounding-prompts
- FACTS paper: https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf
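
As a rough illustration of the judge-ensemble idea described in this post, here is a minimal sketch in plain Python. It is not the FACTS implementation: the template text, the `Judge` type, the yes/no parsing rule, and the stub judges are all assumptions made for the example, and a real setup would call actual LLMs with the published templates.

```python
from statistics import mean
from typing import Callable

# Hypothetical judge prompt, written for this sketch only.
# The real FACTS templates are available via the links above.
JUDGE_TEMPLATE = (
    "Reference document:\n{document}\n\n"
    "Response:\n{response}\n\n"
    "Is every claim in the response supported by the document? "
    "Answer yes or no."
)

# A judge takes a filled-in prompt and returns its raw text answer.
Judge = Callable[[str], str]


def is_grounded(raw_answer: str) -> bool:
    """Parse a judge's free-text verdict into a boolean (assumed format)."""
    return raw_answer.strip().lower().startswith("yes")


def ensemble_score(judges: list[Judge], document: str, response: str) -> float:
    """Return the fraction of judges that consider the response grounded."""
    prompt = JUDGE_TEMPLATE.format(document=document, response=response)
    return mean(1.0 if is_grounded(judge(prompt)) else 0.0 for judge in judges)


# Stub judges standing in for real LLM calls.
def strict_judge(prompt: str) -> str:
    return "No, one claim is not supported."


def lenient_judge(prompt: str) -> str:
    return "Yes."


score = ensemble_score(
    [strict_judge, lenient_judge, lenient_judge],
    document="Paris is a city in France.",
    response="Paris is the capital of France.",
)
print(f"grounded according to {score:.0%} of judges")
```

Averaging over several judges is one simple way to reduce the bias of any single judge model; other aggregation rules are possible.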
merve posted an update 1 day ago
What a beginning to this year in open ML
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714

Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released multimodal textbook: 22k hours worth of samples from instruction videos 🤯
> Dataset: SciCap, a benchmark dataset for captioning scientific documents, is released along with the challenge!

LLMs 💬
> Microsoft released Phi-4, sota open-source 14B language model 🔥
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME, a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Qwen2.5-3B-Instruct 💭
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview 📕
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of code instruction-code pairs 📕
> Dataset: Qwen team is on a roll, they just released CodeElo, a dataset of code preferences 👩🏻‍💻

Embeddings 🔖
> @MoritzLaurer released zero-shot version of ModernBERT large
> KaLM is a new family of performant multilingual embedding models with MIT license built using Qwen2-0.5B

Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m: Cosmos-tokenized samples from OpenVid-1M

Others
> Prior Labs released TabPFNv2, the best tabular transformer for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
AdinaY posted an update 1 day ago
davanstrien posted an update 1 day ago
The data-is-better-together/fineweb-c dataset is growing!

This week a few more languages have reached 1,000 annotations for the educational quality of data from HuggingFaceFW/fineweb-2.

Why should you care?

The quality of pre-training data can have a big impact on the performance of downstream language models trained on that data (HuggingFaceFW/blogpost-fineweb-v1).

Being able to filter by educational quality is one way of improving the quality of the data you use for training an LLM. Importantly, this approach can also reduce the amount of data needed for pretraining.

Why not use an LLM?

LLMs can be used to annotate educational quality for a subset of data. This data can then be used to train a smaller encoder-only model to label the full dataset. However, this may not work well for languages outside of English. This is where fineweb-c (community) comes in.

The community is annotating the educational quality of fineweb2 data. Currently 114 languages have some annotations. These annotations will enable a number of things:

- Evaluate whether an LLM can label the educational quality for texts in that language well
- Directly be used for training quality classifiers
- Help discover other rules and heuristics for refining fineweb2 further for different languages.
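
To make the first use case concrete, here is a minimal sketch of comparing LLM labels against community labels and then filtering by quality. The 0-5 integer scale, the example scores, and the function names are assumptions invented for illustration; they are not taken from the fineweb-c dataset schema.

```python
def agreement_rate(human: list[int], llm: list[int]) -> float:
    """Share of texts where the LLM label equals the community label."""
    if len(human) != len(llm) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, llm)) / len(human)


def filter_by_quality(texts: list[str], scores: list[int], min_score: int) -> list[str]:
    """Keep only texts whose quality score reaches the threshold."""
    return [text for text, score in zip(texts, scores) if score >= min_score]


# Hypothetical data: scores on an assumed 0-5 educational-quality scale.
texts = ["doc a", "doc b", "doc c", "doc d", "doc e"]
human_scores = [0, 3, 4, 1, 5]  # community annotations
llm_scores = [0, 2, 4, 1, 5]    # labels from an LLM annotator

rate = agreement_rate(human_scores, llm_scores)
kept = filter_by_quality(texts, human_scores, min_score=3)

print(f"LLM agrees with annotators on {rate:.0%} of texts")
print("kept for training:", kept)
```

A low agreement rate for a given language would suggest that LLM-generated labels should not be trusted there, which is the gap the community annotations are meant to fill.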

This week the following languages were done:

Swedish thanks to: @Lauler @AntonVic @ohallstrom @bjarlestam @menbom @Ekgren @apsod

Ukrainian thanks to: @hannayukhymenko @robinhad @realPivo @RabotiahovDmytro @reciprocate

Assamese thanks to: @moyoor97 @Arpanjyoti @nawaf-helmi123 @pahigogoi1 @aelhence @kishorekashyap

Want to learn more? https://huggingface.co/blog/davanstrien/fineweb2-community

Contribute yourself here: data-is-better-together/fineweb-c

Add images (#414), opened 1 day ago by tomaarsen

Add images (#413), opened 1 day ago by tomaarsen
MoritzLaurer posted an update 2 days ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)
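
The "all true" idea behind the mixture of judges can be sketched in a few lines of plain Python. This is not TRL's code and does not import TRL: the `BinaryJudge` type, the stub judges, and the convention that -1 marks a failed judge call are assumptions made for this example; check the release notes for the actual AllTrueJudge behavior.

```python
from typing import Callable, Sequence

# A binary judge maps (prompt, completion) to 1 (pass), 0 (fail),
# or -1 (the judge itself failed). The -1 convention is assumed here.
BinaryJudge = Callable[[str, str], int]


def all_true(judges: Sequence[BinaryJudge], prompt: str, completion: str) -> int:
    """Return 1 only if every judge passes; propagate -1 if any judge failed."""
    verdicts = [judge(prompt, completion) for judge in judges]
    if any(verdict == -1 for verdict in verdicts):
        return -1
    return 1 if all(verdict == 1 for verdict in verdicts) else 0


# Stub judges standing in for real checks or LLM-based judges.
def is_nonempty(prompt: str, completion: str) -> int:
    return 1 if completion.strip() else 0


def is_short(prompt: str, completion: str) -> int:
    return 1 if len(completion) <= 20 else 0


def broken_judge(prompt: str, completion: str) -> int:
    return -1  # simulates a judge whose call failed


print(all_true([is_nonempty, is_short], "Say hi", "Hello!"))
print(all_true([is_nonempty, is_short], "Say hi", ""))
print(all_true([is_nonempty, broken_judge], "Say hi", "Hello!"))
```

Requiring unanimity makes the combined judge strict: a completion is accepted only when it satisfies every criterion at once.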
merve posted an update 2 days ago
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with MIT license 💗 ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation) both for images and videos ⏯️

> The models come in 1B, 4B and 8B and are based on InternVL2.5 for base architecture and Qwen2, Qwen2.5 and InternLM2 for language model part (depending on the checkpoint)

> The model is very interesting: it has a separate encoder for each modality (visual prompt, text prompt, image and video), then concatenates their outputs to feed into the LLM 💬

The output segmentation tokens are passed to SAM2, to sort of match text (captions or semantic classes) to masks ⤵️

> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and have different levels of descriptions to provide consistency.
andrewrreed posted an update 4 days ago
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!

Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production.

Now available as a Docker Space directly on the HF Hub! 🤗

🔍 Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks
1️⃣ One-click deployment: on Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: Version, edit, and update without redeployment
✅ Intuitive Evals: Collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: Build datasets directly from production data to enhance future performance

Kudos to the Langfuse team for this collab and the awesome, open-first product they're building! 👏 @marcklingen @Clemo @MJannik

🔗 Space: langfuse/langfuse-template-space
🔗 Docs: https://huggingface.co/docs/hub/spaces-sdks-docker-langfuse
m-ric posted an update 4 days ago
Since I published it on GitHub a few days ago,
Hugging Face's new agentic library smolagents has gathered nearly 4k stars 🤯

➡️ But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/
MoritzLaurer posted an update 4 days ago
OpenAI is losing money on the $200/month subscription 🤯. It's crazy how expensive it is to run the largest LLMs:

- ChatGPT Pro costs $200/month ($2,400/year) and is still unprofitable for OpenAI due to higher-than-expected usage.
- OpenAI reportedly expected losses of about $5 billion on revenue of $3.7 billion last year, with ChatGPT alone once costing an estimated $700,000 per day to operate. 💸🔥
- They build strong models and do great research. Whether this business model will work in the long run is one of the biggest questions in the AI economy today.
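
For scale, the figures quoted in this post can be worked through with simple arithmetic. The sketch below only recombines the numbers above; it assumes that loss equals spending minus revenue and that the daily operating estimate can be annualized over 365 days, both of which are simplifications and not claims from the source article.

```python
# Figures as quoted in the post (USD).
pro_price_per_month = 200
expected_loss = 5_000_000_000
expected_revenue = 3_700_000_000
chatgpt_cost_per_day = 700_000  # an earlier estimate, per the post

# Yearly price of one ChatGPT Pro subscription.
pro_price_per_year = pro_price_per_month * 12

# Assumption: loss = spending - revenue, so spending = revenue + loss.
implied_spending = expected_revenue + expected_loss
spending_per_revenue_dollar = implied_spending / expected_revenue

# Assumption: the daily estimate held for a full 365-day year.
chatgpt_cost_per_year = chatgpt_cost_per_day * 365

print(f"Pro subscription per year: ${pro_price_per_year:,}")
print(f"Implied total spending: ${implied_spending:,}")
print(f"Spending per dollar of revenue: ${spending_per_revenue_dollar:.2f}")
print(f"ChatGPT operating cost per year: ${chatgpt_cost_per_year:,}")
```

Under these assumptions, the reported numbers imply roughly $2.35 of spending for every dollar of revenue.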

Source with the numbers 👇
https://techcrunch.com/2025/01/05/openai-is-losing-money-on-its-pricey-chatgpt-pro-plan-ceo-sam-altman-says/