Xet Team

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

jsulz new activity 7 days ago
xet-team/README: xet repositories
seanses new activity 7 days ago
xet-team/README: xet repositories
jsulz updated a Space 16 days ago
xet-team/quantization-dedup

xet-team's activity

jsulz
in xet-team/README 7 days ago

xet repositories

2
#1 opened 7 days ago by
Tonic
seanses
in xet-team/README 7 days ago

xet repositories

2
#1 opened 7 days ago by
Tonic
jsulz
posted an update 8 days ago
Time flies!

Six months after joining Hugging Face, the Xet team is kicking off the first migrations from LFS to our storage for a number of repositories on the Hub.

More on the nitty-gritty details behind the migration soon, but here are the big takeaways:

🤖 We've successfully completed the first migrations from LFS -> Xet to test the infrastructure and prepare for a wider release

✅ No action on your part needed - you can work with a Xet-backed repo like any other repo on the Hub (for now - major improvements on their way!)

👀 Keep an eye out for the Xet logo to see if a repo you know is on our infra! See the screenshots below to spot the difference 👇

⏩ ⏩ ⏩ Blazing uploads and downloads coming soon. We’re gearing up for a full integration with the Hub's Python library that will make building on the Hub faster than ever - special thanks to @celinah and @Wauplin for their assistance.

🎉 Want Early Access? If you’re curious and want to test out the bleeding edge that will power the development experience on the Hub, we’d love to partner with you. Let me know!

This is the culmination of a lot of effort from the entire team. Big round of applause to @sirahd @brianronan @jgodlewski @hoytak @seanses @assafvayner @znation @saba9 @rajatarya @port8080 @yuchenglow
jsulz
updated a Space 16 days ago
jsulz
posted an update 16 days ago
Toward the end of last year, the Xet team provided an inside look into the foundations of how we plan to enable rapid experimentation and iteration for the AI builders on the Hub: https://huggingface.co./blog/from-files-to-chunks

But it turns out chunks aren't all you need!

Our goal is to bring:
🚀 Faster uploads
⏬ Speedy downloads
💪 All without sacrificing your workflow

To do that, we need the infrastructure and system design to back it up. As we prepare to roll out the first Xet-backed repositories on the Hub, we wrote up a post explaining the nitty-gritty details of the decisions that bring this to life: https://huggingface.co./blog/from-chunks-to-blocks

Complete with an interactive visualization that shows the power of deduplication in action - taking a 191GB repo to ~97GB and shaving a few hours off upload times.

The darker each block in the heatmap, the more we dedupe, the less we have to transfer. Clicking on a file's blocks shows all other files that share blocks.
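
For a rough feel for the numbers, here is a minimal Python sketch of the two ideas above: the fraction saved by going from 191GB to ~97GB, and a lookup of which files share blocks. The function names, the file names, and the block IDs in the usage note are all hypothetical, and this is not how the Space itself is implemented.

```python
from collections import defaultdict


def savings_fraction(original_size: float, deduped_size: float) -> float:
    """Fraction of the original size that no longer needs to be stored or sent."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 1 - deduped_size / original_size


def files_sharing_blocks(file_blocks: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each file to the other files that share at least one block ID with it."""
    # First pass: record which files contain each block.
    owners = defaultdict(set)
    for name, blocks in file_blocks.items():
        for block in blocks:
            owners[block].add(name)
    # Second pass: every file that owns a block is linked to its co-owners.
    shared = {name: set() for name in file_blocks}
    for names in owners.values():
        for name in names:
            shared[name] |= names - {name}
    return shared
```

Calling `savings_fraction(191, 97)` gives a little under one half, which matches the "191GB to ~97GB" figure. Passing a made-up mapping such as `{"a.gguf": ["b1", "b2"], "b.gguf": ["b2"]}` to `files_sharing_blocks` links the two files through the shared block `b2`.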

Check it out and explore for yourself! xet-team/quantization-dedup
jsulz
published a Space 18 days ago
jsulz
posted an update 3 months ago
Doing a lot of benchmarking and visualization work, which means I'm always searching for interesting repos in terms of file types, size, branches, and overall structure.

To help, I built a Space jsulz/repo-info that lets you search for any repo and get back:

- Treemap of the repository, color coded by file/directory size
- Repo branches and their size
- Cumulative size of different file types (e.g., the total size of all the safetensors in the repo)

And because I'm interested in how this will fit in our work to leverage content-defined chunking for versioning repos on the Hub - https://huggingface.co./blog/from-files-to-chunks - everything has the number of chunks (1 chunk = 64KB) as well as the total size in bytes.
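
As a back-of-the-envelope illustration of the "1 chunk = 64KB" conversion, a sketch in Python might look like the following. Treating 64KB as 64 × 1024 bytes and rounding partial chunks up are assumptions on my part, and `estimated_chunks` is a hypothetical name rather than something taken from the Space; real content-defined chunks also vary in size, so this is only an estimate.

```python
CHUNK_SIZE_BYTES = 64 * 1024  # "1 chunk = 64KB"; the 1024-based reading is an assumption


def estimated_chunks(size_bytes: int) -> int:
    """Estimate how many 64KB chunks a file of the given size would occupy."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Integer ceiling division so a partially filled chunk still counts as one.
    return (size_bytes + CHUNK_SIZE_BYTES - 1) // CHUNK_SIZE_BYTES
```

Under these assumptions a 1 GiB file (`1024 ** 3` bytes) works out to 16,384 chunks.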

Some of the treemaps are pretty cool. Attached are black-forest-labs/FLUX.1-dev and, for fun, laion/laion-audio-preview (which has nearly 10k .tar files 🤯)
jsulz
posted an update 3 months ago
Something I love about working at Hugging Face is the opportunity to design and work in public. Right now, we’re redesigning the architecture that supports uploads and downloads on the Hub.

Datasets and models are growing fast, and so are the challenges of storing and transferring them efficiently. To keep up, we're introducing a new protocol for uploads and downloads, supported by a content-addressed store (CAS).

Here’s what’s coming:

📦 Smarter uploads: Chunk-level management enables advanced deduplication and compression and reduces redundant transfers, speeding up uploads.
⚡ Efficient downloads: High throughput and low latency ensure fast access, even during high-demand model releases.
🔒 Enhanced security: Validate uploads before storage to block malicious or invalid data.
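
To make the idea of a content-addressed store concrete, here is a toy in-memory sketch in Python. It is not the Hub's CAS: the class name, the choice of SHA-256 as the key, and the idea that "validation" means checking a claimed key against the content are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


class ContentAddressedStore:
    """Toy in-memory store where each blob's key is the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def key_for(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes, claimed_key: Optional[str] = None) -> str:
        """Store data under its content hash, rejecting a mismatched claimed key."""
        key = self.key_for(data)
        if claimed_key is not None and claimed_key != key:
            # One assumed form of validation: the content must match its address.
            raise ValueError("content does not match the claimed key")
        # Identical content maps to the same key, so it is only stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Because the key is derived from the bytes themselves, uploading the same chunk twice costs no extra storage, and a client can tell from the key alone whether the store already has it.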

We analyzed 24 hours of global upload activity in October (88 countries, 130TB of data!) to design a system that scales with your needs.

The result? A proposed infrastructure with CAS nodes in us-east-1, eu-west-3, and ap-southeast-1.

🔗 Read the blog post for the full details: https://huggingface.co./blog/rearchitecting-uploads-and-downloads

🌟 Check out our interactive demo to explore the data yourself!
xet-team/cas-analysis

We’d love to hear your feedback - let us know if you have questions or want to see more.
jsulz
posted an update 3 months ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.

In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub.
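
To show why variable-sized, content-defined chunks let you upload only what changed, here is a minimal Python sketch. It uses a gear-style rolling hash, which is my choice for illustration and not necessarily what the Xet backend uses; the chunk-size parameters are hypothetical and deliberately tiny so the demo runs quickly, and `cdc_chunks` and `reused_fraction` are made-up names.

```python
import hashlib
import random

# Hypothetical parameters, far smaller than production chunk sizes.
MIN_CHUNK = 256             # never cut a chunk shorter than this
MAX_CHUNK = 8192            # always cut once a chunk reaches this length
BOUNDARY_MASK = 0xFFC00000  # top 10 bits of a 32-bit hash: a cut about every 1 KiB

# Fixed table of pseudo-random 32-bit values, one per byte value (a "gear" table).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data wherever a rolling hash of the most recent bytes hits a pattern."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Shifting left drops the influence of bytes more than 32 positions back.
        rolling = ((rolling << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (rolling & BOUNDARY_MASK) == 0
        if at_boundary or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def reused_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new version's chunks that already exist in the old version."""
    old_ids = {hashlib.sha256(c).digest() for c in cdc_chunks(old)}
    new_chunks = cdc_chunks(new)
    if not new_chunks:
        return 0.0
    reused = sum(1 for c in new_chunks if hashlib.sha256(c).digest() in old_ids)
    return reused / len(new_chunks)
```

Because cut points depend on the content rather than on byte offsets, inserting a few bytes near the start of a file typically changes only the chunk around the edit; later boundaries fall in the same places, so most chunks hash to values the store has already seen. With fixed-size chunks, the same insert would shift every later chunk.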

We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co./blog/from-files-to-chunks
erinys
posted an update 4 months ago
jsulz
posted an update 5 months ago
The Hugging Face Hub hosts over 1.5M Model, Dataset, and Space repositories. To scale to 10M+, the XetHub team (https://huggingface.co./xet-team) is replacing Git LFS with a new technology that improves storage and transfer capabilities with some future developer experience benefits to boot.

Thanks to @yuchenglow and @port8080 (for their analysis covering LFS usage from March 2022–Sept 2024), we now have insights into what we’re storing. Check out the Gradio app to explore:
- Storage growth over time
- File types over all repositories
- Some simple optimizations we're investigating

xet-team/lfs-analysis
erinys
posted an update 5 months ago
We shut down XetHub today after almost 2 years. What we learned from launching our Git-scaled product from scratch:
- Don't make me change my workflow
- Data inertia is real
- ML best practices are still evolving

Closing the door on our public product lets us focus on our new goal of scaling HF Hub's storage backend to improve devX for a larger community. We'd love to hear your thoughts on what experiences we can improve!

Read the full post: https://xethub.com/blog/shutting-down-xethub-learnings-and-takeaways
erinys
posted an update 5 months ago
We did a thing! Eight weeks into our Hugging Face tenure, we can demo a round-trip of Xet-backed files from our local machine to a prod Hugging Face S3 bucket and back. 🚀

It’s been exciting to dive into how the Hub is built and design our steel thread through the infrastructure. Now that the thread is up, we can kick off project Capacious Extremis 🪄 to add all the other goodies: authentication, authorization, deduplication, privacy, and more.

What does this mean for you? You’re one step closer to ⚡ faster downloads, uploads, and iterative development on Hugging Face Hub!
This is our first step toward replacing Git LFS as the Hub's storage backend: https://huggingface.co./blog/xethub-joins-hf

Check out the demo on LinkedIn to see the transfer in action: https://www.linkedin.com/posts/annux_youve-heard-of-blue-steel-but-have-activity-7245062126535405568-3cvJ
jsulz
posted an update 5 months ago
In August, the XetHub team joined Hugging Face - https://huggingface.co./blog/xethub-joins-hf - and we’ve been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.

You can read more about the findings (with some jaw-dropping stats + charts) here https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248