Jared Sulzdorf PRO

jsulz

https://www.jsulz.com/

AI & ML interests

Infrastructure, law, policy

Recent Activity

upvoted an article 1 day ago

Common AI Model Formats

commented on their article 3 days ago

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

liked a dataset 3 days ago

Anthropic/EconomicIndex

jsulz's activity

upvoted an article 1 day ago

Common AI Model Formats

Article • 1 day ago • 17

commented on From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub 3 days ago

Great questions :)

"For example if a model does not have a key chunk"

Two points of clarification are in order here:

1. Not all files will contain a key chunk; key chunks are purely an optimization.
2. Key chunks are used for deduplication on the upload path, not for downloads. When a file is uploaded to a repository for the first time, they let us check whether any of its content already exists in the global store. This allows us to deduplicate across the entirety of our storage.
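A minimal Python sketch of the upload-side check described above, under loudly stated assumptions: a chunk counts as a "key chunk" when its SHA-256 hash, read as an integer, is divisible by a small modulus, and a plain dict called `global_index` stands in for the global store. The hash function, selection rule, and index shown here are hypothetical and are not the Hub's actual implementation.

```python
import hashlib

# Assumption: a chunk is a "key chunk" when its hash, read as an integer,
# is divisible by KEY_CHUNK_MODULUS. This rule is illustrative only.
KEY_CHUNK_MODULUS = 4


def chunk_hash(data: bytes) -> str:
    # SHA-256 stands in for whatever chunk hash the real system uses.
    return hashlib.sha256(data).hexdigest()


def key_chunks(chunks: list[bytes]) -> list[str]:
    """Return hashes of the chunks selected as key chunks (possibly none)."""
    hashes = (chunk_hash(c) for c in chunks)
    return [h for h in hashes if int(h, 16) % KEY_CHUNK_MODULUS == 0]


def find_global_matches(chunks: list[bytes], global_index: dict[str, str]) -> dict[str, str]:
    """Look up a newly uploaded file's key chunks in a hypothetical global index.

    global_index maps a key-chunk hash to the ID of a stored block that
    already contains it. A file with no key chunks simply gets no global
    lookup, which is fine because the lookup is only an optimization.
    """
    return {h: global_index[h] for h in key_chunks(chunks) if h in global_index}


# Usage: file B reuses most of file A's chunks, so its key chunks hit the index.
file_a = [f"chunk-{i}".encode() for i in range(100)]
global_index = {h: "block-A" for h in key_chunks(file_a)}
file_b = file_a[:60] + [b"brand-new-chunk"]
print(find_global_matches(file_b, global_index))
```

Sampling only a fraction of chunks keeps the number of global lookups small; the trade-off, matching point 1 above, is that some files end up with no key chunk and skip the global check entirely.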

As for your other question:

"If we don't store every chunk hash to block hash, how can we download a model file?"

When a client downloads a model file, it sends the file hash to our services, which map that hash to a list of block subranges. Logically, these subranges are made up of chunks, but storing only their offsets saves on metadata storage. Many of these ranges also share boundaries inside a block, so we can group them together in a single response, which makes sending the content back to the client more efficient.
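To make the download path concrete, here is a minimal Python sketch under stated assumptions: a hypothetical in-memory `index` maps a file hash to an ordered list of block subranges, each stored as a block ID plus start and end offsets, and neighbouring ranges that share a boundary inside the same block are merged before fetching. The class, function names, and data layout are illustrative, not the Hub's actual API or storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRange:
    """Hypothetical [start, end) subrange of one stored block."""
    block_id: str
    start: int  # offset where the range begins inside the block
    end: int    # offset just past the end of the range


def coalesce(ranges: list[BlockRange]) -> list[BlockRange]:
    """Merge neighbouring ranges that share a boundary in the same block.

    Ranges stay in file order, so two ranges are merged only when they are
    adjacent both in the file and inside the block.
    """
    merged: list[BlockRange] = []
    for r in ranges:
        prev = merged[-1] if merged else None
        if prev is not None and prev.block_id == r.block_id and prev.end == r.start:
            merged[-1] = BlockRange(r.block_id, prev.start, r.end)
        else:
            merged.append(r)
    return merged


def reconstruction_plan(file_hash: str, index: dict[str, list[BlockRange]]) -> list[BlockRange]:
    """Look up a file hash and return the block subranges to fetch, grouped."""
    return coalesce(index[file_hash])


# Usage with a made-up index: four stored ranges collapse into three fetches.
index = {
    "example-file-hash": [
        BlockRange("block-1", 0, 4),
        BlockRange("block-1", 4, 9),   # shares a boundary with the range above
        BlockRange("block-2", 0, 3),
        BlockRange("block-1", 9, 12),  # contiguous in block-1, but not next in file order
    ]
}
for r in reconstruction_plan("example-file-hash", index):
    print(r)
```

Merging only neighbours in file order keeps the reconstructed bytes in the right sequence while still cutting the number of separate range requests.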

liked a dataset 3 days ago

Anthropic/EconomicIndex

Viewer • Updated 18 days ago • 3.51k • 6.35k • 177

liked a Space 3 days ago

The Essential AI Toolkit

A curated collection of AI tools for journalists & creators

upvoted a paper 4 days ago

Protecting Human Cognition in the Age of AI

Paper • 2502.12447 • Published 11 days ago • 4

updated a collection 4 days ago

Papers I Have Read

A list of papers that have moved off my reading list • 11 items • Updated 4 days ago

upvoted 9 papers 5 days ago

Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face

Paper • 2401.13822 • Published Jan 24, 2024 • 1

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 53

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Paper • 1910.03771 • Published Oct 9, 2019 • 17

Model Cards for Model Reporting

Paper • 1810.03993 • Published Oct 5, 2018 • 4

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Paper • 2210.01970 • Published Sep 30, 2022 • 12

Datasets: A Community Library for Natural Language Processing

Paper • 2109.02846 • Published Sep 7, 2021 • 13

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

Paper • 2205.02302 • Published May 4, 2022 • 1

Adam: A Method for Stochastic Optimization

Paper • 1412.6980 • Published Dec 22, 2014 • 1

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17

updated a collection 5 days ago

Papers I Have Read

A list of papers that have moved off my reading list • 11 items • Updated 4 days ago

upvoted a paper 5 days ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 93

updated a collection 5 days ago

Papers (I want) To Read

A list of papers on my reading list. • 16 items • Updated 5 days ago

updated a collection 6 days ago

Papers I Have Read

A list of papers that have moved off my reading list • 11 items • Updated 4 days ago