Jared Sulzdorf PRO

jsulz

AI & ML interests

Infrastructure, law, policy

jsulz's activity

Great questions :)

> For example if a model does not have a key chunk

Two points of clarification might be in order concerning the above point:

  1. Not all files will contain a key chunk; this is purely an optimization.
  2. Key chunks are used for deduplication on the upload path, not for downloads. They let us check whether a file uploaded to a repository for the first time has any content that already exists in the global store, which allows us to deduplicate across all of our storage rather than within a single repository.
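
To make the upload-side check concrete, here is a minimal sketch of how a key-chunk lookup could work. Everything in it is an assumption made for illustration: the rule in `is_key_chunk`, the `KEY_CHUNK_MODULUS` value, the use of SHA-256, and the shape of `global_index` are hypothetical and do not describe the actual implementation.

```python
import hashlib

# Hypothetical: treat roughly 1 in 4 chunks as a "key chunk".
KEY_CHUNK_MODULUS = 4


def chunk_hash(chunk: bytes) -> str:
    """Hash a chunk's bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(chunk).hexdigest()


def is_key_chunk(h: str) -> bool:
    """Hypothetical selection rule: the hash value is divisible by the modulus."""
    return int(h, 16) % KEY_CHUNK_MODULUS == 0


def find_global_matches(chunks, global_index):
    """Return {key chunk hash: block id} for key chunks already in the global index.

    Only key chunks are looked up, so the global index stays small.
    A file with no key chunks simply yields no matches.
    """
    matches = {}
    for chunk in chunks:
        h = chunk_hash(chunk)
        if is_key_chunk(h) and h in global_index:
            matches[h] = global_index[h]
    return matches
```

A match on a key chunk would tell the uploader which existing block is worth inspecting for further shared chunks. A file with no key chunks is still uploaded correctly, it just skips this global optimization.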

As for your other question:

> If we don't store every chunk hash to block hash, how can we download a model file

When downloading a model file, the client sends a request to our services with the file hash, which is mapped to a list of block subranges. Logically these subranges are chunks, but storing offsets instead of per-chunk hashes saves on metadata storage. Many offsets also end up sharing boundaries inside a block, so we can group them together in a single response, which helps when sending the content back to the client.
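
As a rough sketch of the download path described above, the snippet below maps a file hash to block subranges, merges subranges that share a boundary inside the same block, and reassembles the file. The names `BlockRange`, `coalesce`, and `reconstruct`, the in-memory dictionaries, and the use of byte offsets are all assumptions for illustration, not the service's real data model.

```python
from typing import NamedTuple


class BlockRange(NamedTuple):
    """A subrange of a block (hypothetical layout: byte offsets, end exclusive)."""
    block_id: str
    start: int
    end: int


def coalesce(ranges):
    """Merge consecutive ranges that touch inside the same block."""
    merged = []
    for r in ranges:
        last = merged[-1] if merged else None
        if last and last.block_id == r.block_id and last.end == r.start:
            # Shared boundary: extend the previous range instead of adding a new one.
            merged[-1] = last._replace(end=r.end)
        else:
            merged.append(r)
    return merged


def reconstruct(file_hash, file_index, blocks):
    """Rebuild a file from its block subranges.

    file_index: {file hash: ordered list of BlockRange}
    blocks:     {block id: block bytes}
    """
    out = bytearray()
    for r in coalesce(file_index[file_hash]):
        out += blocks[r.block_id][r.start:r.end]
    return bytes(out)


blocks = {"b1": b"hello world", "b2": b"XYZ"}
file_index = {
    "f": [BlockRange("b1", 0, 5), BlockRange("b1", 5, 11), BlockRange("b2", 1, 3)]
}
print(reconstruct("f", file_index, blocks))  # b'hello worldYZ'
```

The two adjacent ranges in `b1` collapse into one, so the client needs two reads instead of three. No chunk-hash-to-block mapping is consulted at any point, only the file's ordered list of subranges.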