Complete Sentence Transformers integration + patch inference on CPU & Windows
Hello!
Preface
I work on embedding models on a daily basis, and I worked hard to get ModernBERT integrated & supported nicely in `transformers`. Your work here is very much a culmination of those two efforts coming together. It's wonderful to see such huge advancements for its model size; it really validates our work. The ModernBERT team is very excited about this model & the reranker.
Great work!
Pull Request overview
- Complete Sentence Transformers integration: `1_Pooling/config.json` already existed, but `modules.json` was missing to tell Sentence Transformers to look in `1_Pooling/config.json` (see the sketch after this list).
- Remove the `reference_compile` config option. When not specified in the config, it is set dynamically based on the user's hardware and software: https://github.com/huggingface/transformers/blob/f439e28d32c9fa061c4fd90696ba0b158d273d09/src/transformers/models/modernbert/modeling_modernbert.py#L689-L718
- Update the README:
  - Add a tag for Sentence Transformers to boost visibility.
  - Add model outputs so people get a better feel for what the model does.
  - Remove `trust_remote_code`; it is not needed for ModernBERT!
  - Update the minimum `transformers` version to v4.48.0, as that version introduced the `modernbert` architecture.
  - Mention that `flash_attn` is recommended (but not required) for faster inference.
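For illustration, a `modules.json` of the kind this PR adds typically looks like the sketch below; the exact module list for this repository may differ (for instance, a `Normalize` module may also be listed):

```json
[
  { "idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer" },
  { "idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling" }
]
```

With that file in place, the model loads directly in Sentence Transformers. A minimal usage sketch, assuming the `Alibaba-NLP/gte-modernbert-base` repo id:

```python
from sentence_transformers import SentenceTransformer

# No trust_remote_code needed: the modernbert architecture ships with
# transformers>=4.48.0.
model = SentenceTransformer("Alibaba-NLP/gte-modernbert-base")

embeddings = model.encode([
    "what is the capital of China?",
    "how to implement quick sort in python?",
])
print(embeddings.shape)  # (2, hidden_size), e.g. (2, 768) for a base-sized model
```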
Details
Regarding the `reference_compile` config change: if that option stays in the config, parts of the model are always compiled, even if the user does not have `triton` (a core requirement for compilation) or is running on CPU (which isn't compatible with compilation). Removing the option lets `transformers` pick the right behaviour at load time, which fixes inference on CPU and on Windows.
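To illustrate, a minimal sketch, again assuming the `Alibaba-NLP/gte-modernbert-base` repo id and `transformers>=4.48.0`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "Alibaba-NLP/gte-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# With reference_compile removed from config.json, transformers decides
# at load time whether compilation is possible, so a CPU-only machine
# without triton can load and run the model without errors.
model = AutoModel.from_pretrained(repo_id)

inputs = tokenizer("what is the capital of China?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```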
P.S. Will you upload your MTEB scores to the model card metadata? I'd love to see this model on MTEB.
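(For reference, a hedged sketch of what that README front matter could look like; the task, dataset, and score below are placeholders, not real results:)

```yaml
tags:
- mteb
model-index:
- name: gte-modernbert-base
  results:
  - task:
      type: STS
    dataset:
      type: mteb/sts12-sts
      name: MTEB STS12
      config: default
      split: test
    metrics:
    - type: cosine_spearman
      value: 0.0  # placeholder, not a real score
```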
- Tom Aarsen
Hi Tom,
Thank you so much for your kind words and contributions to the gte-modernbert series models. It's truly gratifying to see your hard work on ModernBERT pay off. We are equally thrilled about the progress and the successful integration with `transformers`.
Best regards,
Dingkun Long