patch inference on CPU & Windows + Update README snippets
#2 opened by tomaarsen (HF staff)
Hello!
Pull Request overview
- Remove the `reference_compile` config option. When not specified in the config, it will be set dynamically based on the user's hardware and software: https://github.com/huggingface/transformers/blob/f439e28d32c9fa061c4fd90696ba0b158d273d09/src/transformers/models/modernbert/modeling_modernbert.py#L689-L718
- Update the README:
  - Add a tag for Sentence Transformers to boost visibility.
  - Add model outputs so people get a better feel for what the model does.
  - Remove `trust_remote_code`, which is not needed for ModernBERT!
  - Update the minimum `transformers` version to v4.48.0, as that version introduced the `modernbert` architecture.
  - Mention that `flash_attn` is recommended (but not required) for faster inference.
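For readers who want to confirm their environment matches the README requirements listed above, here is a minimal sketch. The helper names `release_tuple` and `check_environment` are hypothetical and are not part of `transformers` or this repository. The check only compares the first three numeric version components and ignores suffixes such as `.dev0`, which is an assumption that may be too lenient for pre-releases.

```python
import importlib.util
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str) -> tuple:
    """Turn a version string like '4.48.0.dev0' into (4, 48, 0).

    Hypothetical helper: only the first three dotted components are read,
    and any non-numeric suffix (e.g. 'rc1') is ignored.
    """
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(minimum: str = "4.48.0") -> dict:
    """Report whether `transformers` is new enough and whether `flash_attn` is importable."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        installed = None
    return {
        "transformers_version": installed,
        # v4.48.0 is the minimum named in this PR for the `modernbert` architecture.
        "transformers_ok": installed is not None
        and release_tuple(installed) >= release_tuple(minimum),
        # Recommended for faster inference, but not required.
        "flash_attn_available": importlib.util.find_spec("flash_attn") is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `transformers_ok` is false, upgrading `transformers` to at least v4.48.0 should resolve it. A false `flash_attn_available` only means inference may be slower.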
Details
Regarding the `reference_compile` config change: if the option stays in the config, then parts of the model are always compiled, even if the user does not have `triton` (a core requirement for compilation) or is running on CPU (which isn't compatible with compilation). Removing the option lets `transformers` decide at load time whether compilation is possible.
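The behaviour described above can be illustrated with a simplified sketch. This is not the code in `modeling_modernbert.py`; the function `resolve_reference_compile` and its parameters are hypothetical, and it models only the two conditions this PR mentions (CPU and a missing `triton`). The assumption is that an explicit config value is honoured as-is, while an unset value (`None`) is resolved from the environment.

```python
import importlib.util
from typing import Optional


def triton_is_installed() -> bool:
    """Return True if the `triton` package can be found in this environment."""
    return importlib.util.find_spec("triton") is not None


def resolve_reference_compile(
    configured: Optional[bool],
    device_type: str,
    has_triton: Optional[bool] = None,
) -> bool:
    """Decide whether to compile parts of the model (illustrative only).

    configured  -- value of `reference_compile` in the config, or None if absent
    device_type -- e.g. "cpu" or "cuda"
    has_triton  -- override for testing; detected automatically when None
    """
    if has_triton is None:
        has_triton = triton_is_installed()

    # An explicit value in the config wins, which is why a hard-coded True
    # forces compilation even where it cannot work.
    if configured is not None:
        return configured

    # Unset: compile only when the environment supports it.
    return device_type != "cpu" and has_triton


# With the option hard-coded to True, a CPU-only machine without triton
# is still asked to compile:
print(resolve_reference_compile(True, "cpu", has_triton=False))

# With the option removed from the config, the same machine skips compilation:
print(resolve_reference_compile(None, "cpu", has_triton=False))
```

Under these assumptions the first call returns `True` and the second returns `False`, which is the difference this PR relies on.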
- Tom Aarsen
tomaarsen changed pull request status to open
thenlper changed pull request status to merged