giant-oak/lsg-roberta-base-4096
Fill-Mask
Various efficient-attention, encoder-style architectures distilled into student models with half the hidden layers, plus a long-context NER dataset.
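A common starting point for this kind of layer-reduction distillation is to initialize the student by copying every other layer from the teacher. The source does not specify the initialization scheme used here, so the mapping below is an illustrative sketch, not the documented method; the function name is hypothetical.

```python
def teacher_to_student_layer_map(teacher_layers: int) -> dict[int, int]:
    """Map each student layer index to the teacher layer it is copied from.

    Illustrative scheme only: the student has half the teacher's hidden
    layers, and student layer i is initialized from teacher layer 2*i.
    """
    student_layers = teacher_layers // 2
    return {s: 2 * s for s in range(student_layers)}

# A 12-layer teacher yields a 6-layer student initialized from
# teacher layers 0, 2, 4, 6, 8, 10.
print(teacher_to_student_layer_map(12))
```

Other schemes (e.g. copying the top or bottom half of the teacher's layers) are equally plausible; which one was used would be stated on the individual model cards.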