Update README.md

README.md
---
license: mit
datasets:
- nyu-mll/multi_nli
- stanfordnlp/snli
language:
- en
metrics:
- accuracy
base_model:
- answerdotai/ModernBERT-large
- tasksource/ModernBERT-large-nli
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- mnli
- snli
---

# ModernBERT Cross-Encoder: Natural Language Inference (NLI)

This cross-encoder performs sequence classification over contradiction/neutral/entailment labels, and it is drop-in compatible with comparable `sentence-transformers` cross-encoders.

I trained this model by initializing the ModernBERT-large weights from the brilliant `tasksource/ModernBERT-large-nli` zero-shot classification model, then trained it with a batch size of 64 on the `sentence-transformers` AllNLI dataset.

For the `large` version, I froze all layers initialized from the tasksource model up to layer 19, and fine-tuned only the remaining layers with a new classification head.
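
A minimal sketch of that freezing setup, assuming the Hugging Face `transformers` attribute layout for `ModernBertForSequenceClassification` (`model.model.embeddings`, `model.model.layers`) and reading "up to 19" as inclusive; the actual training script is not published in this card:

```python
from transformers import AutoModelForSequenceClassification

# Load the tasksource NLI checkpoint (3-way classification)
model = AutoModelForSequenceClassification.from_pretrained(
    "tasksource/ModernBERT-large-nli"
)

# Freeze the embeddings and encoder layers 0-19; the remaining layers
# and the classification head stay trainable.
for param in model.model.embeddings.parameters():
    param.requires_grad = False
for layer in model.model.layers[:20]:
    for param in layer.parameters():
        param.requires_grad = False
```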

---

## Features
- **High performing:** Achieves 92.02% accuracy on MNLI mismatched and 91.10% on the SNLI test set.
- **Efficient architecture:** Based on the ModernBERT-large design (395M parameters), offering faster inference speeds.
- **Extended context length:** Processes sequences up to 8192 tokens, which is useful for evaluating LLM outputs against long source documents (see the sketch below).
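
For instance, a hypothetical long-context check (the file name and strings below are purely illustrative) could score an LLM answer directly against its source document without truncation:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/ModernCE-large-nli")

# Premise: a long source document; hypothesis: the claim to verify.
with open("source_article.txt") as f:  # hypothetical file
    document = f.read()
claim = "The study found that sleep deprivation impairs working memory."

scores = model.predict([(document, claim)])  # one row of 3 class logits
```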

---

## Performance

| Model                | MNLI Mismatched (acc.) | SNLI Test (acc.) | Context Length (tokens) |
|----------------------|------------------------|------------------|-------------------------|
| `ModernCE-large-nli` | 0.9202                 | 0.9110           | 8192                    |
| `ModernCE-base-nli`  | 0.9034                 | 0.9025           | 8192                    |
| `deberta-v3-large`   | 0.9049                 | 0.9220           | 512                     |
| `deberta-v3-base`    | 0.9004                 | 0.9234           | 512                     |

---

## Usage

To use ModernCE for NLI tasks, load the model with the Hugging Face `sentence-transformers` library:

```python
from sentence_transformers import CrossEncoder

# Load the ModernCE model
model = CrossEncoder("dleemiller/ModernCE-large-nli")

# Score (premise, hypothesis) pairs
scores = model.predict([
    ("A man is eating pizza", "A man eats something"),
    ("A black race car starts up in front of a crowd of people.", "A man is driving down a lonely road."),
])

# Convert scores to labels
label_mapping = ["contradiction", "entailment", "neutral"]
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
# ['entailment', 'contradiction']
```
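
The returned scores are raw logits (the example above takes an argmax over them). If you want class probabilities instead, a row-wise softmax is enough; this is a small assumed addition, not part of the original example:

```python
import numpy as np

# Row-wise softmax over the three class logits
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
```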

---

## Training Details

### Pretraining
We initialize the weights from `tasksource/ModernBERT-large-nli`.

Details:
- Batch size: 64
- Learning rate: 3e-4
- Attention dropout: 0.1

### Fine-Tuning
Fine-tuning was performed on the SBERT AllNLI.tsv.gz dataset; a sketch of the setup follows.
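
A minimal sketch of that loop using the classic `sentence-transformers` CrossEncoder training API; the inline examples and warmup value are illustrative, while the batch size mirrors the details listed above:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Map NLI labels to the class indices used in this card
label2id = {"contradiction": 0, "entailment": 1, "neutral": 2}

# Tiny stand-in for the parsed AllNLI.tsv.gz rows
train_examples = [
    InputExample(texts=["A man is eating pizza", "A man eats something"],
                 label=label2id["entailment"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)

model = CrossEncoder("tasksource/ModernBERT-large-nli", num_labels=3)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
```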

### Validation Results
The model achieved the following test set performance after fine-tuning:
- **MNLI Mismatched:** 0.9202
- **SNLI:** 0.9110

---

## Model Card

- **Architecture:** ModernBERT-large
- **Fine-Tuning Data:** `sentence-transformers` AllNLI.tsv.gz

---

## Thank You

Thanks to the AnswerAI team for providing the ModernBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.
We also thank the tasksource team for their work on zero-shot encoder models.

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{moderncenli2025,
  author = {Miller, D. Lee},
  title = {ModernCE NLI: An NLI cross encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/ModernCE-large-nli},
}
```

---

## License

This model is licensed under the [MIT License](LICENSE).