Commit 01cf0b3 (verified) by dleemiller · parent: fc6c0ac

Update README.md
---
license: mit
datasets:
- nyu-mll/multi_nli
- stanfordnlp/snli
language:
- en
metrics:
- accuracy
base_model:
- answerdotai/ModernBERT-large
- tasksource/ModernBERT-large-nli
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- mnli
- snli
---
# ModernBERT Cross-Encoder: Natural Language Inference (NLI)

This cross-encoder performs sequence classification over contradiction/neutral/entailment labels, and is a
drop-in replacement for comparable `sentence-transformers` cross-encoders.

I trained this model by initializing the ModernBERT-large weights from the brilliant `tasksource/ModernBERT-large-nli`
zero-shot classification model, then trained it with a batch size of 64 on the `sentence-transformers` AllNLI
dataset.

For the `large` version, I froze all layers initialized from the tasksource model up to layer 19 and fine-tuned only the
remaining layers with a new classification head.

---
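Freezing works by disabling gradients on the lower layers' parameters. A minimal PyTorch sketch with a stand-in layer stack (the attribute names on the real ModernBERT module differ, so treat these as illustrative):

```python
import torch.nn as nn

# Stand-in stack: ModernBERT-large has 28 transformer layers; in the real
# model they sit under a ModernBERT-specific attribute, so the names here
# are illustrative only.
encoder_layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(28))
classification_head = nn.Linear(8, 3)  # contradiction / entailment / neutral

FREEZE_THROUGH = 19  # freeze layers 0..19; train layer 20 onward plus the head
for idx, layer in enumerate(encoder_layers):
    for param in layer.parameters():
        param.requires_grad = idx > FREEZE_THROUGH

trainable = [i for i, layer in enumerate(encoder_layers)
             if all(p.requires_grad for p in layer.parameters())]
```

Only the unfrozen layers and the new head then receive gradient updates during fine-tuning.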

## Features
- **High performance:** Achieves 92.02% accuracy on MNLI mismatched and 91.10% on the SNLI test set.
- **Efficient architecture:** Based on the ModernBERT-large design (395M parameters), offering fast inference.
- **Extended context length:** Processes sequences up to 8192 tokens, well suited to evaluating LLM outputs.

---

## Performance

| Model                | MNLI Mismatched | SNLI Test | Context Length |
|----------------------|-----------------|-----------|----------------|
| `ModernCE-large-nli` | 0.9202          | 0.9110    | 8192           |
| `ModernCE-base-nli`  | 0.9034          | 0.9025    | 8192           |
| `deberta-v3-large`   | 0.9049          | 0.9220    | 512            |
| `deberta-v3-base`    | 0.9004          | 0.9234    | 512            |

---

## Usage

To use ModernCE for NLI tasks, load the model with the `sentence-transformers` library:

```python
from sentence_transformers import CrossEncoder

# Load ModernCE model
model = CrossEncoder("dleemiller/ModernCE-large-nli")

scores = model.predict([
    ('A man is eating pizza', 'A man eats something'),
    ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.'),
])

# Convert scores to labels
label_mapping = ['contradiction', 'entailment', 'neutral']
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
# ['entailment', 'contradiction']
```

---
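With multiple labels, `predict` returns one unnormalized score per class by default; a softmax turns them into probabilities when a confidence is useful alongside the label. A minimal numpy sketch (the score values below are made up for illustration):

```python
import numpy as np

label_mapping = ['contradiction', 'entailment', 'neutral']

# Hypothetical raw scores shaped (n_pairs, 3); real values come from model.predict(...)
scores = np.array([[-2.1, 4.3, 0.2],
                   [3.8, -1.5, 0.1]])

# Row-wise softmax (subtract the row max for numerical stability)
exps = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exps / exps.sum(axis=1, keepdims=True)

labels = [label_mapping[i] for i in probs.argmax(axis=1)]
confidences = probs.max(axis=1)
```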

## Training Details

### Pretraining
We initialize the weights from `tasksource/ModernBERT-large-nli`.

Details:
- Batch size: 64
- Learning rate: 3e-4
- Attention dropout: 0.1

### Fine-Tuning
Fine-tuning was performed on the SBERT AllNLI.tsv.gz dataset.

### Validation Results
The model achieved the following test set performance after fine-tuning:
- **MNLI Mismatched:** 0.9202
- **SNLI:** 0.9110

---
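The numbers above are plain classification accuracy over the predicted labels; for reference, a minimal sketch of the metric (model inference and dataset loading omitted, labels hypothetical):

```python
def accuracy(predicted, gold):
    """Fraction of pairs whose predicted label matches the gold label."""
    assert len(predicted) == len(gold) and len(gold) > 0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy example with hypothetical labels
pred = ['entailment', 'neutral', 'contradiction', 'entailment']
gold = ['entailment', 'neutral', 'neutral', 'entailment']
acc = accuracy(pred, gold)  # 3 of 4 correct
```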
96
+
97
+ ## Model Card
98
+
99
+ - **Architecture:** ModernBERT-large
100
+ - **Fine-Tuning Data:** `sentence-transformers` - AllNLI.tsv.gz
101
+
102
+ ---
103
+
104
+ ## Thank You
105
+
106
+ Thanks to the AnswerAI team for providing the ModernBERT models, and the Sentence Transformers team for their leadership in transformer encoder models.
107
+ We also thank the tasksource team for their work on zeroshot encoder models.
108
+
109
+ ---
110
+
111
+ ## Citation
112
+
113
+ If you use this model in your research, please cite:
114
+
115
+ ```bibtex
116
+ @misc{moderncenli2025,
117
+ author = {Miller, D. Lee},
118
+ title = {ModernCE NLI: An NLI cross encoder model},
119
+ year = {2025},
120
+ publisher = {Hugging Face Hub},
121
+ url = {https://huggingface.co/dleemiller/ModernCE-large-nli},
122
+ }
123
+ ```
124
+
125
+ ---
126
+
127
+ ## License
128
+
129
+ This model is licensed under the [MIT License](LICENSE).