---
language: en
tags:
- bert
- regression
- biencoder
- similarity
pipeline_tag: text-similarity
---

# BiEncoder Regression Model

This model is a BiEncoder built on `bert-base-uncased`. It is trained as a regressor and outputs a similarity score for each pair of input texts.

## Model Details

- Base Model: `bert-base-uncased`
- Task: Regression
- Architecture: BiEncoder with cosine similarity
- Loss Function: Mean Absolute Error (MAE)
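
To make the scoring step concrete, here is a minimal framework-free sketch of how a cosine-similarity head and an MAE loss fit together. It is an illustration only, not the code in `modeling.py`: the embeddings and gold labels are hypothetical stand-ins for the two encoder outputs, and returning 0.0 for a zero vector is a convention chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for this sketch; the real model may differ
    return dot / (norm_u * norm_v)


def mae_loss(predictions, targets):
    """Mean absolute error between predicted and gold scores."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


# Hypothetical 3-dimensional embeddings standing in for the encoder outputs
# of each text in two pairs (a real BERT encoder produces 768 dimensions).
emb_text1 = [[1.0, 0.0, 0.0], [1.0, 2.0, 2.0]]
emb_text2 = [[0.0, 1.0, 0.0], [2.0, 4.0, 4.0]]
gold = [0.5, 1.0]  # hypothetical gold similarity labels

# One score per pair: each text is embedded separately, then compared.
scores = [cosine_similarity(u, v) for u, v in zip(emb_text1, emb_text2)]
loss = mae_loss(scores, gold)
print(scores, loss)
```

The first pair is orthogonal and the second is parallel, so the two scores sit at opposite ends of the non-negative range, and only the first pair contributes to the loss.
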
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel

# BiEncoderModelRegression comes from a local modeling.py,
# which must be importable from the working directory.
from modeling import BiEncoderModelRegression

# Load model components
tokenizer = AutoTokenizer.from_pretrained("minoosh/bert-reg-biencoder-mae")
base_model = AutoModel.from_pretrained("bert-base-uncased")
model = BiEncoderModelRegression(base_model, loss_fn="mae")

# Load weights from a local copy of pytorch_model.bin
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # disable dropout for inference

# Prepare inputs
# Note: this call encodes each (text1, text2) pair jointly; confirm that the
# resulting keys match the arguments expected by forward() in modeling.py.
texts1 = ["first text"]
texts2 = ["second text"]
inputs = tokenizer(
    texts1, texts2,
    padding=True,
    truncation=True,
    return_tensors="pt"
)

# Get similarity scores (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
similarity_scores = outputs["logits"]
```

## Metrics

The model was trained with MAE loss and evaluated using:

- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Pearson Correlation
- Spearman Correlation
- Cosine Similarity
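
The metrics listed above can be sketched in plain Python as follows. This is an illustrative sketch, not the evaluation script used for this model: the predictions and labels are hypothetical, and "Cosine Similarity" is read here as the cosine between the full prediction vector and the full label vector, which is an assumption. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the two correlations.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def mse(pred, gold):
    return mean([(p - g) ** 2 for p, g in zip(pred, gold)])


def mae(pred, gold):
    return mean([abs(p - g) for p, g in zip(pred, gold)])


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if an input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical predictions and gold labels, for illustration only.
pred = [0.1, 0.4, 0.35, 0.8]
gold = [0.0, 0.5, 0.3, 1.0]

results = {
    "mse": mse(pred, gold),
    "mae": mae(pred, gold),
    "pearson": pearson(pred, gold),
    "spearman": spearman(pred, gold),
    "cosine": cosine(pred, gold),
}
print(results)
```

MSE and MAE measure absolute error, while the two correlations only reward getting the relative ordering and linear trend right, so they can disagree when predictions are well ranked but poorly calibrated.
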