Knesset-multi-e5-large

This is a sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Knesset-multi-e5-large is based on the intfloat/multilingual-e5-large model. The transformer encoder has been fine-tuned on Knesset data to better capture legislative and parliamentary language.

Usage (Sentence-Transformers)

Using this model is straightforward if you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer

# Two example sentences in Hebrew:
# "This is a first example sentence" / "This is the second sentence"
sentences = ["ื–ื” ืžืฉืคื˜ ืจืืฉื•ืŸ ืœื“ื•ื’ืžื”", "ื–ื” ื”ืžืฉืคื˜ ื”ืฉื ื™"]

model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')
embeddings = model.encode(sentences)  # shape: (2, 1024), L2-normalized
print(embeddings)
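
The embeddings can be compared directly for semantic search. Below is a minimal sketch using the util helpers from sentence-transformers; the Hebrew sentences are illustrative placeholders, not examples from the Knesset data:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')

# Illustrative corpus: "The Knesset approved the state budget",
# "The committee discussed the bill", "The weather today is pleasant"
corpus = ["ื”ื›ื ืกืช ืื™ืฉืจื” ืืช ืชืงืฆื™ื‘ ื”ืžื“ื™ื ื”",
          "ื”ื•ื•ืขื“ื” ื“ื ื” ื‘ื”ืฆืขืช ื”ื—ื•ืง",
          "ืžื–ื’ ื”ืื•ื•ื™ืจ ื”ื™ื•ื ื ืขื™ื"]
corpus_embeddings = model.encode(corpus)

# Query: "a vote on the budget". Note: the base multilingual-e5 model was
# trained with 'query: '/'passage: ' text prefixes; whether they help after
# this fine-tuning is not documented here.
query_embedding = model.encode("ื”ืฆื‘ืขื” ืขืœ ื”ืชืงืฆื™ื‘")

# The model L2-normalizes its outputs (see the Normalize layer below),
# so cosine similarity is equivalent to a dot product.
scores = util.cos_sim(query_embedding, corpus_embeddings)  # shape (1, 3)
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())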

Evaluation Results

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
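
The same embeddings can be produced without sentence-transformers by reproducing the layers above with plain transformers: encode, mean-pool the token embeddings using the attention mask, then L2-normalize. A minimal sketch, assuming torch and transformers are installed:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('GiliGold/Knesset-multi-e5-large')
model = AutoModel.from_pretrained('GiliGold/Knesset-multi-e5-large')

sentences = ["ื–ื” ืžืฉืคื˜ ืจืืฉื•ืŸ ืœื“ื•ื’ืžื”", "ื–ื” ื”ืžืฉืคื˜ ื”ืฉื ื™"]
batch = tokenizer(sentences, padding=True, truncation=True,
                  max_length=512, return_tensors='pt')

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq, 1024)

# Mean pooling over non-padding tokens (matches pooling_mode_mean_tokens)
mask = batch['attention_mask'].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# L2-normalize, matching the final Normalize() layer
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 1024])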

Additional Details

Base Model: intfloat/multilingual-e5-large
Fine-Tuning Data: Knesset data
Key Modifications: The encoder has been fine-tuned on Knesset data to enhance performance on tasks involving legislative and parliamentary content. The original pooling and normalization layers are retained, so the model's embeddings stay consistent with the architecture of the base model.
Model Size: 560M parameters (F32, Safetensors)
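
Since every sentence maps to a fixed 1024-dimensional vector, the embeddings plug directly into standard clustering tools, one of the use cases named above. A minimal sketch with scikit-learn's KMeans (scikit-learn is an assumed extra dependency, and the sentences are illustrative placeholders):

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')

# Placeholders: two budget-related sentences, two bill-related ones.
# "The Knesset approved the budget" / "A discussion of the state budget" /
# "A new bill was submitted" / "The law passed its first reading"
sentences = ["ื”ื›ื ืกืช ืื™ืฉืจื” ืืช ื”ืชืงืฆื™ื‘",
             "ื“ื™ื•ืŸ ื‘ืชืงืฆื™ื‘ ื”ืžื“ื™ื ื”",
             "ื”ืฆืขืช ื—ื•ืง ื—ื“ืฉื” ื”ื•ื’ืฉื”",
             "ื”ื—ื•ืง ืขื‘ืจ ื‘ืงืจื™ืื” ืจืืฉื•ื ื”"]
embeddings = model.encode(sentences)

# Cluster the 1024-dimensional embeddings into 2 groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for sentence, label in zip(sentences, kmeans.labels_):
    print(label, sentence)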

Citing & Authors

TBD
