--- pipeline_tag: sentence-similarity language: - de tags: - sentence-transformers - sentence-similarity - transformers - setfit license: mit base_model: deepset/gbert-large datasets: - deutsche-telekom/ger-backtrans-paraphrase --- # German BERT large paraphrase euclidean This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs (text) into a 1024 dimensional dense vector space. The model is intended to be used together with [SetFit](https://github.com/huggingface/setfit) to improve German few-shot text classification. It has a sibling model called [deutsche-telekom/gbert-large-paraphrase-cosine](https://huggingface.co./deutsche-telekom/gbert-large-paraphrase-cosine). This model is based on [deepset/gbert-large](https://huggingface.co./deepset/gbert-large). Many thanks to [deepset](https://www.deepset.ai/)! ## Training **Loss Function**\ We have used [BatchHardSoftMarginTripletLoss](https://www.sbert.net/docs/package_reference/losses.html#batchhardsoftmargintripletloss) with eucledian distance as the loss function: ``` python train_loss = losses.BatchHardSoftMarginTripletLoss( model=model, distance_metric=BatchHardTripletLossDistanceFunction.eucledian_distance, ) ``` **Training Data**\ The model is trained on a carefully filtered dataset of [deutsche-telekom/ger-backtrans-paraphrase](https://huggingface.co./datasets/deutsche-telekom/ger-backtrans-paraphrase). We deleted the following pairs of sentences: - `min_char_len` less than 15 - `jaccard_similarity` greater than 0.3 - `de_token_count` greater than 30 - `en_de_token_count` greater than 30 - `cos_sim` less than 0.85 **Hyperparameters** - learning_rate: 5.5512022294147105e-06 - num_epochs: 7 - train_batch_size: 68 - num_gpu: ??? ## Evaluation Results We use the [NLU Few-shot Benchmark - English and German](https://huggingface.co./datasets/deutsche-telekom/NLU-few-shot-benchmark-en-de) dataset to evaluate this model in a German few-shot scenario. **Qualitative results** - multilingual sentence embeddings provide the worst results - Electra models also deliver poor results - German BERT base size model ([deepset/gbert-base](https://huggingface.co./deepset/gbert-base)) provides good results - German BERT large size model ([deepset/gbert-large](https://huggingface.co./deepset/gbert-large)) provides very good results - our fine-tuned models (this model and [deutsche-telekom/gbert-large-paraphrase-cosine](https://huggingface.co./deutsche-telekom/gbert-large-paraphrase-cosine)) provide best results ## Licensing Copyright (c) 2023 [Philip May](https://may.la/), [Deutsche Telekom AG](https://www.telekom.com/)\ Copyright (c) 2022 [deepset GmbH](https://www.deepset.ai/) Licensed under the **MIT License** (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file [LICENSE](https://huggingface.co./deutsche-telekom/gbert-large-paraphrase-euclidean/blob/main/LICENSE) in the repository.