---
license: cc-by-sa-4.0
library_name: pytorch
language:
- ru
- vep
datasets:
- Lynxpda/back-translated-veps-russian
pipeline_tag: translation
---

# Model Card for Veps-Russian, Version 1.0

A model for translating from Vepsian into Russian.

The archive contains the original model weights, trained with OpenNMT-py (via Locomotive).

The model has 457M parameters and was trained from scratch.

The repository also provides the model weights converted for CTranslate2, as well as a package for installing and using the model with Argos Translate/LibreTranslate.

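
A minimal sketch of running the CTranslate2 weights directly from Python. It assumes the `ctranslate2` and `sentencepiece` packages are installed, and that the converted model directory and the SentencePiece model are at the paths used by an extracted Argos package (`model/` and `sentencepiece.model`). These paths, the helper name `translate_vep_ru`, and the beam size are assumptions for illustration, so adjust them to the actual files in this repository. Sentence splitting is left to the caller.

```python
from pathlib import Path


def translate_vep_ru(sentences, package_dir, beam_size=4):
    """Translate a list of Vepsian sentences into Russian with CTranslate2.

    ASSUMPTION: package_dir contains a CTranslate2 model in "model/" and a
    SentencePiece model in "sentencepiece.model" (the usual Argos layout).
    """
    # Imported lazily so the helper can be defined without the packages installed.
    import ctranslate2
    import sentencepiece as spm

    package_dir = Path(package_dir)
    sp = spm.SentencePieceProcessor(model_file=str(package_dir / "sentencepiece.model"))
    translator = ctranslate2.Translator(str(package_dir / "model"), device="cpu")

    # Split each sentence into subword pieces, translate the batch,
    # then join the best hypothesis of each result back into plain text.
    source_tokens = [sp.encode(s, out_type=str) for s in sentences]
    results = translator.translate_batch(source_tokens, beam_size=beam_size)
    return [sp.decode(r.hypotheses[0]) for r in results]
```

For example, `translate_vep_ru(["<Vepsian sentence>"], "path/to/vep_ru")` should return a one-element list with the Russian translation, provided the files are where the sketch expects them.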

## Model Architecture and Objective

```yaml
dec_layers: 20
decoder_type: transformer
enc_layers: 20
encoder_type: transformer
heads: 8
hidden_size: 512
max_relative_positions: 20
model_dtype: fp16
pos_ffn_activation_fn: gated-gelu
position_encoding: false
share_decoder_embeddings: true
share_embeddings: true
share_vocab: true
src_vocab_size: 32000
tgt_vocab_size: 32000
transformer_ff: 6144
word_vec_size: 512
```

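
As a rough sanity check on the 457M figure, the sketch below counts only the weight matrices implied by the configuration above: the attention projections, the gated-GELU feed-forward blocks (assumed to use three `hidden_size` x `transformer_ff` matrices, as gated variants typically do), and a single embedding matrix shared by the encoder, decoder and output layer. Biases, layer norms and relative-position embeddings are ignored, so the result is an approximation rather than an exact count.

```python
def estimate_params(layers_enc=20, layers_dec=20, d_model=512, d_ff=6144, vocab=32000):
    """Approximate transformer parameter count (weight matrices only)."""
    attention = 4 * d_model * d_model          # Q, K, V and output projections
    gated_ffn = 3 * d_model * d_ff             # two input projections + one output projection
    encoder_layer = attention + gated_ffn      # self-attention + FFN
    decoder_layer = 2 * attention + gated_ffn  # self-attention + cross-attention + FFN
    # One vocab x d_model matrix: share_embeddings and share_decoder_embeddings
    # tie the encoder, decoder and output-layer weights together.
    embeddings = vocab * d_model
    return layers_enc * encoder_layer + layers_dec * decoder_layer + embeddings


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

With the default arguments this prints `456,785,920 parameters (~457M)`, in line with the stated model size; the omitted biases and norms add comparatively little.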

# How to Use

## Using the Model with OpenNMT-py

To fine-tune the Vepsian-to-Russian translation model with [OpenNMT-py](https://github.com/OpenNMT/OpenNMT-py), you can adapt and use the example configuration file from this repository, `config.yml`.

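
A minimal sketch of the kind of changes usually involved, assuming `config.yml` has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`). `train_from`, `reset_optim`, `save_model`, `train_steps` and `data` are OpenNMT-py training options; the helper name, the corpus name `finetune`, the checkpoint path and the data paths are hypothetical. Vocabulary and model options are left untouched, since a model fine-tuned from a checkpoint should normally keep that checkpoint's vocabulary.

```python
import copy


def make_finetune_config(base, checkpoint, src_path, tgt_path, steps, save_prefix):
    """Return a copy of an OpenNMT-py training config adapted for fine-tuning.

    `base` is the parsed config.yml. Vocabulary and model options are kept
    as-is so that they still match the checkpoint.
    """
    cfg = copy.deepcopy(base)
    cfg["train_from"] = checkpoint   # start from the released weights
    cfg["reset_optim"] = "all"       # rebuild the optimizer instead of restoring its saved state
    cfg["save_model"] = save_prefix  # avoid overwriting the original checkpoint
    cfg["train_steps"] = steps

    # Replace the training corpora with one (hypothetical) in-domain corpus,
    # reusing per-corpus settings such as transforms from an existing entry.
    data = cfg.get("data", {})
    template = next((v for k, v in data.items() if k != "valid"), {})
    cfg["data"] = {"finetune": {**template, "path_src": src_path, "path_tgt": tgt_path}}
    if "valid" in data:
        cfg["data"]["valid"] = data["valid"]  # keep the validation set
    return cfg
```

The resulting dictionary can then be written back to YAML (for example `finetune.yml`) and passed to `onmt_train -config finetune.yml`.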

## Using the Model with LibreTranslate and Argos Translate

To use the Vepsian-to-Russian translation model with [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate) and [Argos Translate](https://github.com/argosopentech/argos-translate), follow these steps:

* Download the model archive: make sure you have the `translate-vep_ru-1_0.argosmodel` file.
* Locate the packages folder:
  * On Linux/macOS: `~/.local/share/argos-translate/packages`
  * On Windows: `%userprofile%\.local\share\argos-translate\packages`
* Create the language-pair folder:
  * Create a folder named `vep_ru` in the packages directory. If it already exists, delete or move it first.
* Extract the model archive:
  * Change the extension of the `.argosmodel` file to `.zip`.
  * Extract the contents of the `.zip` file into the `vep_ru` folder.
* Restart LibreTranslate:
  * Restart the LibreTranslate application so that it loads the new model.

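
The manual steps above can also be scripted. This is a minimal sketch using only the Python standard library, assuming the packages folder and archive name given in the list; the function name `install_argos_model` is hypothetical. Since the steps treat the `.argosmodel` file as a ZIP archive, `zipfile` can read it directly without renaming. Note that this sketch deletes an existing `vep_ru` folder rather than moving it.

```python
import shutil
import zipfile
from pathlib import Path


def install_argos_model(archive, packages_dir=None, pair="vep_ru"):
    """Extract an .argosmodel archive into <packages_dir>/<pair>.

    Mirrors the manual steps: replace any existing language-pair folder,
    then extract the archive contents into it.
    """
    if packages_dir is None:
        # Same relative location under the home directory on Linux, macOS and Windows.
        packages_dir = Path.home() / ".local" / "share" / "argos-translate" / "packages"
    target = Path(packages_dir) / pair

    # Remove a previous installation of this language pair, if any.
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)

    # The archive is read as a ZIP file, so no renaming to .zip is needed.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

For example, `install_argos_model("translate-vep_ru-1_0.argosmodel")` followed by a LibreTranslate restart covers the whole procedure.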

# Citing & Authors

```bibtex
@inproceedings{veps_russian_2024,
  title={Model for Veps-Russian translation},
  author={Maksim Migukin and Maksim Kuznetsov and Alexey Kutashov},
  year={2024}
}
```


## Credits

Data compiled by [OPUS](https://opus.nlpl.eu/).

Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).

Data from the Vepsian [Wikipedia](https://vep.wikipedia.org/wiki/).

Data from [Lehme No 2051 // Open corpus of Vepsian and Karelian languages VepKar](http://dictorpus.krc.karelia.ru/).

Data from [OMAMEDIA](https://omamedia.ru/).

### CCMatrix

http://opus.nlpl.eu/CCMatrix-v1.php

If you use the dataset or code, please cite the CCMatrix paper (Schwenk et al.) and also acknowledge OPUS for this release.

This corpus was extracted from web crawls using the margin-based bitext mining techniques described in that paper. The original distribution is available from http://data.statmt.org/cc-matrix/

### OpenSubtitles

http://opus.nlpl.eu/OpenSubtitles-v2018.php

Please cite the following article if you use any part of the corpus in your own work: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).