jannikskytt committed on
Commit
40c9ee1
1 Parent(s): 5e61be3

Update README.md

Files changed (1)
  1. README.md +36 -3
README.md CHANGED
@@ -1,3 +1,36 @@
- ---
- license: cc-by-nc-3.0
- ---
+ ---
+ license: cc-by-nc-3.0
+ language:
+ - da
+ tags:
+ - word embeddings
+ - Danish
+ ---
+ # Danish medical word embeddings
+
+ MeDa-We was trained on a Danish medical corpus of 123M tokens. The word embeddings are 300-dimensional and were trained using [FastText](https://fasttext.cc/).
+
+ The embeddings were trained for 10 epochs using a window size of 5 and 10 negative samples.
+
+ The development of the corpus and the word embeddings is described further in our [paper](https://aclanthology.org/2023.nodalida-1.31/).
+
+ We also trained a transformer model on the same corpus; it can be found [here](https://huggingface.co/jannikskytt/MeDa-Bert).
+
+ ### Citing
+
+ ```
+ @inproceedings{pedersen-etal-2023-meda,
+     title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
+     author = "Pedersen, Jannik and
+       Laursen, Martin and
+       Vinholt, Pernille and
+       Savarimuthu, Thiusius Rajeeth",
+     booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
+     month = may,
+     year = "2023",
+     address = "T{\'o}rshavn, Faroe Islands",
+     publisher = "University of Tartu Library",
+     url = "https://aclanthology.org/2023.nodalida-1.31",
+     pages = "301--307",
+ }
+ ```
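
The README added in this commit states four training settings: 300 dimensions, 10 epochs, a window size of 5, and 10 negative samples. As a rough illustration only, the sketch below shows how those settings could map onto the `fasttext` Python package. It is not the authors' training script. The function name `train_embeddings`, the `corpus_path` argument, and the default `model="skipgram"` are assumptions; the README does not say whether skipgram or CBOW was used or how the corpus is stored.

```python
# Hyperparameters stated in the README; everything else here is an assumption.
MEDA_WE_HYPERPARAMS = {
    "dim": 300,   # 300-dimensional vectors
    "epoch": 10,  # 10 training epochs
    "ws": 5,      # context window size of 5
    "neg": 10,    # 10 negative samples
}


def train_embeddings(corpus_path: str, model: str = "skipgram"):
    """Train FastText vectors on a plain-text corpus file.

    `corpus_path` and the choice of `model` are hypothetical: the README
    does not specify the architecture or the corpus file layout.
    """
    # Imported lazily so the settings above can be inspected without fasttext.
    import fasttext

    return fasttext.train_unsupervised(
        input=corpus_path, model=model, **MEDA_WE_HYPERPARAMS
    )
```

Given a local text file (hypothetical path `corpus.txt`), `train_embeddings("corpus.txt")` would return a model whose `get_word_vector` method yields a 300-dimensional vector for a given word.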