---
language: da
tags:
- bert
- punctuation restoration
license: apache-2.0
datasets:
- custom
---

# Bert Punctuation Restoration Danish

This model performs punctuation restoration for Danish. The method used is sequence classification, similar to how NER models are trained.

## Model description

A BERT-based model fine-tuned to restore punctuation in raw, unpunctuated Danish text.

## Intended uses & limitations

Use this model through the `punctfix` library, since extra inference code is required.

### How to use

You can use this model directly through the `punctfix` library:

```python
from punctfix import Autopunct

model = Autopunct(language="da")

example_text = "hej med dig mit navn det er rasmus og det er mig som har trænet denne lækre model"

print(model.punctuate(example_text))
```

### Limitations and bias

## Training data

To Do

## Training procedure

### Preprocessing

TODO

The texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000. The inputs of the model are then of the form:

```
[CLS] Sentence A [SEP] Sentence B [SEP]
```

With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus; in the other cases, sentence B is a random sentence from the corpus. Note that what is considered a sentence here is a consecutive span of text, usually longer than a single sentence. The only constraint is that the two "sentences" have a combined length of fewer than 512 tokens.

The details of the masking procedure for each sentence are the following:
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by `[MASK]`.
- In 10% of the cases, the masked tokens are replaced by a random token different from the one they replace.
- In the 10% remaining cases, the masked tokens are left as is.
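The 80/10/10 masking scheme above can be sketched as follows. This is a minimal illustration of the standard BERT masking procedure, not the actual training code; the function name and the simple whole-token treatment are assumptions for the sketch.

```python
import random

MASK_PROB = 0.15  # fraction of tokens selected for masking


def mask_tokens(tokens, vocab, mask_token="[MASK]"):
    """BERT-style masking: of the selected tokens,
    80% -> [MASK], 10% -> random token, 10% unchanged."""
    masked = list(tokens)
    labels = [None] * len(tokens)  # None = position not used in the loss
    for i, tok in enumerate(tokens):
        if random.random() < MASK_PROB:
            labels[i] = tok  # the model must predict the original token
            r = random.random()
            if r < 0.8:
                masked[i] = mask_token
            elif r < 0.9:
                # replace with a random token different from the original
                masked[i] = random.choice([t for t in vocab if t != tok])
            # else: leave the token as is
    return masked, labels
```

Note that in the real pipeline masking operates on WordPiece sub-tokens rather than whole words.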
## Evaluation results

TODO

When fine-tuned on downstream tasks, this model achieves the following results:

| Task | MNLI | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
|      | 82.2 | 88.5 | 89.2 | 91.3  | 51.3 | 85.8  | 87.5 | 59.9 |
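The NER-style formulation described at the top of this card can be sketched as follows: the model assigns one punctuation label per token, and a post-processing step reinserts the punctuation. The label set and the `restore` helper below are illustrative assumptions, not the model's actual labels or inference code.

```python
# Hypothetical label set: each label names the punctuation that
# follows the token ("O" = no punctuation).
LABELS = {"O": "", "PERIOD": ".", "COMMA": ",", "QUESTION": "?"}


def restore(tokens, predicted_labels):
    """Reinsert punctuation from per-token labels and
    capitalize the first word of each sentence."""
    out = []
    capitalize_next = True  # start of text begins a sentence
    for tok, label in zip(tokens, predicted_labels):
        punct = LABELS[label]
        word = tok.capitalize() if capitalize_next else tok
        out.append(word + punct)
        capitalize_next = punct in {".", "?"}
    return " ".join(out)


print(restore(["hej", "med", "dig", "mit", "navn", "er", "rasmus"],
              ["O", "O", "PERIOD", "O", "O", "O", "PERIOD"]))
# -> Hej med dig. Mit navn er rasmus.
```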