# Bert Punctuation Restoration Danish

This model performs the punctuation restoration task in Danish. The method used is sequence classification, similar to how NER models are trained.
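
Because the task is cast as token classification, each word in the input gets a label describing the punctuation to attach to it, exactly as an NER model assigns entity labels. As a rough illustration of that framing, here is a minimal sketch assuming the model is published on the Hub as `Alvenir/bert-punct-restoration-da` and uses punctuation-mark label names; both the id and the label inventory are assumptions, not confirmed by this card:

```python
# Sketch only: punctuation restoration viewed as token classification.
# The model id and the label names below are assumptions for illustration.
from transformers import pipeline

pipe = pipeline("token-classification", model="Alvenir/bert-punct-restoration-da")

for pred in pipe("mit navn det er rasmus"):
    # Each prediction pairs a word with a label, e.g. "O" for no
    # punctuation or a mark such as "." / "," to append after the word.
    print(pred["word"], pred["entity"])
```

In practice you should use the `punctfix` package described below, which handles the label-to-text reassembly for you.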

## Model description

TODO

### How to use

The model requires some additional inference code, so we created an awesome little pip package, `punctfix`, for inference.
The inference code is based on Hugging Face's `TokenClassificationPipeline`.
```python
>>> from punctfix import PunctFixer
>>> model = PunctFixer(language="da")

>>> example_text = "mit navn det er rasmus og jeg kommer fra firmaet alvenir det er mig som har trænet denne lækre model"
>>> print(model.punctuate(example_text))
Mit navn det er Rasmus og jeg kommer fra firmaet Alvenir. Det er mig som har trænet denne lækre model.

>>> example_text = "en dag bliver vi sku glade for at vi nu kan sætte punktummer og kommaer i en sætning det fungerer da meget godt ikke"
>>> print(model.punctuate(example_text))
En dag bliver vi sku glade for, at vi nu kan sætte punktummer og kommaer i en sætning. Det fungerer da meget godt, ikke?
```
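
The package is published on PyPI, so `pip install punctfix` should be the only setup needed (assuming the published name matches the import). For non-Danish readers, the first example reads roughly "My name is Rasmus and I come from the company Alvenir. It is me who trained this lovely model.", and the second "One day we will be damn glad that we can now put periods and commas in a sentence. It works quite well, doesn't it?". Conceptually, the fixer maps each word's predicted label back onto the text; a hypothetical sketch of that reassembly step follows (the `rebuild` helper and its label scheme are illustrative, not the package's actual internals):

```python
# Hypothetical sketch of the reassembly step a punctuation fixer performs.
# Assumed label scheme: "O" means no punctuation, anything else is a
# mark to append after the word.
def rebuild(labeled_words: list[tuple[str, str]]) -> str:
    out = []
    capitalize_next = True  # capitalize the start of each sentence
    for word, label in labeled_words:
        if capitalize_next:
            word = word.capitalize()
            capitalize_next = False
        out.append(word if label == "O" else word + label)
        capitalize_next = label in {".", "?", "!"}
    return " ".join(out)

print(rebuild([("hej", ","), ("verden", "."), ("det", "O"), ("virker", ".")]))
# -> "Hej, verden. Det virker."
```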

## Training data

TODO

## Training procedure

TODO

### Preprocessing

TODO

## Evaluation results

TODO