---
library_name: tokenizers
tags: [Danish, Morphological Tokenization, CerebrasGPT]
---

```
 _______       ___      .___  ___.   ______   .______      .______    __    __
|       \     /   \     |   \/   |  /  __  \  |   _  \     |   _  \  |  |  |  |
|  .--.  |   /  ^  \    |  \  /  | |  |  |  | |  |_)  |    |  |_)  | |  |__|  |
|  |  |  |  /  /_\  \   |  |\/|  | |  |  |  | |      /     |   ___/  |   __   |
|  '--'  | /  _____  \  |  |  |  | |  `--'  | |  |\  \----.|  |      |  |  |  |
|_______/ /__/     \__\ |__|  |__|  \______/  | _| `._____|| _|      |__|  |__|
```

### DA-MORPH-CEREBRAS-TOKEN

This morphological tokenizer is designed for the CerebrasGPT architecture. It segments Danish text according to linguistic principles, which yields more meaningful subword tokens.
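To give a rough feel for what morphological segmentation means for Danish, here is a toy sketch in plain Python. It is not the algorithm behind this tokenizer: the `segment` function, the `SUFFIXES` list, and the `MIN_STEM` threshold are all hypothetical names and values chosen only for illustration, and a real morphological tokenizer would presumably rely on a much richer lexicon or a trained segmenter.

```python
# Toy illustration only: NOT the algorithm used by this tokenizer.
# SUFFIXES is a small, hand-picked assumption covering a few common
# Danish noun endings (plural and definite markers).
SUFFIXES = ("ne", "er", "en", "et", "e", "s")
MIN_STEM = 3  # assumed lower bound so a stem is never stripped too short


def segment(word: str) -> list[str]:
    """Greedily peel known suffixes off the end of a word."""
    stem, tail = word.lower(), []
    while True:
        for suffix in SUFFIXES:
            # Strip a suffix only if enough of the stem would remain.
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                tail.insert(0, suffix)
                stem = stem[: -len(suffix)]
                break
        else:
            # No suffix matched: the remaining stem plus collected suffixes.
            return [stem] + tail


for word in ("bilerne", "husene", "bogen"):
    print(word, "->", segment(word))
```

With these assumptions, "bilerne" ("the cars") splits into `bil` + `er` + `ne`, that is, stem, plural marker, and definite marker. A purely frequency-driven subword scheme gives no guarantee that its splits fall on such boundaries.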