DaMorphTokenizers
Collection
DaMorph is a collection of experimental morphological tokenizers developed to explore the impact of morphological segmentation on Danish NLP.
DAMORPH
This morphological tokenizer is designed for the CerebrasGPT architecture and focuses on segmenting Danish text based on linguistic principles, enabling more meaningful subword tokenization.
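To make the idea of linguistically motivated segmentation concrete, here is a minimal toy sketch. It is not the DaMorph implementation: the `MORPHEMES` lexicon and the `segment` function are hypothetical names introduced only for illustration, and the fewest-pieces dynamic program stands in for whatever segmentation method the tokenizer actually uses.

```python
# Hypothetical, hand-picked Danish morphemes for illustration only.
# A real morphological tokenizer would rely on a far larger, learned or curated inventory.
MORPHEMES = {"hus", "båd", "e", "ne", "s", "arbejd", "løs", "hed"}


def segment(word, lexicon=MORPHEMES):
    """Split a word into lexicon morphemes, preferring the fewest pieces.

    Falls back to returning the whole word when no full segmentation exists.
    """
    word = word.lower()
    # best[i] holds the shortest morpheme list covering word[:i], or None.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in lexicon:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1] if best[-1] is not None else [word]


if __name__ == "__main__":
    for w in ["husbådene", "arbejdsløshed", "xyz"]:
        print(w, "->", segment(w))
```

With this toy lexicon, `husbådene` ("the houseboats") splits into stem, stem, plural, and definite suffix, whereas a purely frequency-based subword tokenizer may cut across those boundaries.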