metadata
license: cc-by-4.0
language:
- en
pipeline_tag: text-classification
tags:
- glam
- lam
- subject indexing
- annif
- hogwarts
Hogwarts Sorting Hat using Annif and its fastText backend
This model is the output of an Annif tutorial exercise.
The original Sorting Hat reads the student's thoughts, but Annif generally has no access to that kind of information, so we simply use the student's name as input. We train a fastText model on the names of characters from the Harry Potter novels whose house is known. To let the model generalize to new, unseen names, we use character n-grams to split each name into chunks of 1 to 4 characters; for example, harry becomes [h, ha, har, harr, a, ar, arr, arry, ...]. fastText does this when given the minn and maxn parameters, which set the minimum and maximum length of the character n-grams generated from the input text.
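The n-gram splitting described above can be illustrated with a small Python sketch. This is not fastText's own implementation: the helper name char_ngrams is hypothetical, and the sketch omits the word boundary markers (< and >) that fastText adds around each word, so that its output matches the simplified example in the text.

```python
def char_ngrams(word: str, minn: int = 1, maxn: int = 4) -> list[str]:
    """Return all character n-grams of `word` with lengths minn..maxn.

    Hypothetical helper for illustration only; fastText itself also wraps
    each word in '<' and '>' before extracting n-grams.
    """
    ngrams = []
    for start in range(len(word)):
        for length in range(minn, maxn + 1):
            end = start + length
            if end > len(word):
                break  # longer n-grams would run past the end of the word
            ngrams.append(word[start:end])
    return ngrams


print(char_ngrams("harry"))
# ['h', 'ha', 'har', 'harr', 'a', 'ar', 'arr', 'arry', 'r', 'rr', 'rry', 'r', 'ry', 'y']
```

Because short chunks such as "ha" or "rry" also occur in names the model has never seen, a classifier trained on these n-grams can still produce a prediction for a new name.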