davanstrien HF staff committed on
Commit 7745cb1 · 1 Parent(s): 0ade054

Update app.py

Files changed (1)
  1. app.py +9 -1
app.py CHANGED
@@ -132,7 +132,15 @@ This demo allows you to play with a 'genre' detection model which has been train
  The model was trained with the [fastai](https://docs.fast.ai/) library on training data drawn from [digitised books](https://www.bl.uk/collection-guides/digitised-printed-books) at the British Library. These Books are mainly from the 19th Century.
  The demo also shows you which parts of the input the model is using most to make its prediction. You can hover over the words to see the attention score assigned to that word. This gives you some sense of which words are important to the model in making a prediction.
 
- The examples include titles from the BL books collection.
+ The examples include titles from the BL books collection. You may notice that the model makes mistakes on short titles in particular, this can partly be explained by the title format in the original data. For example the novel *'Vanity Fair'* by William Makepeace Thackeray
+ is found in the training data as:
+
+ ```
+ Vanity Fair. A novel without a hero ... With all the original illustrations by the author, etc
+ ```
+
+ You can see that the model gets a bit of help with the genre here 😉. Since the model was trained for a very particular dataset and task it might not work well on titles that don't match this original corpus.
+
 
  ## Background
 