Commit 4e4a1fd · 1 Parent(s): 8e792ce
Update ReadMe
README.md CHANGED
@@ -3,4 +3,22 @@ license: apache-2.0
 datasets:
 - Abirate/french_book_reviews
 pipeline_tag: text-classification
----
+---
+## Model and approach 🤗
+
+#### Because I am limited by my personal computer, I fine-tuned the distilbert-base-multilingual-cased model. DistilBERT is reported to run about 60% faster than the original BERT model while preserving about 95% of its accuracy.
+
+#### The dataset contains book titles, authors, reviews, and a score for each book. These columns were concatenated into large context blocks and used as the input text. The labels (0, 1, and -1) were remapped to 0, 1, and 2, and then named NEUTRAL, POSITIVE, and NEGATIVE to make the predictions easier to read.
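The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the column names, the ` | ` separator, and the sample row are assumptions made up for the example, and only the label mapping (0, 1, -1 to 0, 1, 2 to NEUTRAL, POSITIVE, NEGATIVE) comes from the README text.

```python
# Label mapping taken from the README: raw label -> class id -> readable name.
RAW_TO_ID = {0: 0, 1: 1, -1: 2}
ID2LABEL = {0: "NEUTRAL", 1: "POSITIVE", 2: "NEGATIVE"}

# Assumed column names; check the dataset schema before reusing them.
TEXT_COLUMNS = ("book_title", "author", "reader_review", "rating")


def build_example(row):
    """Concatenate the text columns into one block and remap the raw label."""
    # The " | " separator is an arbitrary choice for this sketch.
    text = " | ".join(str(row[column]) for column in TEXT_COLUMNS)
    return {"text": text, "label": RAW_TO_ID[int(row["label"])]}


# Made-up row, for illustration only.
sample_row = {
    "book_title": "Le Petit Prince",
    "author": "Antoine de Saint-Exupéry",
    "reader_review": "Un livre magnifique.",
    "rating": 5,
    "label": 1,
}

example = build_example(sample_row)
print(example["text"])
print(ID2LABEL[example["label"]])
```

Note that the score ends up inside the input text, which is what lets the model lean on it when predicting.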
+
+#### Because this exercise is only meant to demonstrate my ability to train a model, the model was trained for 2 epochs on 3000 training entries and evaluated on 300 test entries.
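One way such a reduced split could be drawn is sketched below. The README does not say how the 3000 and 300 entries were selected, so the shuffling, the seed, and the helper name `sample_split` are all assumptions for illustration; only the two sizes come from the text.

```python
import random


def sample_split(rows, n_train=3000, n_test=300, seed=42):
    """Shuffle the rows and take disjoint train and test subsets (hypothetical helper)."""
    if len(rows) < n_train + n_test:
        raise ValueError("not enough rows for the requested split")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:n_train + n_test]]
    return train, test


# Stand-in data: integers instead of real review rows.
rows = list(range(5000))
train_rows, test_rows = sample_split(rows)
print(len(train_rows), len(test_rows))
```

Taking both subsets from a single shuffled index list keeps them disjoint, so no test entry is seen during training.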
+
+## Notes on the three classes and the model's bias 📝
+
+#### The three classes are not evenly distributed across this dataset. Although the data is shuffled, positive reviews are the most common category and therefore the most often predicted. In addition, the decision to keep the review score in the text block affected the model's biases: the model can make a prediction based on the score alone, a number between 1 and 5.
+
+### Positive reviews: 2081
+### Negative reviews: 224
+### Neutral reviews: 695
+
+
+
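To put the class counts above in perspective, the short sketch below turns them into shares and computes the accuracy of a trivial classifier that always predicts the majority class. The counts are the ones listed in the README; the variable names and the baseline framing are added here for illustration.

```python
# Class counts as listed in the README (they sum to the 3000 training entries).
counts = {"POSITIVE": 2081, "NEGATIVE": 224, "NEUTRAL": 695}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A model that always answers with the most frequent class is right
# exactly as often as that class occurs.
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Always predicting {majority_label}: {majority_baseline:.1%} accuracy")
```

Any reported accuracy is best read against this baseline, since roughly 69% can be reached on data with this distribution without looking at the review at all.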