maclean-connor96 committed on
Commit
4e4a1fd
·
1 Parent(s): 8e792ce

Update ReadMe

Files changed (1)
  1. README.md +19 -1
README.md CHANGED
@@ -3,4 +3,22 @@ license: apache-2.0
  datasets:
  - Abirate/french_book_reviews
  pipeline_tag: text-classification
- ---
+ ---
+ ## Model and approach 🤗
+
+ #### Because training ran on my personal computer, I fine-tuned distilbert-base-multilingual-cased, a distilled version of multilingual BERT. For the English model, the DistilBERT authors report inference that is 60% faster than BERT while retaining about 97% of its language-understanding performance.
+
+ #### The dataset contains a book title, author, review, and score for each entry. These columns were concatenated into a single context block per entry and used as the input text. The labels (0, 1, and -1) were normalized to 0, 1, and 2, then mapped to NEUTRAL, POSITIVE, and NEGATIVE to make the predictions easier to read.
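The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the commit: the column names (`book_title`, `author`, `reader_review`, `rating`, `label`), the column order, and the `" | "` separator are assumptions, and `build_example` is a hypothetical helper.

```python
# Assumed column names and order; adjust to the dataset's actual schema.
TEXT_COLUMNS = ("book_title", "author", "reader_review", "rating")

# Raw labels (0, 1, -1) -> class ids (0, 1, 2), as described in the README.
RAW_TO_ID = {0: 0, 1: 1, -1: 2}
# Class ids -> readable names used in the predictions.
ID_TO_NAME = {0: "NEUTRAL", 1: "POSITIVE", 2: "NEGATIVE"}


def build_example(row: dict) -> dict:
    """Concatenate the text columns into one block and normalize the label."""
    # The separator is an assumption; any unambiguous delimiter would do.
    text = " | ".join(str(row[col]) for col in TEXT_COLUMNS)
    label_id = RAW_TO_ID[int(row["label"])]
    return {"text": text, "label": label_id, "label_name": ID_TO_NAME[label_id]}
```

A function of this shape could be applied row by row to the dataset before tokenization. Note that the score ends up inside `text`, which is what allows the model to lean on it.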
+
+ #### Because this exercise is only meant to demonstrate my ability to train a model, training used 3000 training entries and 300 test entries for 2 epochs.
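One way to draw a 3000/300 subset like the one described above is to shuffle the row indices once and slice. The sketch below is only an assumption about how such a split could be made: `sample_split` is a hypothetical helper and the seed value is arbitrary.

```python
import random


def sample_split(n_rows: int, n_train: int = 3000, n_test: int = 300, seed: int = 42):
    """Return disjoint lists of row indices for a small train/test subset."""
    if n_train + n_test > n_rows:
        raise ValueError("dataset is too small for the requested split")
    indices = list(range(n_rows))
    # A seeded shuffle keeps the subset reproducible between runs.
    random.Random(seed).shuffle(indices)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    return train_idx, test_idx
```

Because both slices come from the same shuffled list, the train and test subsets never overlap.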
+
+ ## Notes on the three classes and the model's bias 📝
+
+ #### The classes are not evenly distributed in the dataset. Although the data is shuffled, positive reviews are the most common and therefore the most often predicted category. In addition, keeping the review score in the text block affects the model's biases: the model can make a prediction from the score alone, a number between 1 and 5.
+
+ ### Positive reviews: 2081
+ ### Negative reviews: 224
+ ### Neutral reviews: 695
+
+
+
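The three counts above sum to 3000, which matches the number of training entries stated in the README. The short calculation below shows how strong the imbalance is: it computes each class's share, the accuracy of a baseline that always predicts the majority class on data with this distribution, and inverse-frequency class weights. The weights follow the "balanced" heuristic (`total / (n_classes * count)`) and are shown only as a common mitigation; nothing in the commit indicates they were used.

```python
# Class counts as listed in the README.
counts = {"POSITIVE": 2081, "NEGATIVE": 224, "NEUTRAL": 695}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Accuracy of always predicting the most frequent class.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

# Inverse-frequency weights: rarer classes get larger weights.
class_weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name}: share={shares[name]:.3f}, weight={class_weights[name]:.3f}")
print(f"majority baseline ({majority_label}): {majority_baseline:.3f}")
```

Any reported accuracy is easier to interpret next to the majority baseline, since a model that has learned little beyond "predict POSITIVE" would already score close to it.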