---
license: apache-2.0
datasets:
- Abirate/french_book_reviews
pipeline_tag: text-classification
---
## Model and approach 🤗

#### As I am limited by my personal computer, the training was done on the distilbert-base-multilingual-cased model. This model is 60% faster than the classic BERT model and preserves 95% of the original model's accuracy.

#### The dataset provided contains book titles, authors, reviews, and a score for each book. These columns were concatenated to form large context blocks and were used as the input text. The labels, (0, 1, and -1) were normalized to 0, 1, and 2, and finally to NEUTRAL, POSITIVE, and NEGATIVE to help with legibility of the predictions.

#### As this exercise is simply to show my capacities to train a model, the model has been trained using 3000 training entries and 300 test entries for 2 epochs.

## Notes on the three classes and the model's bias 📝

#### The distribution of these classes is not equal in the ensemble of this dataset. Although it is shuffled, positive reviews are the most present, and therefore most-often predicted category. In addition, the decision to keep the review score in the text block did have an impact on the biases of the model. **The model can make a prediction based on score alone, a number between 1 and 5.**

### Positive reviews: 2081
### Negative reviews: 224
### Neutral reviews: 695