---
language:
- de
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- amazon_reviews_multi
metrics:
- perplexity
base_model: distilbert-base-german-cased
model-index:
- name: distilbert-base-german-cased-finetuned-amazon-reviews
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# distilbert-base-german-cased-finetuned-amazon-reviews

This model is a fine-tuned version of [distilbert-base-german-cased](https://huggingface.co./distilbert-base-german-cased) on the amazon_reviews_multi dataset.
It achieves the following results on the evaluation set:
- Loss: 3.8874

## Model description

This model is a fine-tuned version of distilbert-base-german-cased trained on the amazon_reviews_multi dataset (available on Hugging Face). The goal is to adapt the model to the product-review domain so that, once fine-tuned, it can be used for the fill-mask (fill-in-the-gaps) task. It is intended as a counterpart to my other model (fine-tuned-spanish-bert), so the performance of the two can be compared.
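
As a quick usage illustration, the sketch below loads the checkpoint with the `fill-mask` pipeline. The repository id is assumed to match this card's name; adjust it if the model is hosted under a different namespace.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a fill-mask pipeline.
# The repository id below is an assumption; point it at wherever
# this model is actually hosted.
mask_filler = pipeline(
    "fill-mask",
    model="distilbert-base-german-cased-finetuned-amazon-reviews",
)

# DistilBERT marks the blank to fill with the [MASK] token.
for prediction in mask_filler("Das Produkt ist wirklich [MASK]."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```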

## Intended uses & limitations

The model is intended for educational (coursework) use only. Its main limitation is the small size of the training dataset: with so little data, fine-tuning can only shift the model's domain slightly, and a larger dataset would be needed to obtain a more substantial improvement.

## Training and evaluation data

I ran a training that reports the loss on both the training and validation sets. (It takes a long time; if you are running it on Colab, I recommend using fewer epochs, since the result does not change much. Even though the loss looks rather high, the model's performance measured by perplexity is not bad.)
I also checked the perplexity, which is a good metric for language models. The value obtained is reasonably good: 48.78.
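
For reference, perplexity here is simply the exponential of the mean cross-entropy loss on the evaluation set, so it can be recovered from the loss reported at the top of this card:

```python
import math

# Perplexity of a (masked) language model is the exponential of its
# mean cross-entropy loss on the evaluation set.
eval_loss = 3.8874            # evaluation loss reported above
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")   # ~48.78
```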

- Evaluation: I checked the model's performance in the accompanying notebook by generating example predictions.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
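
The snippet below is a hedged sketch of how these settings map onto the Transformers `Trainer` API. It is a reconstruction rather than the exact script that produced this model: `tokenized_datasets` (the preprocessed amazon_reviews_multi splits) is assumed to already exist, and the preprocessing details may differ.

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_checkpoint = "distilbert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)

# Hyperparameters taken from the list above.
training_args = TrainingArguments(
    output_dir="distilbert-base-german-cased-finetuned-amazon-reviews",
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    # tokenized_datasets is assumed: the tokenized amazon_reviews_multi splits.
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    # Randomly masks tokens (15% by default) so the model trains on fill-mask.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer),
)
# trainer.train()
```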

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 4.625         | 1.0   | 141  | 4.2747          |
| 4.3013        | 2.0   | 282  | 4.1549          |
| 4.1841        | 3.0   | 423  | 4.0902          |
| 4.1208        | 4.0   | 564  | 3.9958          |
| 4.0475        | 5.0   | 705  | 3.9710          |
| 4.0116        | 6.0   | 846  | 3.9100          |
| 3.9988        | 7.0   | 987  | 3.9194          |
| 3.9641        | 8.0   | 1128 | 3.9381          |
| 3.9661        | 9.0   | 1269 | 3.8631          |
| 3.944         | 10.0  | 1410 | 3.8668          |


### Framework versions

- Transformers 4.27.0
- Pytorch 1.13.1+cu116
- Tokenizers 0.13.2