---
language: en
license: mit
library_name: pytorch
---
# Plainly Optimized Network

Dataset: BIGBENCH

Trainer Hyperparameters:

- `lr` = 5e-05
- `per_device_batch_size` = 8
- `gradient_accumulation_steps` = 2
- `weight_decay` = 0.0
- `seed` = 42
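
The hyperparameters above can be collected in a small config object, which also makes the effective batch size explicit. This is a minimal sketch in plain Python, not the training code used for this model: the class name `TrainerHyperparameters`, the helper `effective_batch_size`, and the `num_devices` argument are hypothetical, and the device count is an assumption because the card does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainerHyperparameters:
    """Hypothetical container; field values are copied from the list above."""

    lr: float = 5e-05
    per_device_batch_size: int = 8
    gradient_accumulation_steps: int = 2
    weight_decay: float = 0.0
    seed: int = 42

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Examples consumed per optimizer step.
        # num_devices is an assumption: the card does not say how many devices were used.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * num_devices
        )


hparams = TrainerHyperparameters()
print(hparams.effective_batch_size())  # 16 on a single device
```

With a per-device batch of 8 and 2 accumulation steps, each optimizer step sees 16 examples per device.
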

|eval_loss|eval_accuracy|epoch|
|--|--|--|
|10.410|0.571|1.0|
|10.191|0.571|2.0|
|9.468|0.643|3.0|
|10.414|0.571|4.0|
|10.468|0.571|5.0|
|10.335|0.571|6.0|
|10.296|0.571|7.0|
|9.998|0.571|8.0|
|10.080|0.571|9.0|
|10.186|0.571|10.0|
|9.862|0.571|11.0|
|10.713|0.500|12.0|
|9.873|0.571|13.0|
|9.905|0.571|14.0|
|9.860|0.571|15.0|
|9.997|0.571|16.0|
|9.823|0.571|17.0|
|9.840|0.571|18.0|
|9.817|0.571|19.0|
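
To read the evaluation log above programmatically, the sketch below picks the epoch with the lowest `eval_loss` and the one with the highest `eval_accuracy`. The rows are copied from the table; the variable names (`eval_log`, `best_by_loss`, `best_by_accuracy`) are illustrative and not part of any released code.

```python
# (eval_loss, eval_accuracy, epoch) rows copied from the table above.
eval_log = [
    (10.410, 0.571, 1),
    (10.191, 0.571, 2),
    (9.468, 0.643, 3),
    (10.414, 0.571, 4),
    (10.468, 0.571, 5),
    (10.335, 0.571, 6),
    (10.296, 0.571, 7),
    (9.998, 0.571, 8),
    (10.080, 0.571, 9),
    (10.186, 0.571, 10),
    (9.862, 0.571, 11),
    (10.713, 0.500, 12),
    (9.873, 0.571, 13),
    (9.905, 0.571, 14),
    (9.860, 0.571, 15),
    (9.997, 0.571, 16),
    (9.823, 0.571, 17),
    (9.840, 0.571, 18),
    (9.817, 0.571, 19),
]

# Lowest loss and highest accuracy are selected independently.
best_by_loss = min(eval_log, key=lambda row: row[0])
best_by_accuracy = max(eval_log, key=lambda row: row[1])

print(best_by_loss)      # (9.468, 0.643, 3)
print(best_by_accuracy)  # (9.468, 0.643, 3)
```

Both criteria select epoch 3, the only epoch where accuracy rises above 0.571.
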