---
license: apache-2.0
datasets:
  - timm/mini-imagenet
---

# Comparisons of timm Optimizers w/ Caution

This repo contains summaries of several sets of experiments comparing a number of optimizers with and without [caution](https://huggingface.co./papers/2411.16085) enabled.
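Caution, in short, masks out the components of an optimizer's update whose sign disagrees with the current gradient, then renormalizes the surviving components. Below is a minimal PyTorch sketch of that masking step, paraphrasing the paper's one-line change (timm's actual implementations live inside its optimizer classes):

```python
import torch

def apply_caution(update: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # Keep only update components that agree in sign with the gradient.
    mask = (update * grad > 0).to(grad.dtype)
    # Rescale so the masked update keeps roughly its original magnitude;
    # the clamp guards against an all-zero mask.
    mask = mask / mask.mean().clamp(min=1e-3)
    return update * mask
```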

The runs were all performed by training a smaller ViT (`vit_wee_patch16_reg1_gap_256`) for 200 epochs (~10M samples seen) from scratch on the timm `mini-imagenet` dataset, a 100-class subset of ImageNet with the same image sizes as the originals.
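For reference, the model is available directly in timm; a hypothetical instantiation (the actual runs used timm's `train.py` with its full augmentation and schedule settings, which aren't reproduced here):

```python
import timm

# 100 classes to match mini-imagenet; the 256px input size is part of the model config.
model = timm.create_model('vit_wee_patch16_reg1_gap_256', pretrained=False, num_classes=100)
```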

So far I have results for adamw and laprop, with some mars runs on the way. You can find the full results in sub-folders named by optimizer. In all of these runs, the experiments with a 'c' prefix in their name have caution enabled, as sketched below.
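Assuming a recent timm release where the cautious variants are registered under their 'c'-prefixed names, the paired runs correspond roughly to:

```python
from timm.optim import create_optimizer_v2

# Baseline vs. cautious LaProp; the weight_decay shown is illustrative,
# not necessarily the value used in these runs.
opt = create_optimizer_v2(model, opt='laprop', lr=1e-3, weight_decay=0.05)
copt = create_optimizer_v2(model, opt='claprop', lr=1e-3, weight_decay=0.05)
```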

## LaProp

| optim | best_epoch | train_loss | eval_loss | eval_top1 | eval_top5 | lr |
|---|---|---|---|---|---|---|
| claprop, lr=1e-03 | 204 | 2.2174 | 1.0932 | 73.92 | 91.33 | 0.0 |
| claprop, lr=5e-04 | 183 | 2.2622 | 1.0913 | 73.77 | 91.22 | 1.35e-05 |
| laprop, lr=5e-04 | 198 | 2.2426 | 1.1426 | 71.73 | 90.55 | 1.11e-06 |
| laprop, lr=1e-03 | 179 | 2.2900 | 1.1684 | 71.15 | 90.18 | 3.81e-05 |
| claprop, lr=2e-04 | 195 | 2.5462 | 1.2475 | 68.30 | 89.15 | 9.98e-07 |
| laprop, lr=2e-04 | 204 | 2.6702 | 1.3092 | 67.08 | 88.67 | 0.0 |
| claprop, lr=2e-03 | 193 | 2.6781 | 1.5240 | 62.08 | 84.80 | 1.49e-05 |
| laprop, lr=2e-03 | 200 | 2.7047 | 1.5229 | 61.46 | 85.28 | 1.97e-06 |

*Figure: LaProp Top-1 evaluation accuracy on Mini-ImageNet.*

*Figure: LaProp train loss.*

## AdamW

| optim | best_epoch | train_loss | eval_loss | eval_top1 | eval_top5 |
|---|---|---|---|---|---|
| cadamw, lr=1e-03 | 184 | 2.2689 | 1.0868 | 73.52 | 91.60 |
| cadamw, lr=5e-04 | 199 | 2.1633 | 1.0976 | 73.39 | 91.31 |
| cadamw, lr=1e-03, clip grads | 203 | 2.1361 | 1.1043 | 73.33 | 91.41 |
| adamw, lr=1e-03, clip grads | 195 | 2.2746 | 1.1430 | 72.11 | 90.47 |
| adamw, lr=5e-04 | 185 | 2.3040 | 1.1536 | 71.50 | 90.48 |
| adamw, lr=1e-03 | 199 | 2.2237 | 1.1658 | 71.23 | 90.31 |
| cadamw, lr=2e-04 | 189 | 2.5386 | 1.2326 | 68.95 | 89.61 |
| adamw, lr=2e-04 | 203 | 2.5796 | 1.3086 | 67.11 | 88.66 |

*Figure: AdamW Top-1 evaluation accuracy on Mini-ImageNet.*

*Figure: AdamW train loss.*