Trained with the Adam optimizer at a constant learning rate of 5e-6 for 4000 steps (batch_size=128). Only the vision encoder is fine-tuned; all other parameters are frozen.
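
The setup above could look roughly like the following in PyTorch. This is a minimal sketch, not the actual training script: the framework, the `ToyModel` class, its `vision_encoder` and `text_encoder` attribute names, and the `build_optimizer` helper are all assumptions made for illustration. Only the optimizer type, learning rate, step count, and batch size come from the description.

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical stand-in: a vision encoder plus one other component."""

    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(16, 8)  # placeholder for the real encoder
        self.text_encoder = nn.Linear(16, 8)    # placeholder for everything else


def build_optimizer(model, lr=5e-6):
    # Freeze every parameter, then re-enable gradients for the vision encoder only.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.vision_encoder.parameters():
        p.requires_grad = True
    trainable = [p for p in model.parameters() if p.requires_grad]
    # Constant learning rate: no scheduler is attached to the optimizer.
    return torch.optim.Adam(trainable, lr=lr)


NUM_STEPS = 4000
BATCH_SIZE = 128

model = ToyModel()
optimizer = build_optimizer(model)

# Total examples drawn over training (examples repeat if the dataset is smaller).
samples_seen = NUM_STEPS * BATCH_SIZE
```

Passing only the trainable parameters to the optimizer keeps its state limited to the vision encoder, and with these settings training draws 4000 × 128 = 512,000 examples in total.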
Test set accuracy:
Base model