Reproducing idefics-8b (instruct)
#61, opened by Iheb-Chaabane
I’m trying to reproduce the instruct version starting from the base (pretrained) checkpoint.
Could you please provide more details on the proportions of the datasets in the Cauldron and on the training hyperparameters (learning rate, weight decay, number of epochs, …)?
Thanks,
Most of this is detailed in the appendix of the paper.