Add TF weights
#2 opened by Rocketknight1 (HF staff)
Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.
Maximum crossload output difference=1.993e+00; Maximum crossload hidden layer difference=1.552e-04;
Maximum conversion output difference=1.991e+00; Maximum conversion hidden layer difference=1.552e-04;
CAUTION: The maximum admissible error was manually increased to 2.0!
Quick note on this PR: the huge output difference is caused by the original checkpoint not having any pooler weights, so the pooler is randomly initialized separately in PT and in TF. The actual difference between the model outputs other than the pooler is ~1e-4, which is well within acceptable limits.
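To illustrate why a missing pooler inflates only the final output difference, here is a minimal pure-Python sketch. It assumes a BERT-style pooler (a dense layer followed by tanh). The vector size, random seeds, and ~1e-4 noise level are arbitrary illustrative choices, and `random_pooler` is a hypothetical helper, not part of transformers.

```python
import math
import random


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def random_pooler(dim, seed):
    """Hypothetical BERT-style pooler (assumption): a dense layer with freshly
    random weights, followed by tanh. Each framework would draw its own weights."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    bias = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def pool(hidden):
        return [math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
                for row, b in zip(weights, bias)]

    return pool


dim = 64
rng = random.Random(42)
# Hidden states from both frameworks agree up to small numerical noise (~1e-4).
pt_hidden = [rng.gauss(0.0, 1.0) for _ in range(dim)]
tf_hidden = [h + rng.uniform(-1e-4, 1e-4) for h in pt_hidden]

# The missing pooler is initialized independently in each framework
# (different seeds stand in for PT and TF drawing their own random weights).
pt_pooled = random_pooler(dim, seed=1)(pt_hidden)
tf_pooled = random_pooler(dim, seed=2)(tf_hidden)

hidden_diff = max_abs_diff(pt_hidden, tf_hidden)
pooled_diff = max_abs_diff(pt_pooled, tf_pooled)
print(f"hidden layer difference: {hidden_diff:.3e}")
print(f"pooler output difference: {pooled_diff:.3e}")
```

Under the tanh assumption, every pooled value lies in [-1, 1], so two independently initialized poolers can disagree by at most 2.0. That bound is consistent with the ~1.99 maximum output difference reported above and with raising the admissible error to 2.0, while the hidden-layer comparison still reflects the real conversion accuracy.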
Rocketknight1 changed pull request status to merged