Add TF weights
Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.
Maximum crossload output difference=7.952e-04; Maximum crossload hidden layer difference=3.315e-02;
Maximum conversion output difference=7.952e-04; Maximum conversion hidden layer difference=3.315e-02;
CAUTION: The maximum admissible error was manually increased to 0.05!
past_key_values[0][0]: 1.431e-06
past_key_values[0][1]: 2.831e-07
past_key_values[0][2]: 1.173e-03
past_key_values[0][3]: 4.318e-04
past_key_values[1][0]: 2.965e-06
past_key_values[1][1]: 1.431e-06
past_key_values[1][2]: 5.204e-04
past_key_values[1][3]: 3.059e-04
past_key_values[2][0]: 1.335e-05
past_key_values[2][1]: 1.192e-06
past_key_values[2][2]: 6.023e-04
past_key_values[2][3]: 4.654e-04
past_key_values[3][0]: 1.001e-05
past_key_values[3][1]: 5.364e-07
past_key_values[3][2]: 8.546e-04
past_key_values[3][3]: 2.506e-04
past_key_values[4][0]: 2.384e-06
past_key_values[4][1]: 6.743e-07
past_key_values[4][2]: 9.109e-04
past_key_values[4][3]: 2.525e-04
past_key_values[5][0]: 3.457e-06
past_key_values[5][1]: 1.192e-06
past_key_values[5][2]: 7.614e-04
past_key_values[5][3]: 3.863e-04
List of maximum hidden layer differences above the threshold (1e-10):
last_hidden_state: 1.221e-04
decoder_hidden_states[1]: 5.484e-06
decoder_hidden_states[2]: 3.815e-06
decoder_hidden_states[3]: 2.670e-05
decoder_hidden_states[4]: 3.052e-05
decoder_hidden_states[5]: 2.670e-05
decoder_hidden_states[6]: 1.221e-04
encoder_last_hidden_state: 2.536e-03
encoder_hidden_states[0]: 1.717e-05
encoder_hidden_states[1]: 2.384e-05
encoder_hidden_states[2]: 2.575e-05
encoder_hidden_states[3]: 2.575e-05
encoder_hidden_states[4]: 6.485e-05
encoder_hidden_states[5]: 4.909e-02
encoder_hidden_states[6]: 2.536e-03
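
For readers who want to reproduce this kind of check by hand, here is a minimal sketch of how a per-tensor maximum difference can be computed and compared against an admissible error. It assumes each reported value is the largest element-wise absolute difference between a PyTorch tensor and its TF counterpart. The helper names (`flatten`, `max_abs_diff`, `find_violations`) are hypothetical and are not part of the transformers CLI, and the toy tensors are made up for illustration.

```python
from typing import Dict, Iterator


def flatten(values) -> Iterator[float]:
    """Yield every scalar in an arbitrarily nested list of numbers."""
    if isinstance(values, (int, float)):
        yield float(values)
    else:
        for item in values:
            yield from flatten(item)


def max_abs_diff(reference, converted) -> float:
    """Largest element-wise absolute difference between two nested lists."""
    ref = list(flatten(reference))
    conv = list(flatten(converted))
    if len(ref) != len(conv):
        raise ValueError("tensors must have the same number of elements")
    return max((abs(r - c) for r, c in zip(ref, conv)), default=0.0)


def find_violations(diffs: Dict[str, float], max_error: float) -> Dict[str, float]:
    """Keep only the named differences that exceed the admissible error."""
    return {name: diff for name, diff in diffs.items() if diff > max_error}


# Hypothetical toy tensors standing in for PyTorch and TF outputs.
pt_outputs = {"a": [[1.0, 2.0], [3.0, 4.0]], "b": [0.5, 0.25]}
tf_outputs = {"a": [[1.0, 2.0], [3.0, 4.5]], "b": [0.5, 0.25]}
toy_diffs = {name: max_abs_diff(pt_outputs[name], tf_outputs[name]) for name in pt_outputs}
toy_violations = find_violations(toy_diffs, max_error=0.05)

# Two of the hidden layer differences reported in this description,
# checked against the manually increased admissible error of 0.05.
reported = {
    "encoder_hidden_states[5]": 4.909e-02,
    "encoder_last_hidden_state": 2.536e-03,
}
reported_violations = find_violations(reported, max_error=0.05)
```

With the admissible error at 0.05, `reported_violations` is empty, since 4.909e-02 (the largest hidden layer difference listed) is still under the limit. A stricter limit would flag `encoder_hidden_states[5]`.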