Dr. Jorge Abreu Vicente committed · Commit 561cc87 · Parent(s): eb7f17a

Update README.md

README.md CHANGED
@@ -28,6 +28,44 @@ However, the file [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py)

We provide in the repository an alternative version of the Python script so that any user can cross-check the validity of the model replicated in this repository.

The code below is a modification of the original [`convert_megatron_bert_checkpoint.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py).

```python
import json
import os

import torch

# recursive_print is defined alongside convert_megatron_checkpoint in the
# conversion script shipped with this repository.
from convert_biomegatron_checkpoint import convert_megatron_checkpoint, recursive_print

print_checkpoint_structure = True
path_to_checkpoint = "/path/to/BioMegatron345mUncased/"

# Extract the basename.
basename = os.path.dirname(path_to_checkpoint).split('/')[-1]

# Load the model.
input_state_dict = torch.load(os.path.join(path_to_checkpoint, 'model_optim_rng.pt'), map_location="cpu")

# Convert.
print("Converting")
output_state_dict, output_config = convert_megatron_checkpoint(input_state_dict, head_model=False)

# Print the structure of the converted state dict.
if print_checkpoint_structure:
    recursive_print(None, output_state_dict)

# Store the config to file.
output_config_file = os.path.join(path_to_checkpoint, "config.json")
print(f'Saving config to "{output_config_file}"')
with open(output_config_file, "w") as f:
    json.dump(output_config, f)

# Store the state_dict to file.
output_checkpoint_file = os.path.join(path_to_checkpoint, "pytorch_model.bin")
print(f'Saving checkpoint to "{output_checkpoint_file}"')
torch.save(output_state_dict, output_checkpoint_file)
```
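
Running the script writes `config.json` and `pytorch_model.bin` into the checkpoint directory, the two files `from_pretrained` needs (the tokenizer additionally expects a vocabulary file, e.g. `vocab.txt`, next to them).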

BioMegatron can be run with the standard 🤗 script for loading models. Here we show an example analogous to that of [`nvidia/megatron-bert-uncased-345m`](https://huggingface.co/nvidia/megatron-bert-uncased-345m).

```python
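# A minimal loading sketch, assuming the conversion above has produced
# config.json and pytorch_model.bin in checkpoint_dir, with a vocab.txt
# for the tokenizer; the sentence below is only an illustrative probe.
import torch
from transformers import BertTokenizer, MegatronBertForMaskedLM

checkpoint_dir = "/path/to/BioMegatron345mUncased/"

tokenizer = BertTokenizer.from_pretrained(checkpoint_dir)
model = MegatronBertForMaskedLM.from_pretrained(checkpoint_dir)
model.eval()

# Fill-mask sanity check on a biomedical sentence.
inputs = tokenizer("The most common cause of pneumonia is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Top-5 predictions for the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```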