cyrusyc committed
Commit 57ca36b · 1 Parent(s): 08a0aef

add example training script

Files changed (2)
  1. 2023-08-14-mace-universal.sbatch +59 -0
  2. README.md +8 -2
2023-08-14-mace-universal.sbatch ADDED
@@ -0,0 +1,59 @@
+ #!/bin/bash
+ #SBATCH -C gpu
+ #SBATCH -G 40
+ #SBATCH -N 10
+ #SBATCH --ntasks=40
+ #SBATCH --ntasks-per-node=4
+ #SBATCH --cpus-per-task=4
+ #SBATCH --time=6:00:00
+ #SBATCH --time-min=02:00:00
+ #SBATCH --error=%x-%j.err
+ #SBATCH --output=%x-%j.out
+ #SBATCH --requeue
+ #SBATCH --exclusive
+ #SBATCH --open-mode=append
+
+ exp_name=$(basename "$SLURM_SUBMIT_DIR")
+
+ srun python run_train.py \
+     --name=$exp_name \
+     --train_file="train.h5" \
+     --valid_file="valid.h5" \
+     --statistics_file="statistics.json" \
+     --energy_weight=1 \
+     --forces_weight=1 \
+     --eval_interval=1 \
+     --config_type_weights='{"Default":1.0}' \
+     --E0s='average' \
+     --error_table='PerAtomMAE' \
+     --stress_key='stress' \
+     --model="ScaleShiftMACE" \
+     --MLP_irreps="64x0e" \
+     --interaction_first="RealAgnosticResidualInteractionBlock" \
+     --interaction="RealAgnosticResidualInteractionBlock" \
+     --num_interactions=2 \
+     --num_channels=128 \
+     --max_ell=3 \
+     --hidden_irreps='64x0e + 64x1o + 64x2e' \
+     --num_cutoff_basis=10 \
+     --lr=1e-2 \
+     --correlation=3 \
+     --r_max=6.0 \
+     --num_radial_basis=10 \
+     --scaling='rms_forces_scaling' \
+     --distributed \
+     --num_workers=4 \
+     --batch_size=10 \
+     --valid_batch_size=30 \
+     --max_num_epochs=500 \
+     --patience=250 \
+     --amsgrad \
+     --weight_decay=1e-8 \
+     --ema \
+     --ema_decay=0.999 \
+     --default_dtype="float32" \
+     --clip_grad=100 \
+     --device=cuda \
+     --seed=3 \
+     --save_cpu \
+     --restart_latest &
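Note on the trailing `&` in the added script: it puts `srun` in the background, and a Slurm job normally ends when its batch script exits, so a backgrounded step is usually paired with `wait`. The following is a minimal sketch of that pattern, not part of the commit. `train_step` is a hypothetical stand-in for the `srun python run_train.py ...` command so the sketch runs without Slurm.

```shell
#!/bin/bash
# Hypothetical stand-in for `srun python run_train.py ...`; not part of the commit.
train_step() {
    sleep 1
    echo "training finished" > "$1"
}

marker=$(mktemp)

# Launch in the background, as the script's trailing `&` does.
train_step "$marker" &
train_pid=$!

# Block until the background step exits, so the script does not end first.
wait "$train_pid"
status=$?

result=$(cat "$marker")
rm -f "$marker"
echo "step exited with status $status: $result"
```

In the sbatch script, the equivalent change would be either dropping the `&` or adding a `wait` line after the `srun` command; which one fits depends on whether other background work is intended.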
README.md CHANGED
@@ -79,11 +79,17 @@ If you use the pretrained models in this repository, please cite all the followi
  }
  ```

- # Training Details
+ # Training Guide

  ## Training Data

  <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ For now, please download the MPtrj data from [figshare](https://figshare.com/articles/dataset/Materials_Project_Trjectory_MPtrj_Dataset/23713842). We may upload it to HuggingFace Datasets in the future.
+
+ ## Fine-tuning
+
+ <!-- This should link to a Training Procedure Card, perhaps with a short stub of information on what the training procedure is all about as well as documentation related to hyperparameters or additional training details. -->
+
+ We provide an example multi-GPU training script, [2023-08-14-mace-universal.sbatch](https://huggingface.co/cyrusyc/mace-universal/blob/main/2023-08-14-mace-universal.sbatch), which uses 40 A100s on NERSC Perlmutter. Please see the MACE `multi-gpu` branch for more detailed instructions.

- ## Training Procedure
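The "40 A100s" figure in the README follows from the `#SBATCH` header of the example script: 10 nodes with 4 tasks per node. The sketch below reproduces that arithmetic with values copied from the script. The global batch size it prints assumes `--batch_size` is applied per process under `--distributed`, which this commit does not confirm; check the MACE `multi-gpu` branch before relying on it.

```shell
#!/bin/bash
# Values copied from the #SBATCH header and run_train.py flags in the example script.
nodes=10            # -N 10
tasks_per_node=4    # --ntasks-per-node=4
batch_size=10       # --batch_size=10

# One task per GPU: 10 nodes x 4 tasks matches --ntasks=40 and -G 40.
world_size=$((nodes * tasks_per_node))

# ASSUMPTION: batch_size is per process, so the global batch scales with world size.
global_batch=$((world_size * batch_size))

echo "world_size=$world_size global_batch=$global_batch"
```

When adapting the script to fewer GPUs, `-N`, `-G`, and `--ntasks` need to change together so that they stay consistent with `--ntasks-per-node`.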