Commit 7bcfe52 by sagawa
Parent: 1cb4b4f

Update README.md

Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -52,12 +52,13 @@ output # 'CCN(CC)CCN=C=S.Cc1cnc2c(c1)CCCC2N'
  ### Training Procedure

  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- We used the Open Reaction Database (ORD) dataset for model training.
- The command used for training is the following. For more information, please refer to the paper and GitHub repository.
+ We used the [Open Reaction Database (ORD) dataset](https://drive.google.com/file/d/1fa2MyLdN1vcA7Rysk8kLQENE92YejS9B/view?usp=drive_link) for model training. In addition, we used [USPTO_50k dataset](https://yzhang.hpc.nyu.edu/T5Chem/index.html)'s test split to prevent data leakage.
+ The command used for training is the following. For more information about data preprocessing and training, please refer to the paper and GitHub repository.

  ```python
- python train_without_duplicates.py \
- --model='t5' \
+ cd task_retrosynthesis
+ python train.py \
+ --output_dir='t5' \
  --epochs=80 \
  --lr=2e-4 \
  --batch_size=32 \
@@ -67,10 +68,10 @@ python train_without_duplicates.py \
  --evaluation_strategy='epoch' \
  --save_strategy='epoch' \
  --logging_strategy='epoch' \
- --train_data_path='/home/acf15718oa/ReactionT5_neword/data/all_ord_reaction_uniq_with_attr20240506_v3_train.csv' \
- --valid_data_path='/home/acf15718oa/ReactionT5_neword/data/all_ord_reaction_uniq_with_attr20240506_v3_valid.csv' \
- --test_data_path='/home/acf15718oa/ReactionT5_neword/data/all_ord_reaction_uniq_with_attr20240506_v3_test.csv' \
- --USPTO_test_data_path='/home/acf15718oa/ReactionT5_neword/data/USPTO_50k/test.csv' \
+ --train_data_path='../data/preprocessed_ord_train.csv' \
+ --valid_data_path='../data/preprocessed_ord_valid.csv' \
+ --test_data_path='../data/preprocessed_ord_test.csv' \
+ --USPTO_test_data_path='../data/USPTO_50k/test.csv' \
  --pretrained_model_name_or_path='sagawa/CompoundT5'
  ```
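The updated README states that the USPTO_50k test split was used to prevent data leakage. One common way to do this is to drop training reactions whose product also appears in the held-out test set. The following minimal sketch shows that idea using only the Python standard library. It is not the repository's `train.py` logic. The column names `REACTANT` and `PRODUCT` and the toy SMILES rows are hypothetical. Matching is exact string comparison, so SMILES would normally need to be canonicalized first (for example with RDKit) to catch equivalent molecules written differently.

```python
import csv
import io
from typing import Iterable


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text (with a header row) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def remove_leaked_rows(
    train_rows: Iterable[dict[str, str]],
    test_rows: Iterable[dict[str, str]],
    key: str = "PRODUCT",  # hypothetical column name, not taken from the repo
) -> list[dict[str, str]]:
    """Drop training rows whose `key` value also appears in the test rows.

    Uses exact string matching only; SMILES are NOT canonicalized here.
    """
    test_keys = {row[key] for row in test_rows}
    return [row for row in train_rows if row[key] not in test_keys]


# Toy in-memory data standing in for the ORD train split and USPTO_50k test split.
train_csv = """REACTANT,PRODUCT
CCO.CC(=O)Cl,CCOC(C)=O
c1ccccc1.BrBr,Brc1ccccc1
"""
test_csv = """REACTANT,PRODUCT
CC(=O)O.CCO,CCOC(C)=O
"""

filtered = remove_leaked_rows(load_rows(train_csv), load_rows(test_csv))
print([row["PRODUCT"] for row in filtered])  # ['Brc1ccccc1']
```

Keying on the product is one plausible choice for retrosynthesis, since the product is the model's input. A stricter filter could instead key on the full reaction string.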