sagawa/ReactionT5v2-retrosynthesis


sagawa committed 12 days ago

Commit 7bcfe52 • 1 Parent(s): 1cb4b4f

Update README.md

Files changed (1)

README.md +9 -8


@@ -52,12 +52,13 @@ output # 'CCN(CC)CCN=C=S.Cc1cnc2c(c1)CCCC2N'
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-We used the Open Reaction Database (ORD) dataset for model training.
-The command used for training is the following. For more information, please refer to the paper and GitHub repository.
+We used the [Open Reaction Database (ORD) dataset](https://drive.google.com/file/d/1fa2MyLdN1vcA7Rysk8kLQENE92YejS9B/view?usp=drive_link) for model training. In addition, we used [USPTO_50k dataset](https://yzhang.hpc.nyu.edu/T5Chem/index.html)'s test split to prevent data leakage.
+The command used for training is the following. For more information about data preprocessing and training, please refer to the paper and GitHub repository.
 ```python
-python train_without_duplicates.py \
-    --model='t5' \
+cd task_retrosynthesis
+python train.py \
+    --output_dir='t5' \
     --epochs=80 \
     --lr=2e-4 \
     --batch_size=32 \
@@ -67,10 +68,10 @@ python train_without_duplicates.py \
     --evaluation_strategy='epoch' \
     --save_strategy='epoch' \
     --logging_strategy='epoch' \
-    --train_data_path='/home/acf15718oa/ReactionT5_neword/data/all_ord_reaction_uniq_with_attr20240506_v3_train.csv' \
-    --valid_data_path='/home/acf15718oa/ReactionT5_neword/data/all_ord_reaction_uniq_with_attr20240506_v3_valid.csv' \
-    --test_data_path='/home/acf15718oa/ReactionT5_neword/data/all_ord_reaction_uniq_with_attr20240506_v3_test.csv' \
-    --USPTO_test_data_path='/home/acf15718oa/ReactionT5_neword/data/USPTO_50k/test.csv' \
+    --train_data_path='../data/preprocessed_ord_train.csv' \
+    --valid_data_path='../data/preprocessed_ord_valid.csv' \
+    --test_data_path='../data/preprocessed_ord_test.csv' \
+    --USPTO_test_data_path='../data/USPTO_50k/test.csv' \
     --pretrained_model_name_or_path='sagawa/CompoundT5'
 ```
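
The updated README says the USPTO_50k test split is used to prevent data leakage, but this diff does not show how. One plausible reading is that training rows matching a reaction in that test split are dropped before training. The sketch below illustrates that idea only and is not taken from the repository: the column names `REACTANT` and `PRODUCT`, the helper names, and the exact-string comparison are all assumptions. A real pipeline would likely canonicalize SMILES before comparing.

```python
import csv
import io


def reaction_key(row, fields=("REACTANT", "PRODUCT")):
    """Build a comparison key from the chosen columns (column names are assumed)."""
    return tuple(row[f].strip() for f in fields)


def drop_leaked_rows(train_rows, held_out_rows, fields=("REACTANT", "PRODUCT")):
    """Return the training rows whose key does not occur in the held-out rows."""
    held_out_keys = {reaction_key(r, fields) for r in held_out_rows}
    return [r for r in train_rows if reaction_key(r, fields) not in held_out_keys]


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Toy data standing in for the ORD training CSV and the USPTO_50k test CSV.
train_csv = """REACTANT,PRODUCT
CCO.CC(=O)O,CCOC(C)=O
CCN,CCNC(C)=O
"""
test_csv = """REACTANT,PRODUCT
CCO.CC(=O)O,CCOC(C)=O
"""

train_rows = read_rows(train_csv)
test_rows = read_rows(test_csv)
clean_rows = drop_leaked_rows(train_rows, test_rows)
print(len(train_rows), "->", len(clean_rows))
```

With real files, `read_rows` would be replaced by opening the paths passed as `--train_data_path` and `--USPTO_test_data_path`. Refer to the paper and GitHub repository for the actual preprocessing.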