Questions about data split in hyperparam_optimiz_for_disease_classifier.py

#112
by Jiahaoszu - opened

Hey,
I found something strange in the code 'hyperparam_optimiz_for_disease_classifier.py' at lines 70-75.
It seems that all 42 donors' IDs are already in 'train_indiv', so no samples would be split into the eval sets.

Thank you for your interest in Geneformer! I changed the list of individuals to a set to ensure the IDs are unique before subsetting them into the train/valid/test sets. Please pull the updated version.
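A minimal sketch of the idea behind the fix: deduplicate donor IDs before assigning donors (not cells) to splits. It assumes a per-cell list of donor IDs in which each donor appears many times. The helper name `split_individuals`, the 80/10/10 fractions, and the seed are hypothetical and are not taken from the Geneformer script.

```python
import random


def split_individuals(indiv_per_cell, train_frac=0.8, valid_frac=0.1, seed=42):
    """Hypothetical helper: split donors (not cells) into train/valid/test.

    indiv_per_cell may hold one entry per cell, so donor IDs repeat.
    Converting to a set first means each donor is counted once and
    lands in exactly one split.
    """
    # set() removes duplicates; sorted() fixes the order, because set
    # iteration order is arbitrary and would make the split irreproducible
    unique_indiv = sorted(set(indiv_per_cell))
    rng = random.Random(seed)
    rng.shuffle(unique_indiv)

    n = len(unique_indiv)
    n_train = round(n * train_frac)
    n_valid = round(n * valid_frac)

    train = unique_indiv[:n_train]
    valid = unique_indiv[n_train:n_train + n_valid]
    test = unique_indiv[n_train + n_valid:]  # remainder goes to test
    return train, valid, test


# 42 donors, 5 cells each: the per-cell list has 210 entries but only 42 unique IDs
cells = [f"donor{i}" for i in range(42) for _ in range(5)]
train_indiv, valid_indiv, test_indiv = split_individuals(cells)
print(len(train_indiv), len(valid_indiv), len(test_indiv))
```

With 42 unique donors this yields 34/4/4 donors, and the three lists are disjoint, so the eval sets are non-empty and share no donor with training.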

ctheodoris changed discussion status to closed
