Question about the "in_silico_perturber.py" code
Hello, thanks for this great tool!
I am trying to run the in silico perturber on my own data. I looked into the code of in_silico_perturber and found this:
if (self.tokens_to_perturb != "all") and (self.perturb_type != "overexpress"):
    # minimum # genes needed for perturbation test
    min_genes = len(self.tokens_to_perturb)
    def if_has_tokens_to_perturb(example):
        return (len(set(example["input_ids"]).intersection(self.tokens_to_perturb))>min_genes)
    filtered_input_data = filtered_input_data.filter(if_has_tokens_to_perturb, num_proc=self.nproc)
Should ">min_genes" be changed to ">0"?
I got 0 rows in filtered_input_data after running this code.
Thanks!
Thank you for your interest in Geneformer! Please ensure you have pulled the updated version and check the documentation of the options with help(InSilicoPerturber). If you provide a list of genes to perturb, they will be perturbed as a group so there must be cells that express all genes in the list. If you want to perturb them one by one, you can either use the “all” option if it’s nearing the full list of genes in the cell, or you can run the in silico perturber for each individual gene if it’s a small list. This will parallelize by cells rather than by genes so that batches involve multiple cells and it will be more efficient.
Update: @jinbo1129 this should be >=min_genes though. I just updated this in the code. Thank you!
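To illustrate why the original comparison returned 0 rows, here is a minimal sketch in plain Python. It is not the Geneformer code itself: the token IDs, the toy cells, and the helper name n_shared are made up for illustration, and it assumes the list of tokens to perturb has no duplicates. The overlap between a cell's input_ids and the target tokens can be at most len(tokens_to_perturb), so ">min_genes" can never be true, while ">=min_genes" keeps exactly the cells that express every gene in the group.

```python
# Hypothetical token IDs; real IDs come from the Geneformer token dictionary.
tokens_to_perturb = [101, 202]
min_genes = len(tokens_to_perturb)

# Toy stand-ins for tokenized cells (each has an "input_ids" list).
cells = [
    {"input_ids": [101, 202, 303]},  # expresses both target genes
    {"input_ids": [101, 404, 505]},  # expresses only one target gene
    {"input_ids": [606, 707]},       # expresses neither target gene
]

def n_shared(example):
    """Count how many of the target tokens appear in this cell."""
    return len(set(example["input_ids"]).intersection(tokens_to_perturb))

# The overlap can never exceed len(tokens_to_perturb), so ">" rejects every cell.
kept_strict = [c for c in cells if n_shared(c) > min_genes]

# ">=" keeps cells that express all target genes (group perturbation).
kept_fixed = [c for c in cells if n_shared(c) >= min_genes]

# "> 0" would also keep cells that express only some of the target genes.
kept_any = [c for c in cells if n_shared(c) > 0]

print(len(kept_strict), len(kept_fixed), len(kept_any))  # prints: 0 1 2
```

With ">0", the second cell would pass the filter even though it lacks one of the genes, which does not match the group-perturbation behavior described above.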
Thanks!