Question about the "in_silico_perturber.py" code

#125
by jinbo1129 - opened

Hello, thanks for this great tool!
I am trying to run the in silico perturber on my own data. Looking through the code of in_silico_perturber, I have a question about this part:

        if (self.tokens_to_perturb != "all") and (self.perturb_type != "overexpress"):
            # minimum # genes needed for perturbation test
            min_genes = len(self.tokens_to_perturb)
            def if_has_tokens_to_perturb(example):
                return (len(set(example["input_ids"]).intersection(self.tokens_to_perturb))>min_genes)
            filtered_input_data = filtered_input_data.filter(if_has_tokens_to_perturb, num_proc=self.nproc)

Should ">min_genes" be changed to ">0"?

After running this code, filtered_input_data has 0 rows.
Thanks!
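The zero-row result can be checked with plain Python: the intersection of a cell's token set with tokens_to_perturb can never contain more elements than tokens_to_perturb itself, so a strict "> min_genes" comparison is never true. A minimal sketch, assuming made-up token IDs and toy cells (these are hypothetical, not real Geneformer data):

```python
# Hypothetical token IDs standing in for genes; not real Geneformer vocabulary entries.
tokens_to_perturb = [101, 202]
min_genes = len(tokens_to_perturb)

# Each toy "cell" is a list of token IDs, like example["input_ids"] in the dataset.
cells = [
    [5, 101, 202, 7],  # expresses both target genes
    [5, 101, 9],       # expresses only one target gene
    [3, 4, 8],         # expresses neither
]

def overlap(input_ids):
    """Number of distinct target tokens present in one cell."""
    return len(set(input_ids).intersection(tokens_to_perturb))

# Strict comparison as in the snippet above: overlap is at most 2, so it is never > 2.
kept_strict = [c for c in cells if overlap(c) > min_genes]

print([overlap(c) for c in cells])  # [2, 1, 0]
print(len(kept_strict))             # 0
```

Even the cell that expresses every target gene only reaches an overlap equal to min_genes, which is why the filter drops all rows.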

Thank you for your interest in Geneformer! Please make sure you have pulled the latest version, and check the documentation of the options with help(InSilicoPerturber). If you provide a list of genes to perturb, they are perturbed as a group, so there must be cells that express all genes in the list. If you want to perturb them one by one, you can either use the “all” option if the list is close to the full set of genes in the cell, or, if it is a small list, run the in silico perturber separately for each gene. This parallelizes by cells rather than by genes, so each batch contains multiple cells, which is more efficient.
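To illustrate why perturbing a list as a group can leave few eligible cells, the sketch below compares how many cells qualify when every listed gene must be present versus when each gene is considered on its own. It is plain Python with hypothetical token IDs and toy cells; in Geneformer itself, the one-by-one route would mean running the in silico perturber once per gene, as described in the reply.

```python
# Hypothetical token IDs for three genes of interest, plus a few toy cells.
genes = [101, 202, 303]
cells = [
    [101, 202, 303, 7],
    [101, 202, 9],
    [101, 4],
    [303, 8],
]

def expresses_all(input_ids, targets):
    """Group perturbation: the cell must contain every target token."""
    return set(targets).issubset(input_ids)

# Cells eligible when the whole list is perturbed together.
group_cells = [c for c in cells if expresses_all(c, genes)]

# Cells eligible for each gene when genes are perturbed one at a time.
per_gene_cells = {g: [c for c in cells if expresses_all(c, [g])] for g in genes}

print(len(group_cells))                                  # 1
print({g: len(cs) for g, cs in per_gene_cells.items()})  # {101: 3, 202: 2, 303: 2}
```

Only one toy cell expresses all three genes, while each gene individually is found in two or three cells, so per-gene runs keep more data.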

Update: @jinbo1129, this should be >=min_genes, though. I have just updated this in the code. Thank you!
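A minimal sketch of the ">=min_genes" check, written as a standalone function rather than the method in in_silico_perturber.py. The helper name make_filter is hypothetical. It also deduplicates the target list before computing min_genes; that step is an extra safeguard assumed here, not something taken from the official code, since a repeated entry in tokens_to_perturb would otherwise push min_genes above anything the intersection can reach.

```python
def make_filter(tokens_to_perturb):
    """Return a predicate that keeps cells containing every token to perturb.

    Hypothetical helper mirroring if_has_tokens_to_perturb; not the Geneformer API.
    """
    targets = set(tokens_to_perturb)  # assumption: dedupe so repeats cannot inflate min_genes
    min_genes = len(targets)

    def if_has_tokens_to_perturb(example):
        # >= (not >): a cell passes when all target tokens are present.
        return len(set(example["input_ids"]).intersection(targets)) >= min_genes

    return if_has_tokens_to_perturb

keep = make_filter([101, 202, 202])
print(keep({"input_ids": [5, 101, 202]}))  # True
print(keep({"input_ids": [5, 101]}))       # False
```

With a Hugging Face dataset, the returned predicate would be passed to filter(...) the same way as in the original snippet.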

ctheodoris changed discussion status to closed

Thanks!
