Error when running in_silico_perturber example
Hello,
I am trying to run the in_silico_perturber example, but I'm running into an error near the end when creating the InSilicoPerturberStats object. I have created a slight modification of the examples/in_silico_perturbation.ipynb script. Below I've put the code I'm running, the intermediate output files, and the error traceback. Thanks in advance for your time and attention!
The code is as follows:
new_in_silico_perturbation.py
from geneformer import InSilicoPerturber
from geneformer import InSilicoPerturberStats
from geneformer import EmbExtractor
from datasets import load_dataset, load_from_disk
print('1')
cell_states_to_model={"state_key": "disease",
                      "start_state": "dcm",
                      "goal_state": "nf",
                      "alt_states": ["hcm"]}
print('2')
filter_data_dict={"cell_type":["Cardiomyocyte1","Cardiomyocyte2","Cardiomyocyte3"]}
print('3')
embex = EmbExtractor(model_type="CellClassifier",
                     num_classes=3,
                     filter_data=filter_data_dict,
                     max_ncells=1000,
                     emb_layer=0,
                     summary_stat="exact_mean",
                     forward_batch_size=16, # 256
                     nproc=16)
print('4')
state_embs_dict = embex.get_state_embs(cell_states_to_model,
                                       "../fine_tuned_models/geneformer-6L-30M_CellClassifier_cardiomyopathies_220224",
                                       "../../Genecorpus-30M/example_input_files/cell_classification/disease_classification/human_dcm_hcm_nf.dataset",
                                       "test_output",
                                       "output_prefix")
print('5')
isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb="all",
                        combos=0,
                        anchor_gene=None,
                        model_type="CellClassifier",
                        num_classes=3,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data=filter_data_dict,
                        cell_states_to_model=cell_states_to_model,
                        state_embs_dict=state_embs_dict,
                        max_ncells=500, #2000,
                        emb_layer=0,
                        forward_batch_size=16, # 400,
                        nproc=16)
print('6')
isp.perturb_data("../fine_tuned_models/geneformer-6L-30M_CellClassifier_cardiomyopathies_220224",
                 "../../Genecorpus-30M/example_input_files/cell_classification/disease_classification/human_dcm_hcm_nf.dataset",
                 "test_output",
                 "output_prefix")
print('7')
ispstats = InSilicoPerturberStats(mode="goal_state_shift",
                                  genes_perturbed="all",
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model=cell_states_to_model)
print('8')
ispstats.get_stats("../../Genecorpus-30M/example_input_files/cell_classification/disease_classification/human_dcm_hcm_nf.dataset",
                   None,
                   "test_output",
                   "output_prefix")
The current contents of the test_output directory are:
$ ls test_output/
in_silico_delete_output_prefix_dict_cell_embs_1Kbatch0_raw.pickle output_prefix.pkl
in_silico_delete_output_prefix_dict_cell_embs_1Kbatch-1_raw.pickle
The error is:
Traceback (most recent call last):
  File "new_in_silico_perturbation.py", line 61, in <module>
    ispstats = InSilicoPerturberStats(mode="goal_state_shift",
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "in_silico_perturber_stats.py", line 708, in __init__
    self.gene_name_id_dict = pickle.load(f)
                             ^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, 'v'.
Thank you for your interest in Geneformer! This error can occur when you have downloaded only a pointer to the dictionary rather than the dictionary itself, which happens if git lfs is not enabled when you clone the repository. Please either enable git lfs before cloning the repository (see the model card for instructions) or download the token dictionary directly (e.g. wget the file by its download link) and place it in your geneformer directory to test whether that resolves the issue.
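For anyone who wants to confirm this before re-downloading, here is a minimal sketch of a check for a Git LFS pointer file. A pointer is a small text file whose first line starts with "version https://git-lfs.github.com/spec/v1", which would explain why pickle complains about the load key 'v'. The helper name is_lfs_pointer is made up for illustration and is not part of Geneformer.

```python
# Prefix that opens every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer.

    Hypothetical helper for illustration; not part of the Geneformer package.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

Calling is_lfs_pointer on the dictionary file inside your geneformer directory should return False for a real pickle. If it returns True, only the pointer was downloaded and the file needs to be fetched again with git lfs enabled or by direct download.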
Hello,
I am running the in_silico_perturber example as well and I was receiving the same error (Error: _pickle.UnpicklingError: invalid load key, 'v'.). I have ensured that git lfs is enabled and I have also used wget to directly download the token dictionary into my geneformer directory, but I am still receiving the same error. Are there any other suggestions to fix this error? Thank you!
Thank you for your question! I would first try loading the file you downloaded with pickle.load. If it loads, check that you have deleted the prior version and that the Geneformer modules are pointing to the right file. If it does not load, wget should have worked, but you can always download the file manually by pressing the download arrow to the right of it.
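The check described above could be sketched roughly as follows. The function check_pickles is a made-up helper, not a Geneformer API. It tries pickle.load on every .pkl file in a directory and records whether each one loads. Only run it on files from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def check_pickles(directory):
    """Try to unpickle every .pkl file in `directory`.

    Returns a dict mapping file name -> short status string.
    Hypothetical helper for illustration; not part of the Geneformer package.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        try:
            with open(path, "rb") as f:
                obj = pickle.load(f)
            size = path.stat().st_size
            results[path.name] = f"ok ({type(obj).__name__}, {size} bytes)"
        except Exception as exc:
            # A Git LFS pointer typically fails here with "invalid load key, 'v'."
            results[path.name] = f"failed: {exc}"
    return results
```

To see which copy your installation actually imports, one option is to pass Path(geneformer.__file__).parent as the directory after import geneformer. This assumes the dictionaries sit next to the package modules, which may differ between versions. If that path is not the directory where you placed the freshly downloaded file, the modules are still reading the old copy.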