AmelieSchreiber/esm2_t12_35M_lora_binding_sites_v2_cp3

Token Classification

protein language model


AmelieSchreiber committed on Sep 14, 2023

Commit 210a373 · 1 Parent(s): 7cf9c57

Update README.md

Files changed (1)

README.md +1 -1


@@ -37,7 +37,7 @@ comprehensive as it could be (see [this report for more details](https://api.wan
 This model is a finetuned version of the 35M parameter `esm2_t12_35M_UR50D` ([see here](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
 and [here](https://huggingface.co/docs/transformers/model_doc/esm) for more details). The model was finetuned with LoRA for
 the binay token classification task of predicting binding sites (and active sites) of protein sequences based on sequence alone.
-The model may be underfit and undertrained, however it still achieved better performance on the test set in terms of loss, accuracy,
 precision, recall, F1 score, ROC_AUC, and Matthews Correlation Coefficient (MCC) compared to the models trained on the smaller
 dataset [found here](https://huggingface.co/datasets/AmelieSchreiber/binding_sites_random_split_by_family) of ~209K protein sequences. Note,
 this model has a high recall, meaning it is likely to detect binding sites, but it has a low precision, meaning the model will likely return

 This model is a finetuned version of the 35M parameter `esm2_t12_35M_UR50D` ([see here](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
 and [here](https://huggingface.co/docs/transformers/model_doc/esm) for more details). The model was finetuned with LoRA for
 the binay token classification task of predicting binding sites (and active sites) of protein sequences based on sequence alone.
+The model may need more training, however it still achieves better performance on the test set in terms of loss, accuracy,
 precision, recall, F1 score, ROC_AUC, and Matthews Correlation Coefficient (MCC) compared to the models trained on the smaller
 dataset [found here](https://huggingface.co/datasets/AmelieSchreiber/binding_sites_random_split_by_family) of ~209K protein sequences. Note,
 this model has a high recall, meaning it is likely to detect binding sites, but it has a low precision, meaning the model will likely return