ChemBERTaLM
A molecule generator finetuned from a ChemBERTa checkpoint. It was introduced in the paper "Exploiting Pretrained Biochemical Language Models for Targeted Drug Design", published in Bioinformatics (Oxford University Press), and first released in this repository.
ChemBERTaLM is a RoBERTa model initialized from a ChemBERTa checkpoint and then finetuned on the MOSES dataset, a collection of drug-like compounds.
How to use
from transformers import RobertaForCausalLM, RobertaTokenizer, pipeline

# Load the finetuned tokenizer and causal language model
tokenizer = RobertaTokenizer.from_pretrained("gokceuludogan/ChemBERTaLM")
model = RobertaForCausalLM.from_pretrained("gokceuludogan/ChemBERTaLM")

# Sample a SMILES string from scratch (empty prompt) with stochastic decoding
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
generator("", max_length=128, do_sample=True)
# Sample output
[{'generated_text': 'Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1'}]
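Since decoding is stochastic, repeated calls can return duplicate or degenerate strings. A minimal post-processing sketch is shown below; it assumes only that the samples have the same shape as the pipeline output above (a list of dicts with a 'generated_text' key). The `unique_smiles` helper and its `min_len` parameter are illustrative, not part of the model's API, and this does not check chemical validity (a toolkit such as RDKit would be needed for that).

```python
def unique_smiles(samples, min_len=2):
    """Deduplicate sampled SMILES strings, preserving order, and
    drop near-empty generations shorter than min_len characters.
    Note: this is a string-level filter only, not a validity check."""
    seen = set()
    out = []
    for s in samples:
        smi = s["generated_text"].strip()
        if len(smi) >= min_len and smi not in seen:
            seen.add(smi)
            out.append(smi)
    return out

# Example with mock pipeline output (one duplicate, one short sample)
samples = [
    {"generated_text": "Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1"},
    {"generated_text": "Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1"},
    {"generated_text": "CCO"},
]
print(unique_smiles(samples))
# → ['Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1', 'CCO']
```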
Citation
@article{10.1093/bioinformatics/btac482,
author = {Uludoğan, Gökçe and Ozkirimli, Elif and Ulgen, Kutlu O. and Karalı, Nilgün Lütfiye and Özgür, Arzucan},
title = "{Exploiting Pretrained Biochemical Language Models for Targeted Drug Design}",
journal = {Bioinformatics},
year = {2022},
doi = {10.1093/bioinformatics/btac482},
url = {https://doi.org/10.1093/bioinformatics/btac482}
}