ChemBERTaLM

A molecule generator model finetuned from a ChemBERTa checkpoint. It was introduced in the paper "Exploiting pretrained biochemical language models for targeted drug design", which has been accepted for publication in Bioinformatics (published by Oxford University Press), and was first released in this repository.

ChemBERTaLM is a RoBERTa model initialized with a ChemBERTa checkpoint and then finetuned on the MOSES dataset, which comprises a collection of drug-like compounds.

How to use

from transformers import RobertaForCausalLM, RobertaTokenizer, pipeline

# Load the tokenizer and the finetuned causal language model from the Hub
tokenizer = RobertaTokenizer.from_pretrained("gokceuludogan/ChemBERTaLM")
model = RobertaForCausalLM.from_pretrained("gokceuludogan/ChemBERTaLM")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Sample one molecule (as a SMILES string) from an empty prompt
generator("", max_length=128, do_sample=True)
# Sample output
# [{'generated_text': 'Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1'}]
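
Because sampling is stochastic, repeated calls can return the same molecule more than once. The following is a minimal sketch, not part of the original release, of a hypothetical helper `unique_smiles` that collects distinct strings from output shaped like the sample above. It assumes each item is a dict with a `generated_text` key; the second and third entries of `example_outputs` are hand-written stand-ins, not real model samples.

```python
from typing import Dict, List


def unique_smiles(outputs: List[Dict[str, str]]) -> List[str]:
    """Return distinct, non-empty generated strings in first-seen order.

    Assumes `outputs` is a list of dicts with a 'generated_text' key,
    as in the sample output of the text-generation pipeline.
    """
    seen = set()
    result = []
    for item in outputs:
        smiles = item["generated_text"].strip()
        # Skip empty generations and strings already collected
        if smiles and smiles not in seen:
            seen.add(smiles)
            result.append(smiles)
    return result


# Hand-written stand-in for pipeline output (illustrative only)
example_outputs = [
    {"generated_text": "Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1"},
    {"generated_text": " Cc1ccc(C(=O)N2CCN(C(=O)c3ccc(F)cc3)CC2)cc1 "},
    {"generated_text": ""},
]
print(unique_smiles(example_outputs))
```

With the real pipeline, several candidates could be requested in one call by passing the generation argument `num_return_sequences` and then filtered with this helper. Note that it compares strings only: two different SMILES strings for the same molecule are kept as separate entries, and chemical validity is not checked.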

Citation

@article{10.1093/bioinformatics/btac482,
    author = {Uludoğan, Gökçe and Ozkirimli, Elif and Ulgen, Kutlu O. and Karalı, Nilgün Lütfiye and Özgür, Arzucan},
    title = "{Exploiting Pretrained Biochemical Language Models for Targeted Drug Design}",
    journal = {Bioinformatics},
    year = {2022},
    doi = {10.1093/bioinformatics/btac482},
    url = {https://doi.org/10.1093/bioinformatics/btac482}
}