File size: 1,170 Bytes
1756254
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
The repository is adapted based on: https://huggingface.co./chenxran/bart-smiles/tree/main


```python
from transformers import AutoTokenizer, AutoModel, SequenceFeatureExtractor
import torch
from transformers import AutoTokenizer, AutoModel

smiles = "CCC(=O)"

tokenizer = AutoTokenizer.from_pretrained("./BARTSmiles/", add_prefix_space=True)
inputs = tokenizer(smiles, return_tensors="pt", return_token_type_ids=False, add_special_tokens=True)

model = AutoModel.from_pretrained('./BARTSmiles')
model.eval()

# Use a pipeline as a high-level helper
from transformers import pipeline

extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
result = extractor(smiles, return_tensors=True, tokenize_kwargs={'return_token_type_ids':False})

```

## Citation

@article{chilingaryan2022bartsmiles,
  title={Bartsmiles: Generative masked language models for molecular representations},
  author={Chilingaryan, Gayane and Tamoyan, Hovhannes and Tevosyan, Ani and Babayan, Nelly and Khondkaryan, Lusine and Hambardzumyan, Karen and Navoyan, Zaven and Khachatrian, Hrant and Aghajanyan, Armen},
  journal={arXiv preprint arXiv:2211.16349},
  year={2022}
}