RichardErkhov
committed on
uploaded readme
README.md
ADDED
@@ -0,0 +1,54 @@
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


chemlactica-1.3b - AWQ
- Model creator: https://huggingface.co/yerevann/
- Original model: https://huggingface.co/yerevann/chemlactica-1.3b/


Original model description:
---
license: cc-by-nc-4.0
language:
- en
library_name: transformers
tags:
- chemistry
- biology
---
Chemlactica-1.3B is a continually pretrained [galactica-1.3b](https://huggingface.co/facebook/galactica-1.3b) model for organic molecules.
It is pretrained on [40B tokens covering 110M+ molecules from PubChem](https://huggingface.co/datasets/yerevann/PubChemForLM) as well as their chemical properties
(molecular weight, synthetic accessibility score, drug-likeness, etc.)
and similarities (Tanimoto distance between ECFP fingerprints).
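
For readers unfamiliar with the similarity measure, the following is a minimal sketch of the Tanimoto coefficient computed on two fingerprints represented as sets of on-bit indices. The function name and the toy bit sets are illustrative assumptions; real ECFP fingerprints would come from a cheminformatics toolkit such as `rdkit`, and this sketch does not claim to reproduce the fingerprint settings used to build the training data.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto (Jaccard) coefficient between two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


# Toy fingerprints (hypothetical bit indices, not real ECFP output).
fp_query = {1, 5, 9, 42}
fp_candidate = {1, 5, 7, 42, 64}

# Rounded to two decimals, matching the number format used in the prompts.
similarity = round(tanimoto_similarity(fp_query, fp_candidate), 2)
```

The two toy fingerprints share three bits out of six distinct bits overall, so the coefficient here is 0.5.
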

Example prompts:

`</s>[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES][SAS]` will attempt to predict the synthetic accessibility score (SAS) of the given molecule.

`</s>[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]` will attempt to generate a molecule that has an SAS of 2.25 and
a similarity score of 0.62 to the given molecule.
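
Once the model has completed a property-prediction prompt, the predicted value has to be read back out of the generated text. The sketch below assumes the completion contains the number followed by a closing tag such as `[/SAS]`, mirroring the tagged form shown in the second prompt. The helper name `extract_tag_value` and the sample completion (including the value 1.58) are hypothetical and are not output from the model.

```python
import re


def extract_tag_value(completion, tag):
    """Return the number found between [TAG] and [/TAG], or None if absent."""
    pattern = (
        re.escape(f"[{tag}]")
        + r"\s*(-?\d+(?:\.\d+)?)\s*"
        + re.escape(f"[/{tag}]")
    )
    match = re.search(pattern, completion)
    return float(match.group(1)) if match else None


# Hypothetical completion text; the numeric value is made up for illustration.
example_output = (
    "</s>[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES][SAS]1.58[/SAS]"
)
predicted_sas = extract_tag_value(example_output, "SAS")
```

Returning `None` when the closing tag is missing lets the caller decide whether to retry generation or discard the sample.
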

The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts. See the [code on GitHub](https://github.com/YerevaNN/ChemLactica).

A preprint describing the model, together with an optimization algorithm built on top of it that sets a new state of the art on
Practical Molecular Optimization and other benchmarks, is [available on arXiv](https://arxiv.org/abs/2407.18897).

A few notes:
* All queries should start with the `</s>` token.
* All numbers are rounded to two decimal places.
* All SMILES are canonicalized using `rdkit`.
* Available tags: `[CLOGP]`, `[WEIGHT]`, `[QED]`, `[SAS]`, `[TPSA]`, `[RINGCOUNT]`, `[SIMILAR]`...
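
The conventions above can be collected into small prompt-building helpers. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, the closing-tag form `[/NAME]` for tags other than `[SAS]` and `[SIMILAR]` is assumed by analogy with the example prompts, and the SMILES strings are assumed to have been canonicalized with `rdkit` beforehand.

```python
def property_tag(name, value):
    """Format a numeric property as [NAME]value[/NAME] with two decimals."""
    return f"[{name}]{value:.2f}[/{name}]"


def similar_tag(similarity, smiles):
    """Format a similarity constraint against a reference molecule."""
    return f"[SIMILAR]{similarity:.2f} {smiles}[/SIMILAR]"


def build_generation_prompt(properties, similar=None):
    """Build a prompt asking the model to generate a molecule.

    properties: dict mapping tag name (e.g. "SAS") to a target value.
    similar: optional (similarity, canonical_smiles) pair.
    """
    parts = ["</s>"]  # every query starts with the </s> token
    for name, value in properties.items():
        parts.append(property_tag(name, value))
    if similar is not None:
        similarity, smiles = similar
        parts.append(similar_tag(similarity, smiles))
    parts.append("[START_SMILES]")
    return "".join(parts)


def build_prediction_prompt(smiles, tag):
    """Build a prompt asking the model to predict one property of a molecule."""
    return f"</s>[START_SMILES]{smiles}[END_SMILES][{tag}]"


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"  # SMILES taken from the example prompts
generation_prompt = build_generation_prompt({"SAS": 2.25}, similar=(0.62, aspirin))
prediction_prompt = build_prediction_prompt(aspirin, "SAS")
```

With these inputs, the two helpers reproduce the two example prompts given earlier in this description.
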

The model is part of a family of three models: [Chemlactica-125M](https://huggingface.co/yerevann/chemlactica-125m),
[Chemlactica-1.3B](https://huggingface.co/yerevann/chemlactica-1.3b) and [Chemma-2B](https://huggingface.co/yerevann/chemma-2b).

We look forward to seeing the community use the model in new applications and contexts.