RichardErkhov committed · Commit c8ac8c9 · verified · 1 Parent(s): 61463a6

uploaded readme

Files changed (1)
  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
+ Quantization made by Richard Erkhov.
+
+ [GitHub](https://github.com/RichardErkhov)
+
+ [Discord](https://discord.gg/pvy7H8DZMG)
+
+ [Request more models](https://github.com/RichardErkhov/quant_request)
+
+
+ chemlactica-1.3b - AWQ
+ - Model creator: https://huggingface.co/yerevann/
+ - Original model: https://huggingface.co/yerevann/chemlactica-1.3b/
+
+
+
+
+ Original model description:
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ library_name: transformers
+ tags:
+ - chemistry
+ - biology
+ ---
+ Chemlactica-1.3B is a continually pretrained [galactica-1.3b](https://huggingface.co/facebook/galactica-1.3b) model for organic molecules.
+ It is pretrained on [40B tokens covering 110M+ molecules from PubChem](https://huggingface.co/datasets/yerevann/PubChemForLM) as well as their chemical properties
+ (molecular weight, synthetic accessibility score, drug-likeness, etc.)
+ and similarities (Tanimoto similarity between ECFP fingerprints).
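The similarity measure mentioned above can be illustrated with a minimal sketch. It assumes a fingerprint is represented as the set of its "on" bit indices; the bit sets below are made up for illustration and are not real ECFP fingerprints, which would normally be computed with a cheminformatics toolkit such as `rdkit`. The function name and the handling of two empty fingerprints are choices made here, not something the model card specifies.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Assumption made for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical on-bit indices standing in for real ECFP fingerprints.
fp_query = {3, 17, 42, 128, 511}
fp_candidate = {3, 17, 42, 900}

similarity = tanimoto_similarity(fp_query, fp_candidate)
print(f"{similarity:.2f}")  # 3 shared bits out of 6 distinct bits -> 0.50
```

A value of 1.0 means the two bit sets are identical and 0.0 means they share no bits.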
+
+ Example prompts:
+
+ `</s>[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES][SAS]` will attempt to predict the synthetic accessibility score of the given molecule.
+
+ `</s>[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]` will attempt to generate a molecule that has an SAS score of 2.25 and
+ a similarity score of 0.62 to the given molecule.
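The two example prompts above can be assembled programmatically. The following is a minimal sketch using only string formatting; the helper names `property_prompt` and `generation_prompt` are hypothetical and not part of the ChemLactica code base, and the SMILES strings are assumed to be canonicalized beforehand.

```python
def property_prompt(smiles: str, tag: str) -> str:
    """Build a prompt asking the model to predict the property named by `tag`."""
    return f"</s>[START_SMILES]{smiles}[END_SMILES][{tag}]"


def generation_prompt(sas: float, similarity: float, reference_smiles: str) -> str:
    """Build a prompt asking for a molecule with a target SAS score and a
    target similarity to a reference molecule. Numbers use two decimals."""
    return (
        f"</s>[SAS]{sas:.2f}[/SAS]"
        f"[SIMILAR]{similarity:.2f} {reference_smiles}[/SIMILAR]"
        "[START_SMILES]"
    )


ASPIRIN = "CC(=O)OC1=CC=CC=C1C(=O)O"  # the molecule used in the examples above

print(property_prompt(ASPIRIN, "SAS"))
print(generation_prompt(2.25, 0.62, ASPIRIN))
```

Both calls reproduce the example prompts character for character, so the resulting strings can be passed to a tokenizer as they are.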
+
+ The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts. See the [code on GitHub](https://github.com/YerevaNN/ChemLactica).
+
+ A preprint with the details of the model and an optimization algorithm built on top of it, which sets a new state of the art on
+ Practical Molecular Optimization and other benchmarks, is [available on arXiv](https://arxiv.org/abs/2407.18897).
+
+ A few notes:
+ * All queries should start with the `</s>` symbol.
+ * All numbers are rounded to two decimal places.
+ * All SMILES are canonicalized using `rdkit`.
+ * Available tags: `[CLOGP]`, `[WEIGHT]`, `[QED]`, `[SAS]`, `[TPSA]`, `[RINGCOUNT]`, `[SIMILAR]`...
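Tagged values can be pulled back out of a prompt or a generated string with a small regular expression. This sketch assumes every property is written as `[TAG]value[/TAG]`, which is the form `[SAS]2.25[/SAS]` takes in the example prompts. Whether the model's own output always closes tags this way is an assumption, not something the notes above confirm. The function name `extract_tagged_values` is hypothetical.

```python
import re

# Matches [TAG]value[/TAG]; the backreference \1 requires the closing tag to
# have the same name as the opening tag. The value is matched lazily, so
# square brackets inside a SMILES string do not end the match early.
TAG_PATTERN = re.compile(r"\[([A-Z_]+)\](.*?)\[/\1\]")


def extract_tagged_values(text):
    """Return (tag, value) pairs in order of appearance; values stay as strings."""
    return TAG_PATTERN.findall(text)


sample = "</s>[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES]"
pairs = extract_tagged_values(sample)
sas_values = [float(value) for tag, value in pairs if tag == "SAS"]

print(pairs)       # [('SAS', '2.25'), ('SIMILAR', '0.62 CC(=O)OC1=CC=CC=C1C(=O)O')]
print(sas_values)  # [2.25]
```

Tags that are opened but not closed in this form, such as the trailing `[START_SMILES]`, are skipped.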
+
+ The model is part of a three-model family: [Chemlactica-125M](https://huggingface.co/yerevann/chemlactica-125m),
+ [Chemlactica-1.3B](https://huggingface.co/yerevann/chemlactica-1.3b) and [Chemma-2B](https://huggingface.co/yerevann/chemma-2b).
+
+ We look forward to seeing the community use the model in new applications and contexts.
+