VishrutThoutam committed on
Commit
209ee0d
1 Parent(s): 8dd2291

Update readme

Files changed (1)
  1. README.md +29 -6
README.md CHANGED
@@ -7,18 +7,41 @@ extra_gated_fields:
  Specific date: date_picker
  I want to use this model for:
  type: select
- options:
- - Research
- - Education
- - label: Other
- value: other
  I agree to share generated sequences and associated data with authors before publishing: checkbox
  I agree not to file patents on any sequences generated by this model: checkbox
  I agree to use this model for non-commercial use ONLY: checkbox
  ---
 
  # MeMDLM: De Novo Membrane Protein Design with Masked Diffusion Language Models
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65bbea9a26c639b000501321/uWW6xnJZwQFWDS1QZNQTm.png)
 
- Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al (arxiv.org/pdf/2406.07524), provide strong generative capabilities to BERT-style models. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs while unconditionally generating realistic, high-quality membrane protein sequences.
  Specific date: date_picker
  I want to use this model for:
  type: select
+ options:
+ - Research
+ - Education
+ - label: Other
+ value: other
  I agree to share generated sequences and associated data with authors before publishing: checkbox
  I agree not to file patents on any sequences generated by this model: checkbox
  I agree to use this model for non-commercial use ONLY: checkbox
+ base_model:
+ - facebook/esm2_t30_150M_UR50D
+ pipeline_tag: fill-mask
  ---
 
  # MeMDLM: De Novo Membrane Protein Design with Masked Diffusion Language Models
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65bbea9a26c639b000501321/uWW6xnJZwQFWDS1QZNQTm.png)
 
+ Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al (arxiv.org/pdf/2406.07524), provide strong generative capabilities to BERT-style models. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs while unconditionally generating realistic, high-quality membrane protein sequences.
+
+ ## Model Usage
+
+ The MDLM model uses an internal backbone model, a fine-tune of ESM-2-150M, which can be loaded directly from this repo:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
+ model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")
+
+ input_sequence = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSWIALYnTRGFCIAVAVFLHYFLLVSFTWMGLEAFHMYLKFCIVGWGIPAVVVSIVLTISPDNYGidFCWINSNVVFYITVVGYFCVIFLLNVSMFIVVLVQLCRIKKKKQLGDL"
+
+ inputs = tokenizer(input_sequence, return_tensors="pt")
+ output = model(**inputs)
+
+ # output.logits has shape (batch, length, vocab); take the most likely token at each position
+ predicted_ids = output.logits.argmax(dim=-1).squeeze()
+ filled_protein_seq = tokenizer.decode(predicted_ids) # contains the output protein sequence with filled mask tokens
+ ```
+
+ This backbone model can be integrated with the [MDLM formulation](https://github.com/kuleshov-group/mdlm) by setting the model backbone type to "hf_dit" and the Hugging Face model ID to "ChatterjeeLab/MeMDLM".
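
When only some residues are masked, decoding the most likely token at every position can also overwrite residues that were never masked. A minimal sketch of keeping the original residues and substituting predictions only at mask positions is shown here. It works on plain token lists so it runs without downloading the model. The helper `fill_masked_positions` and the toy tokens are hypothetical, and `<mask>` is the ESM-2 tokenizer's mask token, assumed here to be unchanged in this repo.

```python
from typing import List, Sequence

# Assumption: the repo keeps the ESM-2 tokenizer's mask token.
MASK_TOKEN = "<mask>"


def fill_masked_positions(
    tokens: Sequence[str],
    predicted: Sequence[str],
    mask_token: str = MASK_TOKEN,
) -> List[str]:
    """Return tokens with each mask replaced by the prediction at that position.

    Unmasked positions are copied from `tokens`; predictions there are ignored.
    """
    if len(tokens) != len(predicted):
        raise ValueError("tokens and predicted must have the same length")
    return [p if t == mask_token else t for t, p in zip(tokens, predicted)]


# Toy example (hypothetical values, not model output).
tokens = ["M", "K", "<mask>", "L", "<mask>"]
predicted = ["A", "K", "V", "I", "G"]
filled = fill_masked_positions(tokens, predicted)
print("".join(filled))  # MKVLG
```

With the real model, `tokens` and `predicted` could plausibly be obtained by calling `tokenizer.convert_ids_to_tokens` on the input IDs and on the argmax of the logits, respectively; that wiring is left out here because it depends on access to the gated repo.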