anrilombard/hybrid-20m

+---
+tags:
+  - safe
+  - mamba
+  - attention
+  - hybrid
+  - molecular-generation
+  - smiles
+  - generated_from_trainer
+datasets:
+  - katielink/moses
+model-index:
+  - name: HYBRID_20M
+    results: []
+---
+# HYBRID_20M
+HYBRID_20M is a molecular generation model that combines **Mamba** and **Attention** layers to draw on the strengths of both architectures. **The training code is available at [https://github.com/Anri-Lombard/Mamba-SAFE](https://github.com/Anri-Lombard/Mamba-SAFE).** The model was trained from scratch on the [MOSES](https://huggingface.co/datasets/katielink/moses) dataset, which was converted from SMILES to the SAFE (Sequential Attachment-based Fragment Embedding) format to improve molecular representation for machine learning applications. HYBRID_20M performs comparably to transformer-based models such as [SAFE_20M](https://huggingface.co/anrilombard/safe-20m) and Mamba-based models such as [SSM_20M](https://huggingface.co/anrilombard/ssm-20m).
+## Evaluation Results
+HYBRID_20M performs on par with both transformer-based and Mamba-based models in molecular generation tasks. The generated molecular structures show high validity and diversity, which suggests that combining Mamba's sequence modeling with Attention mechanisms is effective.
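Validity and diversity are commonly reported together with uniqueness and novelty on MOSES-style benchmarks. The sketch below shows how such fractions could be computed from a list of generated strings. It is illustrative only: `toy_is_valid` is a hypothetical stand-in for a real chemistry check (for example, parsing with a cheminformatics toolkit), the sample strings are made up, and the comparison is on raw strings rather than canonicalized molecules.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Return validity, uniqueness, and novelty as fractions in [0, 1]."""
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)                      # distinct valid strings
    novel = unique - set(training_set)       # not seen during training
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(s: str) -> bool:
    """Placeholder check (balanced parentheses only). NOT real chemistry."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and len(s) > 0


# Made-up example data, for illustration only.
samples = ["CCO", "CCO", "CC(C)O", "CC(O", "c1ccccc1"]
train = ["CCO"]
metrics = generation_metrics(samples, train, toy_is_valid)
print(metrics)
```

In practice the validity check and string canonicalization would come from a chemistry library, and the metrics would be computed over many thousands of samples.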
+## Model Description
+HYBRID_20M uses a hybrid architecture that combines **Mamba** layers with **Attention** layers. This lets the model benefit from Mamba's linear-time sequence modeling and from the contextual understanding that Attention mechanisms provide.
+### Mamba Framework
+The Mamba framework, utilized in HYBRID_20M, was introduced in the following publication:
+```bibtex
+@article{gu2023mamba,
+  title={Mamba: Linear-time sequence modeling with selective state spaces},
+  author={Gu, Albert and Dao, Tri},
+  journal={arXiv preprint arXiv:2312.00752},
+  year={2023}
+}
+```
+We acknowledge the authors for their contributions to sequence modeling.
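As a rough illustration of the linear-time recurrence behind selective state space models, the sketch below scans a scalar input sequence with a small diagonal state. It uses exp(dt * A) for the state transition and the first-order approximation dt * B for the input term, a simplification commonly used in Mamba implementations. The weights `w_b` and `w_c` and the way the step size depends on the input are hand-picked stand-ins for learned projections; this is a toy under those assumptions, not the code used in HYBRID_20M.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Smooth positive function, used here to keep the step size > 0."""
    return math.log1p(math.exp(x))


def selective_scan(
    xs: Sequence[float],
    a: Sequence[float],
    w_b: Sequence[float],
    w_c: Sequence[float],
) -> List[float]:
    """Compute h_t = exp(dt*a) * h_{t-1} + dt * B_t * x_t and y_t = C_t . h_t.

    `a` holds the (negative) diagonal entries of the state matrix.
    dt, B_t, and C_t all depend on x_t, which is the "selective" part.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:                              # one pass: cost grows linearly with length
        dt = softplus(x)                      # toy input-dependent step size
        b = [w * x for w in w_b]              # toy input-dependent B_t
        c = [w * x for w in w_c]              # toy input-dependent C_t
        h = [
            math.exp(dt * a_i) * h_i + dt * b_i * x
            for a_i, h_i, b_i in zip(a, h, b)
        ]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


outputs = selective_scan(
    xs=[0.5, -1.0, 2.0, 0.0],
    a=[-1.0, -0.5],
    w_b=[1.0, 0.5],
    w_c=[0.3, -0.2],
)
print(outputs)
```

The state `h` has a fixed size, so memory does not grow with sequence length, which is the property that distinguishes this recurrence from attention over the full history.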
+### Attention Mechanisms
+Attention layers enhance the model's ability to focus on relevant parts of the input sequence, facilitating the capture of long-range dependencies and contextual information. This capability is essential for accurately generating complex molecular structures.
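For readers unfamiliar with the mechanism, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It assumes causal (autoregressive) masking, which is typical for generative sequence models but is not stated in this card, and it omits learned projections, multiple heads, and batching.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Each position t attends to positions 0..t and returns a weighted sum of values."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(weights[s] * v[s][j] for s in range(t + 1))
            for j in range(len(v[0]))
        ])
    return out


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(causal_attention(q, k, v))
```

Because every position compares itself with all earlier positions, the cost grows quadratically with sequence length, which is the trade-off against the linear-time Mamba layers.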
+### SAFE Framework
+The SAFE framework, also employed in HYBRID_20M, was introduced in the following publication:
+```bibtex
+@article{noutahi2024gotta,
+  title={Gotta be SAFE: a new framework for molecular design},
+  author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
+  journal={Digital Discovery},
+  volume={3},
+  number={4},
+  pages={796--804},
+  year={2024},
+  publisher={Royal Society of Chemistry}
+}
+```
+We acknowledge the authors for their contributions to molecular design.
+## Intended Uses & Limitations
+### Intended Uses
+HYBRID_20M is intended for:
+- **Generating Molecular Structures:** Creating novel molecules with desired properties.
+- **Exploring Chemical Space:** Investigating the vast array of possible chemical compounds for research and development.
+- **Assisting in Material Design:** Facilitating the creation of new materials with specific functionalities.
+### Limitations
+- **Validation Required:** Outputs should be validated by domain experts before practical application.
+- **Synthetic Feasibility:** Generated molecules may not always be synthetically feasible.
+- **Dataset Scope:** The model's knowledge is limited to the chemical space represented in the MOSES dataset.
+## Training and Evaluation Data
+The model was trained on the [MOSES (MOlecular SEtS)](https://huggingface.co/datasets/katielink/moses) dataset, a benchmark dataset for molecular generation. The dataset was converted from SMILES to the SAFE format to enhance molecular representation for machine learning tasks.
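To give a feel for the representation: a SAFE string is itself a valid SMILES string in which the molecule is written as dot-separated fragments, with ring-closure digits marking where the fragments attach. The snippet below splits a hand-written SAFE-style string for toluene into fragments and finds the labels that link them. The example string and the helper names are assumptions for illustration, not output from the SAFE library, and the digit matching is deliberately naive (it ignores `%nn` labels and digits inside bracket atoms).

```python
import re
from collections import Counter
from typing import List


def split_fragments(safe_string: str) -> List[str]:
    """Split a SAFE-style string into its dot-separated fragments."""
    return [frag for frag in safe_string.split(".") if frag]


def shared_labels(fragments: List[str]) -> List[str]:
    """Return single-digit closure labels that occur in more than one fragment.

    These are the labels that connect fragments, as opposed to labels
    that only close a ring inside a single fragment.
    """
    counts = Counter()
    for frag in fragments:
        counts.update(set(re.findall(r"\d", frag)))
    return sorted(label for label, n in counts.items() if n > 1)


# Hand-written example: a benzene ring (closure 1) attached to a methyl (closure 3).
toluene_safe_style = "c13ccccc1.C3"
fragments = split_fragments(toluene_safe_style)
print(fragments)
print(shared_labels(fragments))
```

For real conversions between SMILES and SAFE, use the tooling referenced in the training repository rather than string handling like this.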
+## Training Procedure
+### Training Hyperparameters
+The following hyperparameters were used during training:
+- **Learning Rate:** 0.0005
+- **Training Batch Size:** 32
+- **Evaluation Batch Size:** 32
+- **Seed:** 42
+- **Gradient Accumulation Steps:** 2
+- **Total Training Batch Size:** 64
+- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
+- **Learning Rate Scheduler:** Linear with 20,000 warmup steps
+- **Number of Epochs:** 10
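The relationship between these values can be sketched in a few lines. The example below recomputes the total batch size from the per-step batch size and gradient accumulation, and evaluates a linear schedule with warmup: a ramp from 0 to the peak learning rate, then a linear decay to 0. It assumes a single device and the usual Hugging Face Trainer meaning of a "linear" scheduler. `TOTAL_STEPS` is a hypothetical placeholder, since the card does not report the total number of optimizer steps.

```python
LEARNING_RATE = 5e-4      # from the card
PER_STEP_BATCH = 32       # from the card
GRAD_ACCUM_STEPS = 2      # from the card
WARMUP_STEPS = 20_000     # from the card
TOTAL_STEPS = 100_000     # hypothetical: not reported in the card


def effective_batch_size(per_step: int, grad_accum: int, n_devices: int = 1) -> int:
    """Samples contributing to each optimizer update."""
    return per_step * grad_accum * n_devices


def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = LEARNING_RATE,
    warmup_steps: int = WARMUP_STEPS,
) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(PER_STEP_BATCH, GRAD_ACCUM_STEPS))
for step in (0, 10_000, 20_000, 60_000, 100_000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

With a batch size of 32 and 2 accumulation steps, the single-device assumption reproduces the reported total of 64.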
+### Framework Versions
+- **Mamba:** [Specify version]
+- **PyTorch:** [Specify version]
+- **Datasets:** 2.20.0
+- **Tokenizers:** 0.19.1
+## Acknowledgements
+We acknowledge the authors of the [Mamba](https://github.com/state-spaces/mamba) and SAFE frameworks for their contributions to sequence modeling and molecular design.