---
tags:
- safe
- mamba
- attention
- hybrid
- molecular-generation
- smiles
- generated_from_trainer
datasets:
- katielink/moses
model-index:
- name: HYBRID_20M
  results: []
---

# HYBRID_20M

HYBRID_20M is a model developed for molecular generation tasks, combining **Mamba** and **Attention** layers to draw on the strengths of each architecture.

**The training code is available at [https://github.com/Anri-Lombard/Mamba-SAFE](https://github.com/Anri-Lombard/Mamba-SAFE).**

The model was trained from scratch on the [MOSES](https://huggingface.co/datasets/katielink/moses) dataset, converted from SMILES to the SAFE (Sequential Attachment-based Fragment Embedding) format, which improves molecular representation for machine learning applications. HYBRID_20M performs comparably to both transformer-based models such as [SAFE_20M](https://huggingface.co/anrilombard/safe-20m) and Mamba-based models such as [SSM_20M](https://huggingface.co/anrilombard/ssm-20m).

## Evaluation Results

HYBRID_20M performs on par with both transformer-based and Mamba-based models in molecular generation tasks, producing molecules with high validity and diversity. This indicates that combining Mamba's sequence modeling with Attention mechanisms is effective for this domain.

## Model Description

HYBRID_20M employs a hybrid architecture that integrates the **Mamba** framework with **Attention** layers. This integration lets the model benefit from Mamba's efficient, linear-time sequence modeling and from the contextual understanding provided by Attention mechanisms.

### Mamba Framework

The Mamba framework used in HYBRID_20M was introduced in the following publication:

```bibtex
@article{gu2023mamba,
  title={Mamba: Linear-time sequence modeling with selective state spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}
```

We acknowledge the authors for their contributions to sequence modeling.

### Attention Mechanisms

Attention layers enhance the model's ability to focus on the relevant parts of the input sequence, capturing long-range dependencies and contextual information. This capability is essential for accurately generating complex molecular structures.

### SAFE Framework

The SAFE framework, also employed in HYBRID_20M, was introduced in the following publication:

```bibtex
@article{noutahi2024gotta,
  title={Gotta be SAFE: a new framework for molecular design},
  author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
  journal={Digital Discovery},
  volume={3},
  number={4},
  pages={796--804},
  year={2024},
  publisher={Royal Society of Chemistry}
}
```

We acknowledge the authors for their contributions to molecular design.

## Intended Uses & Limitations

### Intended Uses

HYBRID_20M is intended for:

- **Generating Molecular Structures:** Creating novel molecules with desired properties (see the usage sketch below).
- **Exploring Chemical Space:** Investigating the vast array of possible chemical compounds for research and development.
- **Assisting in Material Design:** Facilitating the creation of new materials with specific functionalities.

### Limitations

- **Validation Required:** Outputs should be validated by domain experts before practical application.
- **Synthetic Feasibility:** Generated molecules may not always be synthetically feasible.
- **Dataset Scope:** The model's knowledge is limited to the chemical space represented in the MOSES dataset.
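### Example Usage

The sketch below shows one way unconditional generation could look. It is not taken from the training repository: the hub identifier, the use of the `transformers` auto classes with `trust_remote_code=True`, and the `safe.decode` keyword arguments are all assumptions; consult the [Mamba-SAFE repository](https://github.com/Anri-Lombard/Mamba-SAFE) for the exact loading code.

```python
# Hypothetical generation sketch; see the lead-in for the assumptions made.
import safe  # from the `safe-mol` package
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "anrilombard/hybrid-20m"  # hypothetical hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

# Start every sample from the beginning-of-sequence token.
input_ids = torch.tensor([[tokenizer.bos_token_id]])
with torch.no_grad():
    generated = model.generate(
        input_ids,
        max_new_tokens=96,
        do_sample=True,
        top_k=50,
        num_return_sequences=4,
        pad_token_id=tokenizer.eos_token_id,
    )

# Convert the generated SAFE strings back to SMILES; with ignore_errors=True,
# strings that cannot be decoded come back as None instead of raising.
for sequence in generated:
    safe_string = tokenizer.decode(sequence, skip_special_tokens=True)
    smiles = safe.decode(safe_string, canonical=True, ignore_errors=True)
    print(f"{safe_string} -> {smiles}")
```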
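The SMILES-to-SAFE conversion used to prepare the training data (see the next section) can be illustrated with the `safe` library. A minimal round-trip sketch, assuming the `safe.encode`/`safe.decode` API of the `safe-mol` package:

```python
# Round-trip a molecule through the SAFE representation.
import safe

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin

# Encode SMILES into SAFE: a dot-separated sequence of molecular fragments.
safe_string = safe.encode(smiles)
print(safe_string)

# Decoding recovers an equivalent (canonical) SMILES string.
print(safe.decode(safe_string, canonical=True))
```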
## Training and Evaluation Data

The model was trained on the [MOSES (MOlecular SEtS)](https://huggingface.co/datasets/katielink/moses) dataset, a benchmark dataset for molecular generation. The dataset was converted from SMILES to the SAFE format to enhance molecular representation for machine learning tasks.

## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch of these settings closes this card):

- **Learning Rate:** 0.0005
- **Training Batch Size:** 32
- **Evaluation Batch Size:** 32
- **Seed:** 42
- **Gradient Accumulation Steps:** 2
- **Total Training Batch Size:** 64
- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler:** Linear with 20,000 warmup steps
- **Number of Epochs:** 10

### Framework Versions

- **Mamba:** [Specify version]
- **PyTorch:** [Specify version]
- **Datasets:** 2.20.0
- **Tokenizers:** 0.19.1

## Acknowledgements

We acknowledge the authors of the [Mamba](https://github.com/state-spaces/mamba) and [SAFE](https://github.com/datamol-io/safe) frameworks, cited above, for their contributions to sequence modeling and molecular design.
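## Training Configuration Sketch

For reference, the hyperparameters listed above map onto Hugging Face `TrainingArguments` roughly as follows. This assumes the run used the `transformers` Trainer (consistent with the card's `generated_from_trainer` tag); `output_dir` is a placeholder, and the authoritative configuration lives in the [Mamba-SAFE repository](https://github.com/Anri-Lombard/Mamba-SAFE).

```python
# Sketch: this card's hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hybrid_20m",        # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,  # 32 x 2 = total batch size of 64
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_steps=20_000,
    adam_beta1=0.9,                 # Trainer's AdamW is the closest
    adam_beta2=0.999,               # analogue of the Adam settings above
    adam_epsilon=1e-8,
    seed=42,
)
```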