---
tags:
- safe
- mamba
- attention
- hybrid
- molecular-generation
- smiles
- generated_from_trainer
datasets:
- katielink/moses
model-index:
- name: HYBRID_20M
results: []
---
# HYBRID_20M
HYBRID_20M is a model developed for molecular generation tasks, combining **Mamba** and **Attention** layers to draw on the strengths of each architecture. **The training code is available at [https://github.com/Anri-Lombard/Mamba-SAFE](https://github.com/Anri-Lombard/Mamba-SAFE).** The model was trained from scratch on the [MOSES](https://huggingface.co./datasets/katielink/moses) dataset, converted from SMILES to the SAFE (Sequential Attachment-based Fragment Embedding) format to improve molecular representation for machine learning applications. HYBRID_20M achieves performance comparable to both transformer-based models such as [SAFE_20M](https://huggingface.co./anrilombard/safe-20m) and Mamba-based models such as [SSM_20M](https://huggingface.co./anrilombard/ssm-20m).
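A minimal usage sketch is shown below. It assumes the checkpoint can be loaded through `transformers` with `trust_remote_code=True`; because of the custom hybrid architecture, the exact loading path may differ, so refer to the [Mamba-SAFE](https://github.com/Anri-Lombard/Mamba-SAFE) repository for the supported workflow.
```python
# Hedged sketch: sampling SAFE strings from the checkpoint via transformers.
# The hybrid Mamba/Attention model may require the custom code shipped with
# the Mamba-SAFE repository, hence trust_remote_code=True is assumed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "anrilombard/hybrid-20m"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Seed generation with the tokenizer's start token and sample a few sequences.
inputs = tokenizer(tokenizer.bos_token, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=128,
    do_sample=True,
    top_k=50,
    num_return_sequences=4,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))  # SAFE-encoded molecule
```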
## Evaluation Results
HYBRID_20M performs on par with both transformer-based and Mamba-based models on molecular generation tasks, producing molecular structures with high validity and diversity. This indicates the effectiveness of combining Mamba's sequence modeling with Attention mechanisms.
## Model Description
HYBRID_20M employs a hybrid architecture that integrates the **Mamba** framework with **Attention** layers. This integration allows the model to benefit from Mamba's efficient sequence modeling capabilities and the contextual understanding provided by Attention mechanisms.
### Mamba Framework
The Mamba framework, utilized in HYBRID_20M, was introduced in the following publication:
```bibtex
@article{gu2023mamba,
title={Mamba: Linear-time sequence modeling with selective state spaces},
author={Gu, Albert and Dao, Tri},
journal={arXiv preprint arXiv:2312.00752},
year={2023}
}
```
We acknowledge the authors for their contributions to sequence modeling.
### Attention Mechanisms
Attention layers enhance the model's ability to focus on relevant parts of the input sequence, facilitating the capture of long-range dependencies and contextual information. This capability is essential for accurately generating complex molecular structures.
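The exact layer layout of HYBRID_20M is defined in the Mamba-SAFE repository; the sketch below only illustrates the general idea of interleaving Mamba blocks with self-attention blocks, using the `mamba_ssm` package and placeholder dimensions (causal masking and MLP sub-layers are omitted for brevity).
```python
# Illustrative only: interleaving Mamba blocks with self-attention blocks.
# The real HYBRID_20M layer layout and dimensions live in the Mamba-SAFE repo.
import torch.nn as nn
from mamba_ssm import Mamba  # assumed dependency (pip install mamba-ssm)


class HybridBlockStack(nn.Module):
    def __init__(self, d_model=512, n_layers=8, n_heads=8, attn_every=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            if (i + 1) % attn_every == 0:
                # Attention block: content-based mixing for long-range context.
                self.layers.append(
                    nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                )
            else:
                # Mamba block: efficient linear-time sequence modeling.
                self.layers.append(Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                attn_out, _ = layer(x, x, x, need_weights=False)
                x = x + attn_out  # residual connection (causal mask omitted here)
            else:
                x = x + layer(x)  # Mamba preserves the (batch, seq_len, d_model) shape
        return self.norm(x)
```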
### SAFE Framework
The SAFE framework, also employed in HYBRID_20M, was introduced in the following publication:
```bibtex
@article{noutahi2024gotta,
title={Gotta be SAFE: a new framework for molecular design},
author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
journal={Digital Discovery},
volume={3},
number={4},
pages={796--804},
year={2024},
publisher={Royal Society of Chemistry}
}
```
We acknowledge the authors for their contributions to molecular design.
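For reference, the conversion between SMILES and SAFE can be performed with the `safe-mol` package released alongside the paper; the sketch below is a minimal example (the MOSES data on the Hub used for this model was already converted).
```python
# Sketch: round-tripping a molecule between SMILES and SAFE (pip install safe-mol).
import safe

smiles = "CC(=O)Oc1ccccc1C(=O)O"       # aspirin
safe_string = safe.encode(smiles)      # SMILES -> SAFE
roundtrip = safe.decode(safe_string)   # SAFE -> SMILES
print(safe_string)
print(roundtrip)
```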
## Intended Uses & Limitations
### Intended Uses
HYBRID_20M is intended for:
- **Generating Molecular Structures:** Creating novel molecules with desired properties.
- **Exploring Chemical Space:** Investigating the vast array of possible chemical compounds for research and development.
- **Assisting in Material Design:** Facilitating the creation of new materials with specific functionalities.
### Limitations
- **Validation Required:** Outputs should be validated by domain experts before practical application (a basic automated check is sketched after this list).
- **Synthetic Feasibility:** Generated molecules may not always be synthetically feasible.
- **Dataset Scope:** The model's knowledge is limited to the chemical space represented in the MOSES dataset.
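As a first-pass automated filter before expert review, generated SAFE strings can be decoded and checked for RDKit parseability. This is a hedged sketch: it assumes `safe-mol` and RDKit are installed, and parseability says nothing about synthetic feasibility.
```python
# Sketch: filtering generated SAFE strings for basic chemical validity.
# Passing RDKit sanitization does NOT imply synthetic feasibility.
import safe
from rdkit import Chem


def is_valid(safe_string: str) -> bool:
    try:
        smiles = safe.decode(safe_string)  # SAFE -> SMILES
        return Chem.MolFromSmiles(smiles) is not None
    except Exception:
        return False


generated = ["c1ccccc1", "not-a-molecule"]  # placeholder model outputs
valid = [s for s in generated if is_valid(s)]
print(f"{len(valid)}/{len(generated)} passed the validity check")
```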
## Training and Evaluation Data
The model was trained on the [MOSES (Molecular Sets)](https://huggingface.co./datasets/katielink/moses) dataset, a benchmark dataset for molecular generation. The dataset was converted from SMILES to the SAFE format to enhance molecular representation for machine learning tasks.
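The data can be pulled directly from the Hub with the `datasets` library; a minimal sketch follows (split and column names are assumptions, so inspect the dataset object).
```python
# Sketch: loading the MOSES dataset used for training from the Hugging Face Hub.
from datasets import load_dataset

moses = load_dataset("katielink/moses")
print(moses)              # inspect available splits and columns
print(moses["train"][0])  # first record (assumes a "train" split exists)
```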
## Training Procedure
### Training Hyperparameters
The following hyperparameters were used during training (an equivalent optimizer and scheduler configuration is sketched after the list):
- **Learning Rate:** 0.0005
- **Training Batch Size:** 32
- **Evaluation Batch Size:** 32
- **Seed:** 42
- **Gradient Accumulation Steps:** 2
- **Total Training Batch Size:** 64
- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler:** Linear with 20,000 warmup steps
- **Number of Epochs:** 10
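A hedged sketch of an equivalent optimizer and scheduler configuration in PyTorch/`transformers` is shown below; the actual training script lives in the Mamba-SAFE repository, and the stand-in model plus `total_steps` are placeholders.
```python
# Sketch: reproducing the reported optimizer and scheduler settings.
# The real training loop is in the Mamba-SAFE repository.
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # placeholder; use the actual HYBRID_20M network
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,                   # learning rate 0.0005
    betas=(0.9, 0.999),
    eps=1e-8,
)
total_steps = 100_000          # placeholder; depends on dataset size and epoch count
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=20_000,   # linear schedule with 20,000 warmup steps
    num_training_steps=total_steps,
)
# Effective batch size 64 = per-device batch 32 x gradient accumulation 2.
```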
### Framework Versions
- **Mamba:** [Specify version]
- **PyTorch:** [Specify version]
- **Datasets:** 2.20.0
- **Tokenizers:** 0.19.1
## Acknowledgements
We acknowledge the authors of the [Mamba](https://github.com/state-spaces/mamba) and [SAFE](https://github.com/datamol-io/safe) frameworks for their contributions to sequence modeling and molecular design.
## References
```bibtex
@article{gu2023mamba,
title={Mamba: Linear-time sequence modeling with selective state spaces},
author={Gu, Albert and Dao, Tri},
journal={arXiv preprint arXiv:2312.00752},
year={2023}
}
@article{noutahi2024gotta,
title={Gotta be SAFE: a new framework for molecular design},
author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
journal={Digital Discovery},
volume={3},
number={4},
pages={796--804},
year={2024},
publisher={Royal Society of Chemistry}
}
```