---
license: apple-ascl
---
# MobileCLIP CoreML Models
These are the CoreML models of MobileCLIP. For more details, refer to MobileCLIP on HuggingFace and MobileCLIP on GitHub.
The models are provided separately for each subarchitecture:
- MobileCLIP-S0: This subarchitecture is designed for lightweight and fast inference, making it suitable for edge devices with limited computational resources.
- MobileCLIP-S1: This subarchitecture offers a balance between model complexity and performance, providing a good trade-off for various applications.
- MobileCLIP-S2: This subarchitecture focuses on achieving higher accuracy, ideal for applications where some inference speed can be traded for better results.
- MobileCLIP-B: This subarchitecture aims at delivering the highest possible accuracy, optimized for environments with ample computational resources.
Each subarchitecture provides a TextEncoder and an ImageEncoder as separate CoreML models:
| Model | CLIP Text | CLIP Image |
|---|---|---|
| MobileCLIP-S0 | clip_text_s0.mlpackage | clip_image_s0.mlpackage |
| MobileCLIP-S1 | clip_text_s1.mlpackage | clip_image_s1.mlpackage |
| MobileCLIP-S2 | clip_text_s2.mlpackage | clip_image_s2.mlpackage |
| MobileCLIP-B | clip_text_B.mlpackage | clip_image_B.mlpackage |
For detailed implementation and architecture specifics, refer to the MobileCLIP GitHub repository.
## CoreML Parameters

| Model | Input Name | Input Shape | Input DataType | Output Name | Output Shape | Output DataType |
|---|---|---|---|---|---|---|
| CLIP Text | input_text | (1,77) | INT32 | output_embeddings | (1,512) | FLOAT16 |

| Model | Input Name | Input Width | Input Height | Input ColorSpace | Output Name | Output Shape | Output DataType |
|---|---|---|---|---|---|---|---|
| CLIP Image | input_image | 256 | 256 | RGB | output_embeddings | (1,512) | FLOAT16 |
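Given these parameters, a minimal Python inference sketch with coremltools could look like the following; the package file names, the placeholder token ids, and the example image path are assumptions, and tokenization with the CLIP tokenizer is not shown:

```python
import coremltools as ct
import numpy as np
from PIL import Image

# Load the S0 encoders (file names are assumptions; adjust to your local paths).
text_model = ct.models.MLModel("clip_text_s0.mlpackage")
image_model = ct.models.MLModel("clip_image_s0.mlpackage")

# Text input: a (1, 77) INT32 tensor of token ids from the CLIP tokenizer
# (tokenization not shown; zeros are used here as a placeholder).
token_ids = np.zeros((1, 77), dtype=np.int32)
text_embedding = text_model.predict({"input_text": token_ids})["output_embeddings"]

# Image input: a 256x256 RGB image, passed directly as a PIL image.
image = Image.open("example.jpg").convert("RGB").resize((256, 256))
image_embedding = image_model.predict({"input_image": image})["output_embeddings"]

# Cosine similarity between the two (1, 512) embeddings.
a = np.asarray(text_embedding, dtype=np.float32)
b = np.asarray(image_embedding, dtype=np.float32)
similarity = (a @ b.T).item() / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)
```

Note that coremltools prediction requires macOS; on-device inference through the CoreML framework uses the same input and output names.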
Below is an example of how the conversion to CoreML can be performed.
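This is a minimal conversion sketch for the S0 encoders, assuming the mobileclip package from the Apple ml-mobileclip repository (with its create_model_and_transforms, encode_text, and encode_image APIs), a downloaded PyTorch checkpoint, and coremltools; the wrapper modules, checkpoint path, and preprocessing scale are assumptions, not the exact scripts used to produce these packages:

```python
import coremltools as ct
import numpy as np
import torch
import mobileclip

# Load the PyTorch MobileCLIP-S0 model (checkpoint path is an assumption).
model, _, _ = mobileclip.create_model_and_transforms(
    "mobileclip_s0", pretrained="checkpoints/mobileclip_s0.pt"
)
model.eval()


# Thin wrappers so each encoder traces to a single-input, single-output module.
class TextEncoder(torch.nn.Module):
    def __init__(self, clip):
        super().__init__()
        self.clip = clip

    def forward(self, input_text):
        return self.clip.encode_text(input_text)


class ImageEncoder(torch.nn.Module):
    def __init__(self, clip):
        super().__init__()
        self.clip = clip

    def forward(self, input_image):
        return self.clip.encode_image(input_image)


# Text encoder: (1, 77) INT32 token ids in, (1, 512) embeddings out.
example_text = torch.zeros((1, 77), dtype=torch.int32)
traced_text = torch.jit.trace(TextEncoder(model), example_text)
text_mlmodel = ct.convert(
    traced_text,
    inputs=[ct.TensorType(name="input_text", shape=(1, 77), dtype=np.int32)],
    outputs=[ct.TensorType(name="output_embeddings")],
    convert_to="mlprogram",
)
text_mlmodel.save("clip_text_s0.mlpackage")

# Image encoder: 256x256 RGB image in, (1, 512) embeddings out.
# The 1/255 scale assumes the model expects pixel values in [0, 1].
example_image = torch.rand((1, 3, 256, 256))
traced_image = torch.jit.trace(ImageEncoder(model), example_image)
image_mlmodel = ct.convert(
    traced_image,
    inputs=[ct.ImageType(name="input_image", shape=(1, 3, 256, 256), scale=1 / 255.0)],
    outputs=[ct.TensorType(name="output_embeddings")],
    convert_to="mlprogram",
)
image_mlmodel.save("clip_image_s0.mlpackage")
```

The same pattern applies to the S1, S2, and B checkpoints by swapping the model name, checkpoint path, and output package names.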