Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

Model Card for Euclid-convnext-large (Version on 12/05/2024)

A multimodal large language model specifically trained for strong low-level geometric perception.

Model Details

Model Description

Euclid is trained on 1.6M synthetic geometry images with high-fidelity question-answer pairs using a curriculum learning approach.

It combines a ConvNeXt visual encoder with a Qwen-2.5 language model, connected through a 2-layer MLP multimodal connector, for 1.98B parameters in total.
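
As a rough illustration of this design, the sketch below wires a vision backbone to a language model through a 2-layer MLP. It is an assumption-laden toy, not the released implementation: the class and argument names are invented, and the 1536-dim defaults are only plausible sizes for a ConvNeXt-Large backbone and Qwen2.5-1.5B.

import torch
import torch.nn as nn

class EuclidSketch(nn.Module):
    """Toy sketch of the described architecture; names and dims are assumptions."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int = 1536, lm_dim: int = 1536):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g. a ConvNeXt backbone
        self.language_model = language_model  # e.g. a Qwen-2.5 decoder
        # 2-layer MLP multimodal connector, as stated in the description above
        self.connector = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def encode_image(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # (B, N, vision_dim) patch features -> (B, N, lm_dim) LM-space embeddings
        return self.connector(self.vision_encoder(pixel_values))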

Model Sources

  • Paper: https://arxiv.org/abs/2412.08737
  • Hugging Face repository: euclid-multimodal/Euclid-convnext-large-120524
  • Base model: Qwen/Qwen2.5-1.5B

Uses

The model is trained for precise low-level geometric perception tasks and is able to perform:

  • Point-on-line detection
  • Point-on-circle detection
  • Angle classification
  • Length comparison
  • Geometric annotation understanding

Please refer to our repo for the full input format.
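
Purely for illustration, the strings below show how such tasks could be phrased as questions. They are hypothetical and do not follow the repo's actual input format, which you should consult instead.

# Hypothetical phrasings of the five perception tasks above.
# These are NOT the model's real input format; see the repo for that.
EXAMPLE_PROMPTS = {
    "point_on_line":   "Is point D on line AB?",
    "point_on_circle": "Is point E on circle O?",
    "angle":           "Is angle ABC acute, right, or obtuse?",
    "length":          "Which segment is longer, AB or CD?",
    "annotation":      "Which label is attached to segment AB?",
}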

Limitations and Applications

Our model is not designed to handle:

  • Comprehensive image understanding tasks
  • Advanced cognitive reasoning beyond geometric analysis

However, the model demonstrates strength in low-level visual perception.

This capability makes it potentially valuable as a base model for specialized downstream fine-tuning, including:

  • Robotic vision and automation systems
  • Medical imaging and diagnostic support
  • Industrial quality assurance and inspection
  • Geometric education and visualization tools

Example Usage

Clone our Euclid repo first, set up the environment, then run:

# install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"
# download the checkpoint into $MODEL_PATH (set this variable first)
huggingface-cli download --cache-dir $MODEL_PATH EuclidAI/Euclid-convnext-large
# run the Geoperception evaluation script from the repo
python euclid/eval/run_euclid_geo.py --model_path $MODEL_PATH --device cuda
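
The same download can also be done from Python via huggingface_hub; the call below is equivalent to the CLI command above, with a local cache directory standing in for $MODEL_PATH:

from huggingface_hub import snapshot_download

# Download the checkpoint snapshot (same effect as the CLI command above).
model_path = snapshot_download(
    repo_id="EuclidAI/Euclid-convnext-large",
    cache_dir="./euclid_checkpoint",  # stands in for $MODEL_PATH
)
print(model_path)  # local directory containing the weights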

Evaluation Results

Performance on Geoperception benchmark tasks:

Model                      POL    POC    ALC    LHC    PEP    PRA    EQL    Overall
Random Baseline             0.43   2.63  59.92  51.36   0.25   0.00   0.02    16.37
Pixtral-12B                22.85  53.21  47.33  51.43  22.53  37.11  58.45    41.84
Gemini-1.5-Pro             24.42  69.80  57.96  79.05  39.60  77.59  52.27    57.24
EUCLID-ConvNeXt-Large      80.54  57.76  86.37  88.24  42.23  64.94  34.45    64.93
EUCLID-ConvNeXt-XXLarge    82.98  61.45  90.56  90.82  46.96  70.52  31.94    67.89
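
The Overall column is the unweighted mean of the seven task scores, which holds for every row of the table; a quick check against the EUCLID-ConvNeXt-Large row:

# "Overall" = unweighted mean of the seven task scores.
scores = [80.54, 57.76, 86.37, 88.24, 42.23, 64.94, 34.45]  # EUCLID-ConvNeXt-Large
print(f"{sum(scores) / len(scores):.2f}")  # -> 64.93, matching the table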

Citation

If you find Euclid useful for your research and applications, please cite using this BibTeX:

@article{zhang2024euclid,
  title={Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions},
  author={Zhang, Jiarui and Liu, Ollie and Yu, Tianyu and Hu, Jinyi and Neiswanger, Willie},
  journal={arXiv preprint arXiv:2412.08737},
  year={2024}
}
