SpiritSight Agent: Advanced GUI Agent with One Look
📄 Paper • 🤖 Models • 📚 Datasets (Coming soon…)
SpiritSight-Agent is a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms.
We recommend fine-tuning the base model on custom data.
| Model | Checkpoint | Size | License |
|---|---|---|---|
| SpiritSight-Agent-2B-base | 🤗 HF Link | 2B | InternVL |
| SpiritSight-Agent-8B-base | 🤗 HF Link | 8B | InternVL |
| SpiritSight-Agent-26B-base | 🤗 HF Link | 26B | InternVL |
Coming soon.
conda create -n spiritsight-agent python=3.9
conda activate spiritsight-agent
pip install -r requirements.txt
pip install flash-attn==2.3.6 --no-build-isolation
python infer_SSAgent-26B.py
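Before running the inference script, it may help to confirm that the environment matches the versions pinned in the commands above (Python 3.9 and flash-attn 2.3.6). The following is a minimal sketch using only the standard library. The function name `check_environment` is hypothetical and not part of this repo, and the check assumes the package is registered under the distribution name `flash-attn`.

```python
import sys
from importlib import metadata


def check_environment(expected_python=(3, 9), expected_flash_attn="2.3.6"):
    """Return a list of human-readable problems; an empty list means no mismatch.

    Hypothetical helper: the defaults mirror the versions pinned in the
    install commands, not anything the inference script itself enforces.
    """
    problems = []

    # Compare only major.minor, e.g. (3, 9).
    current = tuple(sys.version_info[:2])
    if current != tuple(expected_python):
        problems.append(
            "Python %d.%d found, expected %d.%d" % (current + tuple(expected_python))
        )

    # Look up the installed distribution without importing it, so the check
    # does not require a GPU or a working CUDA setup.
    try:
        installed = metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        problems.append("flash-attn is not installed")
    else:
        if installed != expected_flash_attn:
            problems.append(
                "flash-attn %s found, expected %s" % (installed, expected_flash_attn)
            )

    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        for issue in issues:
            print("WARNING:", issue)
    else:
        print("Environment matches the pinned versions.")
```

Running this inside the activated `spiritsight-agent` environment prints one warning per mismatch. A different flash-attn version is not necessarily fatal, so treat the output as a hint, not a hard requirement.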
If you find this repo useful for your research, please cite our paper:
@misc{huang2025spiritsightagentadvancedgui,
title={SpiritSight Agent: Advanced GUI Agent with One Look},
author={Zhiyuan Huang and Ziming Cheng and Junting Pan and Zhaohui Hou and Mingjie Zhan},
year={2025},
eprint={2503.03196},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.03196},
}
We thank the following amazing projects that truly inspired us: