|
--- |
|
pipeline_tag: image-feature-extraction |
|
license: mit |
|
tags: |
|
- embodied-ai |
|
- representation-learning |
|
- spatial-awareness

- spatial-intelligence
|
--- |
|
|
|
# Model Card for SPA: 3D Spatial-Awareness Enables Effective Embodied Representation |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
Pre-trained checkpoints of [SPA](https://haoyizhu.github.io/spa/). |
|
|
|
SPA is a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. |
|
It leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with |
|
intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date, |
|
covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios. |
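
As a quick illustration of using the released checkpoints for image feature extraction, the sketch below loads a SPA-pretrained ViT backbone and runs a dummy image batch through it. The loader name `spa_vit_base_patch16`, its `pretrained` flag, and the input resolution are assumptions, not the confirmed API; please refer to the [GitHub repository](https://github.com/HaoyiZhu/SPA) for the authoritative entry points.

```python
# Minimal usage sketch (not the official API): extract features with a SPA-pretrained ViT.
# The loader name `spa_vit_base_patch16` and the `pretrained` flag are assumptions;
# see https://github.com/HaoyiZhu/SPA for the actual entry points and checkpoint names.
import torch

from spa.models import spa_vit_base_patch16  # assumed loader exposed by the SPA repo

model = spa_vit_base_patch16(pretrained=True)  # load a released SPA checkpoint
model.eval()

# Dummy batch of RGB images; resolution and normalization should follow the repo's docs.
images = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    features = model(images)  # ViT features usable by downstream embodied policies

print(features.shape)
```

In the evaluations described above, the pre-trained encoder serves as a visual backbone whose features are consumed by downstream policy heads; the exact feature format (patch tokens vs. class token) depends on the repository's interface.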
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
|
|
- **Developed by:** [Haoyi Zhu](https://www.haoyizhu.site/) |
|
- **Model type:** Embodied AI Representation Learning |
|
- **Encoder (Backbone) type:** Vision Transformer (ViT) |
|
|
|
### Model Sources
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [https://github.com/HaoyiZhu/SPA](https://github.com/HaoyiZhu/SPA) |
|
- **Paper:** [Hugging Face paper page](https://huggingface.co./papers/2410.08208) |
|
- **Project Page:** [https://haoyizhu.github.io/spa/](https://haoyizhu.github.io/spa/) |
|
|
|
## Citation |
|
```bibtex
|
@article{zhu2024spa, |
|
title = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation}, |
|
author = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
|
journal = {arXiv preprint arXiv:2410.08208},
|
year = {2024}, |
|
} |
|
``` |
|
|