|
--- |
|
pipeline_tag: image-feature-extraction |
|
license: mit |
|
tags: |
|
- embodied-ai |
|
- representation-learning |
|
- spatial-awareness

- spatial-intelligence
|
--- |
|
|
|
# Model Card for SPA: 3D Spatial-Awareness Enables Effective Embodied Representation |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
Pre-trained checkpoints of [SPA](https://haoyizhu.github.io/spa/). |
|
|
|
SPA is a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. |
|
It leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with |
|
intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date, |
|
covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios. |
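
As a quick illustration of using the released checkpoints for image feature extraction, the sketch below loads a SPA-pretrained ViT backbone and runs a dummy image batch through it. The loader name `spa_vit_base_patch16`, its `pretrained` flag, and the input resolution are assumptions, not the confirmed API; please refer to the [GitHub repository](https://github.com/HaoyiZhu/SPA) for the authoritative entry points.

```python
# Minimal usage sketch (not the official API): extract features with a SPA-pretrained ViT.
# The loader name `spa_vit_base_patch16` and the `pretrained` flag are assumptions;
# see https://github.com/HaoyiZhu/SPA for the actual entry points and checkpoint names.
import torch

from spa.models import spa_vit_base_patch16  # assumed loader exposed by the SPA repo

model = spa_vit_base_patch16(pretrained=True)  # load a released SPA checkpoint
model.eval()

# Dummy batch of RGB images; resolution and normalization should follow the repo's docs.
images = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    features = model(images)  # ViT features usable by downstream embodied policies

print(features.shape)
```

In the evaluations described above, the pre-trained encoder serves as a visual backbone whose features are consumed by downstream policy heads; the exact feature format (patch tokens vs. class token) depends on the repository's interface.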
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
|
|
- **Developed by:** [Haoyi Zhu](https://www.haoyizhu.site/) |
|
- **Model type:** Embodied AI Representation Learning |
|
- **Encoder (Backbone) type:** Vision Transformer (ViT) |
|
|
|
### Model Sources
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [https://github.com/HaoyiZhu/SPA](https://github.com/HaoyiZhu/SPA) |
|
- **Paper:** [Hugging Face paper page](https://huggingface.co./papers/2410.08208) |
|
- **Project Page:** [https://haoyizhu.github.io/spa/](https://haoyizhu.github.io/spa/) |
|
|
|
## Citation |
|
```bibtex
|
@article{zhu2024spa, |
|
title = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation}, |
|
author = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
|
journal = {arXiv preprint arXiv:2410.08208},
|
year = {2024}, |
|
} |
|
``` |
|
|