---
license: cc-by-nc-4.0
---

Pretrained Weights of [NaVid](https://pku-epic.github.io/NaVid/): Video-based VLM Plans the Next Step for Vision-and-Language Navigation (RSS 2024)

The model is trained on samples collected from the training splits of [VLN-CE](https://github.com/jacobkrantz/VLN-CE) R2R and RxR.

| Evaluation Benchmark | TL | NE | OS | SR | SPL |
|----------------------|:----:|:----:|:----:|:----:|:----:|
| VLN-CE R2R Val. | 10.7 | 5.65 | 49.2 | 41.9 | 36.5 |
| [VLN-CE R2R Test](https://eval.ai/web/challenges/challenge-page/719/leaderboard/1966) | 11.3 | 5.39 | 52 | 45 | 39 |
| VLN-CE RxR Val. | 15.4 | 5.72 | 55.6 | 45.7 | 38.2 |

TL: trajectory length (m); NE: navigation error (m); OS: oracle success rate (%); SR: success rate (%); SPL: success weighted by path length (%).

The related inference code can be found [here](https://github.com/jzhzhang/NaVid-VLN-CE).
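
A minimal sketch for fetching the checkpoint before running the inference code linked above, assuming the weights are pulled with `huggingface_hub`'s `snapshot_download`; the `repo_id` value below is a placeholder for this repository's Hub id, and the local directory name is an arbitrary choice:

```python
from huggingface_hub import snapshot_download

# Download all checkpoint files from this model repository.
local_path = snapshot_download(
    repo_id="<this-repo-id>",   # placeholder: replace with this model repo's Hub id
    local_dir="navid_weights",  # arbitrary local directory for the checkpoint
)
print(f"NaVid weights downloaded to: {local_path}")
```

The downloaded directory can then be passed to the NaVid-VLN-CE inference scripts as the model checkpoint path.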