arxiv:2502.11537

M^{3}: A Modular World Model over Streams of Tokens

Published on Feb 17, 2025

Abstract

Token-based world models have emerged as a promising modular framework, modeling dynamics over token streams while optimizing tokenization separately. While successful in visual environments with discrete actions (e.g., Atari games), their broader applicability remains uncertain. In this paper, we introduce M^{3}, a modular world model that extends this framework, enabling flexible combinations of observation and action modalities through independent modality-specific components. M^{3} integrates several improvements from existing literature to enhance agent performance. Through extensive empirical evaluation across diverse benchmarks, M^{3} achieves state-of-the-art sample efficiency for planning-free world models. Notably, among these methods, it is the first to reach a human-level median score on Atari 100K, with superhuman performance on 13 games. We open-source our code and weights at https://github.com/leor-c/M3.
