Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters • Paper 2408.04093 • Published Aug 7