arxiv:2304.00947

RePAST: Relative Pose Attention Scene Representation Transformer

Published on Apr 3, 2023

Abstract

The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.
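Why pairwise relative poses remove the dependence on a reference frame can be illustrated with a small sketch. It assumes camera-to-world poses stored as 4x4 rigid transforms T_i. The relative pose between views i and j is T_i^-1 T_j, and moving every camera by the same global transform G leaves it unchanged, because (G T_i)^-1 (G T_j) = T_i^-1 G^-1 G T_j = T_i^-1 T_j. By contrast, an SRT-style encoding that expresses every pose in the frame of the first input view changes when the input order changes. The function `pose_bias` (a negative distance between camera centres added to the attention logits) is a hypothetical stand-in for however RePAST actually conditions attention on relative pose. All function names here are illustrative and are not taken from the paper.

```python
import math

Matrix = list[list[float]]  # 4x4 homogeneous rigid transform, row-major


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Invert a rigid transform [R | p]; the inverse is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def rigid(yaw: float, tx: float, ty: float, tz: float) -> Matrix:
    """Rotation about the z axis followed by a translation (test helper)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def reference_frame_poses(poses: list[Matrix]) -> list[Matrix]:
    """SRT-style encoding: every pose expressed in the frame of the first input view."""
    ref_inv = rigid_inverse(poses[0])
    return [matmul(ref_inv, p) for p in poses]


def relative_poses(poses: list[Matrix]) -> list[list[Matrix]]:
    """All pairwise relative poses T_i^-1 T_j; no reference view is singled out."""
    return [[matmul(rigid_inverse(pi), pj) for pj in poses] for pi in poses]


def pose_bias(rel: Matrix) -> float:
    # Hypothetical bias: penalize distance between the two camera centres.
    return -math.sqrt(sum(rel[i][3] ** 2 for i in range(3)))


def attention_weights(queries, keys, poses):
    """Softmax attention whose logits add a bias computed only from relative poses."""
    rel = relative_poses(poses)
    weights = []
    for i, q in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(q, k)) + pose_bias(rel[i][j])
            for j, k in enumerate(keys)
        ]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights
```

Because `pose_bias` reads only quantities derived from T_i^-1 T_j, the attention weights are unchanged when a global transform is applied to all cameras. Any function of the relative pose would inherit the same property, which is the sense in which such a model is invariant "by definition".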
