Generating 3D scenes remains challenging due to the lack of readily available scene data. Most existing methods produce only partial scenes and offer limited navigational freedom. We introduce a practical and scalable solution that uses 360° video as an intermediate scene representation, capturing full-scene context and keeping visual content consistent throughout generation. We propose WorldPrompter, a generative pipeline that synthesizes traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator that produces a 128-frame video simulating a person walking through and capturing a virtual environment. A fast feedforward 3D reconstructor then converts the resulting video into Gaussian splats, enabling a truly walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model, trained on a mix of image and video data, achieves convincing spatial and temporal consistency for static scenes. This is validated by an average COLMAP matching rate of 94.6%, which allows high-quality panoramic Gaussian splat reconstruction and improved navigation throughout the scene. Qualitative and quantitative results also show that WorldPrompter outperforms state-of-the-art 360° video generators and 3D scene generation models.
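To make the 360° intermediate representation concrete, the following minimal sketch maps a pixel of a panoramic frame to a viewing direction and back. It assumes the frames are stored in the common equirectangular layout and uses an axis convention (+z forward at the image center, +x right, +y up) chosen only for illustration; neither detail is stated in the text above, and the function names are hypothetical.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular frame to a unit direction.

    Assumed convention (not taken from the paper): +z points at the image
    center, +x to the right, +y up. Pixel centers sit at integer coordinates.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi)
    lat = (0.5 - (v + 0.5) / height) * math.pi        # latitude in (-pi/2, pi/2)
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def ray_to_equirect_pixel(direction, width, height):
    """Inverse mapping: project a (not necessarily unit) direction to (u, v)."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)
```

Because every direction on the sphere corresponds to some pixel, a single frame covers the full surroundings of the camera, which is what lets a panoramic walk-through video carry full-scene context.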
(top) We train a text-to-video model on a mix of 360° videos and images depicting in-the-wild environments. Because it is difficult to keep the person and camera equipment out of the captured video, we mask these elements out of the frame using a pretrained image segmentation model [Adobe 2025]. We obtain the text prompts from the video frames using LLaVA [Liu et al. 2023b] for the videos, and using BLIP-2 [Li et al. 2023a] for the images. (bottom) At inference time, a user supplies a text prompt to our text-to-video model, which produces a “walk-through” video of the scene; we then reconstruct this video into a 3D Gaussian splat representation using Long-LRM [Ziwen et al. 2024].
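The inference path in the caption (text prompt, then walk-through video, then Gaussian splats) can be sketched as a two-stage composition. This is only a structural illustration: `ScenePipeline`, `video_model`, and `reconstructor` are hypothetical names, the two stages are passed in as plain callables rather than real model APIs, and the 128-frame default is taken from the abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ScenePipeline:
    """Hypothetical wrapper chaining a text-to-video stage and a 3D stage."""

    # Stand-in for the panoramic text-to-video generator: (prompt, n) -> frames.
    video_model: Callable[[str, int], Sequence[Any]]
    # Stand-in for the feedforward reconstructor: frames -> splat representation.
    reconstructor: Callable[[Sequence[Any]], Any]
    num_frames: int = 128  # video length stated in the abstract

    def generate(self, prompt: str) -> Any:
        if not prompt.strip():
            raise ValueError("prompt must be non-empty")
        frames = self.video_model(prompt, self.num_frames)
        if len(frames) != self.num_frames:
            raise RuntimeError(
                f"expected {self.num_frames} frames, got {len(frames)}"
            )
        # The reconstruction stage consumes the whole walk-through video at once.
        return self.reconstructor(frames)
```

Keeping the stages as injected callables means either one could be swapped or stubbed out for testing without touching the other.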
[Figure: results for two text prompts, with panels labeled LucidDreamer, DreamScene360, LayerPano3D, and Ours. Prompts: “A modern apartment with an industrial-chic design, concrete walls, and steel-framed windows.” and “A candlelit medieval cathedral crypt with carved statues.”]