Given a single-view monocular video, MotionStruct4D decomposes motion into rigid and non-rigid transformations, yielding high-quality 4D generation results viewable across arbitrary viewpoints and timestamps.
Monocular video-to-4D generation faces the fundamental challenge of inferring plausible 3D geometry and motion from a single-view input. We present MotionStruct4D, a novel approach that discovers and exploits the underlying motion structure of 3D Gaussian Splatting for high-quality video-to-4D generation.
Our key insight is that real-world motion can be effectively decomposed into coarse rigid transformations that capture principal movements, complemented by detailed non-rigid deformations that account for fine-grained details. MotionStruct4D introduces:
Our pipeline first identifies nearly rigid-body parts in a self-supervised way. We then decompose motion into rigid and non-rigid transformations and optimize the representation in a weighted dense-to-sparse manner.
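To make the rigid/non-rigid split concrete, here is a minimal NumPy sketch of how a set of Gaussian centers might be moved by a per-part rigid transform plus a small per-point residual. It is an illustration under stated assumptions, not the MotionStruct4D implementation: the function names (`rotation_z`, `deform_centers`), the array layout, and the hard part assignment are all hypothetical, and part discovery and the weighted dense-to-sparse optimization are not modeled.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def deform_centers(centers, part_ids, rotations, translations, residuals):
    """Hypothetical helper: coarse rigid motion per part plus fine offsets.

    centers:      (N, 3) canonical Gaussian centers
    part_ids:     (N,)   index of the quasi-rigid part of each Gaussian
    rotations:    (P, 3, 3) one rotation matrix per part
    translations: (P, 3)    one translation per part
    residuals:    (N, 3) per-point non-rigid offsets
    """
    R = rotations[part_ids]      # gather each point's part rotation, (N, 3, 3)
    t = translations[part_ids]   # gather each point's part translation, (N, 3)
    rigid = np.einsum("nij,nj->ni", R, centers) + t  # coarse rigid motion
    return rigid + residuals                          # add fine non-rigid detail

# Toy example: two Gaussians, each assigned to its own part.
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
part_ids = np.array([0, 1])
rotations = np.stack([rotation_z(np.pi / 2), np.eye(3)])  # part 0 rotates, part 1 does not
translations = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # part 1 shifts along z
residuals = np.zeros((2, 3))  # no non-rigid detail in this toy case
moved = deform_centers(centers, part_ids, rotations, translations, residuals)
```

In this sketch the rigid term carries the principal movement of each part, so the residual only needs to explain what the rigid motion cannot; keeping the residual small is one plausible reason for separating the two terms.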
MotionStruct4D disentangles global rigid movements from local non-rigid deformations, ensuring both high rendering fidelity and multi-view-consistent motion. Below, we show rendered novel views and their corresponding quasi-rigid part segmentations from four viewing angles.
[Video results for four examples. Each example shows Render and Parts (quasi-rigid part segmentation) panels at 15°, -75°, 105°, and 195°.]
Existing video-to-4D datasets often suffer from limited object displacement and insufficient motion diversity. To address these limitations, we curated a challenging benchmark.
If you find our work useful, please consider citing:
@article{zhong2025motionstruct4d,
title={MotionStruct4D: Discovering Motion Structure of Gaussian Splatting for Video-to-4D Generation},
author={Zhong, Jia-Xing and Lu, Kai and Ye, Jiaojiao and Trigoni, Niki and Markham, Andrew},
journal={Under review}
}