Monocular vehicle pose and shape reconstruction with dual-branch scene-aware and adaptive feature fusion
Document Type
Article
Publication Date
3-15-2026
Abstract
Three-dimensional (3D) scene understanding plays a critical role in autonomous driving, enabling robust perception, decision-making, and trajectory planning. However, accurate estimation of vehicle pose and fine-grained 3D shape reconstruction from monocular images remains challenging, particularly in complex urban environments characterized by occlusion, clutter, and limited depth cues. To address these challenges, we propose DSA-MAF, a novel multi-task framework for monocular vehicle pose and shape reconstruction. The proposed method incorporates a multi-feature adaptive fusion module that dynamically re-weights hierarchical features based on scene complexity and object scale, enhancing feature discriminability and robustness under diverse conditions. A dual-branch context attention module jointly models global scene semantics and object-level geometry, enabling depth-aware contextual reasoning that significantly improves translation and rotation estimation in occluded or ambiguous scenes. Furthermore, a geometry-prior-guided module integrates object features with mesh priors to enforce structural plausibility and enhance the fidelity of 3D shape reconstruction. Extensive experiments on the 3D car instance understanding dataset (ApolloCar3D) demonstrate that DSA-MAF outperforms existing state-of-the-art methods, achieving improvements of 3.15% and 2.62% in the absolute 3D pose (A3DP-Abs) and relative 3D pose (A3DP-Rel) metrics, respectively. Ablation experiments further confirm the contribution and necessity of each component within the framework.
Publication Title
Digital Signal Processing: A Review Journal
Recommended Citation
Ye, P., Han, Q., Li, Z., Cao, L., Weng, T., Tian, Y., Cao, P., Han, C., & Cao, J. (2026). Monocular vehicle pose and shape reconstruction with dual-branch scene-aware and adaptive feature fusion. Digital Signal Processing: A Review Journal, 172. http://doi.org/10.1016/j.dsp.2025.105866
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/2296