DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images

A single metric depth model for both narrow-FoV perspective images and 360° panoramas.

Pengfei Wang, Shihao Wang, Liyi Chen, Zhiyuan Ma, Guowen Zhang, Lei Zhang

The Hong Kong Polytechnic University

Corresponding author

Abstract

While monocular depth estimation has progressed significantly, generalized metric depth estimation across both narrow field-of-view (FoV) perspective images and 360° panoramas remains an unsolved challenge. Existing methods are often tailored to specific camera types and struggle to produce accurate metric depth that generalizes across diverse settings. This limitation stems from two key challenges: the inherent geometric discrepancy between perspective and panoramic cameras, and the scarcity of panoramic training data with metric annotations.

In this work, we reformulate the problem and introduce DepthMaster, a unified metric depth estimation framework for both perspective and panoramic images. Instead of directly learning geometric distortions, we decompose any panoramic image into a set of overlapped perspective patches. This strategy resolves geometric differences by unifying all inputs into a canonical perspective representation and, at the same time, mitigates data scarcity by leveraging metric depth priors from vast perspective-image datasets. Furthermore, we introduce a novel correspondence consistency loss to ensure metric and geometric consistency in overlapped regions.
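The decomposition step can be illustrated with a minimal sketch, assuming an equirectangular panorama and pinhole patches. The function names, the 90° FoV, the 60° yaw step, the three pitch levels, and nearest-neighbour sampling are all hypothetical choices for illustration; the page does not specify DepthMaster's actual patch layout or resampling.

```python
import numpy as np

def perspective_rays(fov_deg, size, yaw_deg, pitch_deg):
    """Unit ray directions, shape (size, size, 3), for one pinhole view.

    Frame convention (an assumption): x right, y down, z forward at yaw = pitch = 0.
    """
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)   # focal length in pixels
    u = np.arange(size) + 0.5 - size / 2                # pixel centres about the principal point
    x, y = np.meshgrid(u, u)
    d = np.stack([x, y, np.full_like(x, f)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(pitch), -np.sin(pitch)],
                      [0, np.sin(pitch), np.cos(pitch)]])   # positive pitch looks up
    rot_y = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                      [0, 1, 0],
                      [-np.sin(yaw), 0, np.cos(yaw)]])      # positive yaw turns right
    return d @ (rot_y @ rot_x).T

def extract_patch(pano, rays):
    """Nearest-neighbour lookup of an equirectangular image along the given rays."""
    h, w = pano.shape[:2]
    lon = np.arctan2(rays[..., 0], rays[..., 2])            # 0 = panorama centre column
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))       # positive = below the horizon
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w   # wrap around the seam
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[row, col]

def decompose(pano, fov_deg=90, size=256, yaws=range(0, 360, 60), pitches=(-45, 0, 45)):
    """Hypothetical view layout: 6 yaws x 3 pitches = 18 overlapped patches."""
    return {(yaw, pitch): extract_patch(pano, perspective_rays(fov_deg, size, yaw, pitch))
            for pitch in pitches for yaw in yaws}
```

With a 90° FoV and a 60° yaw step, neighbouring views in this layout share roughly 30° of longitude at the horizon, which is the kind of overlap a consistency term can act on.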

Trained on a mixed dataset containing only one panorama dataset, DepthMaster achieves state-of-the-art zero-shot performance on 11 diverse datasets spanning both perspective and panoramic domains, demonstrating remarkable generalization. It outperforms not only universal methods but also leading specialist models. The code and models will be publicly available.

DepthMaster pipeline
Overview of the DepthMaster pipeline: panorama → overlapped perspective patches → unified metric depth.
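To make the idea of consistency in overlapped regions concrete, the sketch below measures how much two patch predictions disagree where they cover the same panorama directions. It is not the paper's correspondence consistency loss, whose formulation is not given on this page; `view_rays`, `splat`, and `overlap_discrepancy` are hypothetical helpers. The sketch assumes each patch provides radial distance along its rays; a z-depth prediction would first need dividing by the z-component of the unit ray in the patch's own camera frame.

```python
import numpy as np

def view_rays(fov_deg, size, yaw_deg):
    """Unit rays (size, size, 3) of a pinhole view rotated by yaw about the vertical axis."""
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)
    u = np.arange(size) + 0.5 - size / 2
    x, y = np.meshgrid(u, u)
    d = np.stack([x, y, np.full_like(x, f)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    a = np.radians(yaw_deg)
    rot_y = np.array([[np.cos(a), 0, np.sin(a)],
                      [0, 1, 0],
                      [-np.sin(a), 0, np.cos(a)]])
    return d @ rot_y.T

def splat(radial, rays, h, w):
    """Write one patch's radial distances onto an (h, w) equirectangular grid."""
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    grid = np.full((h, w), np.nan)   # NaN marks directions this patch does not cover
    grid[row, col] = radial          # last write wins when patch pixels share a cell
    return grid

def overlap_discrepancy(grid_a, grid_b):
    """Mean absolute difference over cells covered by both patches (NaN if no overlap)."""
    both = np.isfinite(grid_a) & np.isfinite(grid_b)
    return np.abs(grid_a[both] - grid_b[both]).mean()

# Toy check: two overlapping views of a sphere, one predicted 10% too far.
rays_a, rays_b = view_rays(90, 64, 0), view_rays(90, 64, 60)
grid_a = splat(np.full((64, 64), 2.0), rays_a, 64, 128)
grid_b = splat(np.full((64, 64), 2.2), rays_b, 64, 128)
print(overlap_discrepancy(grid_a, grid_b))  # about 0.2
```

Comparing radial distance rather than z-depth matters here because z-depth is measured along each patch's own optical axis, so the same 3D point has different z-depths in two differently oriented patches.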

Panorama Comparison

Side-by-side comparisons against leading methods on diverse 360° scenes (both indoor and outdoor). DepthMaster recovers more accurate geometry with fewer artifacts thanks to its perspective-patch decomposition.

Note: For outdoor scenes, since competing methods do not predict sky masks, their reconstructions contain a large number of outlier points (especially in sky regions). For a fairer visual comparison, we apply our predicted sky mask and additional depth-range thresholds to filter outliers from competing methods' point clouds.
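The filtering described in this note can be sketched as follows. This is an illustrative version under stated assumptions: `filter_points` is a hypothetical helper, the sky mask is taken to be a boolean image with True marking sky, and the `near` and `far` thresholds are placeholder values, not the ones used for the visualizations.

```python
import numpy as np

def filter_points(points, sky_mask, near=0.1, far=80.0):
    """Drop sky pixels and out-of-range points from a per-pixel point map.

    points:   (H, W, 3) array of 3D points, one per pixel, in metres
    sky_mask: (H, W) boolean array, True where the pixel is sky
    near/far: placeholder distance thresholds in metres (assumed values)
    Returns an (N, 3) array of the points that survive.
    """
    dist = np.linalg.norm(points, axis=-1)
    keep = (~sky_mask) & np.isfinite(dist) & (dist >= near) & (dist <= far)
    return points[keep]

# Toy example: of four pixels, one is too far, one is invalid, one is sky.
pts = np.array([[[0.0, 0.0, 5.0], [0.0, 0.0, 500.0]],
                [[np.nan, 0.0, 1.0], [0.0, 0.0, 3.0]]])
sky = np.array([[False, False],
                [False, True]])
print(filter_points(pts, sky).shape)  # (1, 3)
```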

Perspective Comparison

Side-by-side comparisons against state-of-the-art monocular depth estimation methods on perspective images. DepthMaster produces sharper geometric details and more accurate metric depth.

Citation

If you find DepthMaster useful in your research, please consider citing:

@article{wang2026depthmaster,
  title         = {DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images},
  author        = {Wang, Pengfei and Wang, Shihao and Chen, Liyi and Ma, Zhiyuan and Zhang, Guowen and Zhang, Lei},
  journal       = {arXiv preprint arXiv:2606.12368},
  year          = {2026},
  eprint        = {2606.12368},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}