Abstract
While monocular depth estimation has made significant progress, generalized metric depth estimation that works for both narrow field-of-view (FoV) perspective images and 360° panoramas remains an open problem. Existing methods are often tailored to specific camera types and struggle to produce accurate metric depth that generalizes across diverse settings. This limitation stems from two factors: the inherent geometric discrepancy between perspective and panoramic cameras, and the scarcity of panoramic training data with metric annotations.
In this work, we reformulate the problem and introduce DepthMaster, a unified metric depth estimation framework for both perspective and panoramic images. Instead of directly learning geometric distortions, we decompose any panoramic image into a set of overlapping perspective patches. This strategy resolves the geometric differences by unifying all inputs into a canonical perspective representation, and at the same time mitigates data scarcity by leveraging metric depth priors from vast perspective-image datasets. Furthermore, we introduce a novel correspondence consistency loss to enforce metric and geometric consistency in the overlapping regions.
Trained on a mixed dataset containing only one panorama dataset, DepthMaster achieves state-of-the-art zero-shot performance on 11 diverse datasets spanning both perspective and panoramic domains, demonstrating remarkable generalization. It outperforms not only universal methods but also leading specialist models. The code and models will be publicly available.
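To make the patch decomposition described above more concrete, here is a minimal geometric sketch in plain Python. It is not the DepthMaster implementation. It assumes an equirectangular panorama, square pinhole patches, and a single horizontal ring of views. The function names, the `overlap` parameter, and the axis conventions are all hypothetical and chosen for illustration only. The last helper illustrates one reason consistency in overlapping regions needs care: two patches with different optical axes assign different z-depths to the same 3D point, whereas the distance along the viewing ray does not depend on the view.

```python
import math


def ring_yaws(fov_deg, overlap=0.25):
    """Yaw angles (degrees) for one ring of patches whose neighbours overlap.

    `overlap` is an assumed minimum fraction of the FoV shared by adjacent patches.
    """
    step = fov_deg * (1.0 - overlap)   # largest allowed spacing between patch centres
    n = math.ceil(360.0 / step)        # round up so the ring closes without gaps
    return [i * 360.0 / n for i in range(n)]


def _focal(patch_size, fov_deg):
    # Pinhole focal length in pixels for a square patch with the given FoV.
    return 0.5 * patch_size / math.tan(math.radians(fov_deg) / 2.0)


def patch_pixel_to_equirect(x, y, patch_size, fov_deg, yaw_deg, pitch_deg, pano_w, pano_h):
    """Map pixel (x, y) of a perspective patch to (u, v) in the panorama."""
    f = _focal(patch_size, fov_deg)
    c = (patch_size - 1) / 2.0
    # Ray in the patch camera frame: X right, Y down, Z forward (assumed convention).
    X, Y, Z = x - c, y - c, f
    # Pitch (positive looks up) about the X axis, then yaw (positive looks right) about Y.
    p, t = math.radians(pitch_deg), math.radians(yaw_deg)
    Y, Z = Y * math.cos(p) - Z * math.sin(p), Y * math.sin(p) + Z * math.cos(p)
    X, Z = X * math.cos(t) + Z * math.sin(t), -X * math.sin(t) + Z * math.cos(t)
    lon = math.atan2(X, Z)                    # in [-pi, pi], 0 at the panorama centre
    lat = math.atan2(-Y, math.hypot(X, Z))    # in [-pi/2, pi/2], positive is up
    u = (lon / (2.0 * math.pi) + 0.5) * pano_w
    v = (0.5 - lat / math.pi) * pano_h
    return u, v


def z_depth_to_range(z, x, y, patch_size, fov_deg):
    """Convert z-depth at pixel (x, y) to distance along the viewing ray."""
    f = _focal(patch_size, fov_deg)
    c = (patch_size - 1) / 2.0
    return z * math.sqrt((x - c) ** 2 + (y - c) ** 2 + f * f) / f
```

Under these assumptions, `ring_yaws(90)` places six patches 60° apart, so each pair of neighbours shares a 30° band of the panorama. Sampling the panorama at the `(u, v)` returned for every patch pixel yields the perspective crop, and comparing `z_depth_to_range` values for pixels that map to the same `(u, v)` is one simple way to check whether two patches agree in an overlap.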
Demo · Panorama
A diverse gallery of panoramic 3D reconstructions produced by DepthMaster — indoor, outdoor, real-world and AI-generated 360° inputs.
Demo · Panorama
Side-by-side comparisons against leading methods on diverse 360° scenes (both indoor and outdoor). DepthMaster recovers more accurate geometry with fewer artifacts thanks to its perspective-patch decomposition.
Note: For outdoor scenes, since competing methods do not predict sky masks, their reconstructions contain a large number of outlier points (especially in sky regions). For a fairer visual comparison, we apply our predicted sky mask and additional depth-range thresholds to filter outliers from competing methods' point clouds.
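The filtering described in the note can be sketched roughly as follows. This is an illustrative outline only, not the code used to produce the comparisons: the function name and the `min_depth` / `max_depth` defaults are hypothetical, and the actual thresholds are not stated here.

```python
import math


def filter_points(points, depths, sky_mask, min_depth=0.1, max_depth=100.0):
    """Keep points that are not sky and whose depth lies in [min_depth, max_depth]."""
    kept = []
    for point, depth, is_sky in zip(points, depths, sky_mask):
        if is_sky:
            continue  # masked out as sky by the predicted sky mask
        if not math.isfinite(depth) or not (min_depth <= depth <= max_depth):
            continue  # drop NaN/inf and out-of-range outliers
        kept.append(point)
    return kept
```

Applying the same mask and the same thresholds to every method's point cloud keeps the visual comparison consistent across methods.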
Demo · Perspective
Beyond panoramic inputs, DepthMaster also excels on standard perspective images — realistic photographs and stylized artwork alike — highlighting the strong generalization of our unified model.
Demo · Perspective
Side-by-side comparisons against state-of-the-art monocular depth estimation methods on perspective images. DepthMaster produces sharper geometric details and more accurate metric depth.
BibTeX
If you find DepthMaster useful in your research, please consider citing:
@article{wang2026depthmaster,
  title         = {DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images},
  author        = {Wang, Pengfei and Wang, Shihao and Chen, Liyi and Ma, Zhiyuan and Zhang, Guowen and Zhang, Lei},
  journal       = {arXiv preprint arXiv:2606.12368},
  year          = {2026},
  eprint        = {2606.12368},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}