Stereo matching is a pivotal component of 3D vision, aiming to find corresponding points between pairs of stereo images to recover depth information. In this work, we introduce StereoAnything, a highly practical solution for robust stereo matching. Rather than focusing on a specialized model, our goal is to develop a versatile foundation model capable of handling stereo images across diverse environments. To this end, we scale up the training data by collecting labeled stereo images and generating synthetic stereo pairs from unlabeled monocular images. To further enrich the model's ability to generalize across different conditions, we introduce a novel synthetic dataset that complements existing data by adding variability in baselines, camera angles, and scene types. We extensively evaluate the zero-shot capabilities of our model on five public datasets, demonstrating its strong ability to generalize to new, unseen data.
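As a rough illustration of how a single unlabeled image can yield a training pair: a common recipe (an assumption here, not necessarily the exact pipeline used in the paper) is to predict a dense disparity map with an off-the-shelf monocular depth model and then forward-warp the image to a virtual right view, keeping the predicted disparity as pseudo ground truth. A minimal NumPy sketch of the warping step:

```python
import numpy as np

def synthesize_right_view(left: np.ndarray, disparity: np.ndarray):
    """Forward-warp a left image to a virtual right view using per-pixel disparity.

    left:      (H, W, 3) uint8 image
    disparity: (H, W) float array; pixels shift to the left in the right view
    Returns the warped right image and an occlusion mask (True = hole).
    """
    h, w, _ = left.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    xs = np.arange(w)
    for y in range(h):
        # Target x-coordinates in the right view; round to the nearest pixel.
        xt = np.round(xs - disparity[y]).astype(int)
        valid = (xt >= 0) & (xt < w)
        # Write far pixels first so nearer pixels (larger disparity) overwrite them.
        order = np.argsort(disparity[y][valid])
        src = xs[valid][order]
        dst = xt[valid][order]
        right[y, dst] = left[y, src]
        filled[y, dst] = True
    return right, ~filled
```

The holes in the returned occlusion mask correspond to pixels visible only in the left view; during training these regions are typically masked out or inpainted.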
StereoAnything is trained on a combination of 12 labeled datasets (1.3M+ images) and 5 unlabeled datasets (30M+ images), and tested on 5 labeled datasets.
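One simple way to train on such a mix is to concatenate all sources and draw samples with per-source weights. The sketch below is an illustrative PyTorch setup, not the authors' exact sampling scheme; the dataset objects and the 2:1 labeled/pseudo-labeled weighting are assumptions.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def build_mixed_loader(labeled_sets, pseudo_sets, labeled_weight=2.0, batch_size=8):
    """Combine labeled and pseudo-labeled stereo datasets into one weighted loader.

    labeled_sets / pseudo_sets: lists of torch Datasets returning
    (left, right, disparity) tuples; pseudo sets carry generated labels.
    """
    mixed = ConcatDataset(labeled_sets + pseudo_sets)
    # Give every sample from a labeled source a higher sampling weight than
    # samples from pseudo-labeled sources.
    weights = []
    for ds in labeled_sets:
        weights += [labeled_weight] * len(ds)
    for ds in pseudo_sets:
        weights += [1.0] * len(ds)
    sampler = WeightedRandomSampler(torch.tensor(weights, dtype=torch.double),
                                    num_samples=len(mixed), replacement=True)
    return DataLoader(mixed, batch_size=batch_size, sampler=sampler, num_workers=4)
```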
To expand the diversity and quantity of existing stereo matching datasets, we use the CARLA simulator to collect new synthetic stereo data. Compared to previous stereo datasets, ours covers more varied settings, providing multiple baselines and novel camera configurations that enrich the available stereo data.
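For reference, the CARLA Python API makes it straightforward to vary the stereo geometry: spawning two RGB cameras at a chosen horizontal offset gives a rig with an arbitrary baseline. The sketch below is a minimal illustration; the mounting positions, resolution, and baselines used for the actual dataset are not specified here, and the values shown are assumptions.

```python
import carla

def spawn_stereo_rig(world, vehicle, baseline=0.54, width=1280, height=720, fov=90.0):
    """Attach a left/right RGB camera pair to `vehicle` with the given baseline (meters)."""
    bp = world.get_blueprint_library().find('sensor.camera.rgb')
    bp.set_attribute('image_size_x', str(width))
    bp.set_attribute('image_size_y', str(height))
    bp.set_attribute('fov', str(fov))

    # Cameras mounted above the hood; the right camera is offset along +y by the baseline.
    left_tf = carla.Transform(carla.Location(x=1.5, y=-baseline / 2, z=1.6))
    right_tf = carla.Transform(carla.Location(x=1.5, y=+baseline / 2, z=1.6))

    left_cam = world.spawn_actor(bp, left_tf, attach_to=vehicle)
    right_cam = world.spawn_actor(bp, right_tf, attach_to=vehicle)
    return left_cam, right_cam

# Usage (assumes a running CARLA server and an already-spawned ego vehicle):
# client = carla.Client('localhost', 2000)
# world = client.get_world()
# left, right = spawn_stereo_rig(world, ego_vehicle, baseline=0.3)
# left.listen(lambda img: img.save_to_disk('out/left/%06d.png' % img.frame))
# right.listen(lambda img: img.save_to_disk('out/right/%06d.png' % img.frame))
```

Ground-truth disparity can then be derived from a co-located depth sensor via d = f·B / z, where f is the focal length in pixels, B the baseline, and z the depth.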
We compare StereoAnything with state-of-the-art (SOTA) methods on KITTI12, KITTI15, Middlebury, ETH3D, and DrivingStereo.
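These benchmarks are typically scored with end-point error (EPE) and the percentage of "bad" pixels whose disparity error exceeds a threshold (e.g., D1/bad-3 on KITTI, bad-2 on Middlebury, bad-1 on ETH3D). A minimal NumPy sketch of both metrics, assuming a validity mask marking pixels with ground truth:

```python
import numpy as np

def disparity_metrics(pred, gt, valid_mask, thresh=3.0, rel=0.05):
    """EPE and outlier percentage ('bad-thresh') over valid pixels.

    pred, gt:   (H, W) float disparity maps
    valid_mask: (H, W) bool, True where ground truth exists
    With rel=0.05 and thresh=3.0 this matches the KITTI D1 definition
    (error > 3 px AND > 5% of the true disparity); set rel=0.0 for plain bad-thresh.
    """
    err = np.abs(pred - gt)[valid_mask]
    gt_v = gt[valid_mask]
    epe = err.mean()
    bad = ((err > thresh) & (err > rel * gt_v)).mean() * 100.0  # percentage
    return epe, bad
```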
We conduct an ablation study that underscores the substantial impact of our proposed training strategy: applying it to various stereo backbones yields significant performance improvements across all evaluated datasets.
The table below details the training-set mix setups used for cross-domain evaluation.
The results across these benchmarks for each mix configuration are shown below.
@article{guo2024stereo,
title={Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data},
author={Guo, Xianda and Zhang, Chenming and Zhang, Youmin and Nie, Dujun and Wang, Ruilin and Zheng, Wenzhao and Poggi, Matteo and Chen, Long},
journal={arXiv preprint arXiv:2411.14053},
year={2024}
}