[](#marigold-repurposing-diffusion-based-image-generators-for-monocular-depth-estimation)Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
==============================================================================================================================================================================

This model represents the official checkpoint of the paper titled "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation".

[![Website](/prs-eth/marigold-v1-0/resolve/main/doc/badges/badge-website.svg)](https://marigoldmonodepth.github.io) [![GitHub](https://img.shields.io/github/stars/prs-eth/Marigold?style=default&label=GitHub%20%E2%98%85&logo=github)](https://github.com/prs-eth/Marigold) [![Paper](/prs-eth/marigold-v1-0/resolve/main/doc/badges/badge-pdf.svg)](https://arxiv.org/abs/2312.02145) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12G8reD13DdpMie5ZQlaFNo2WCGeNUH-u?usp=sharing) [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/toshas/marigold) [![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)

[Bingxin Ke](http://www.kebingxin.com/), [Anton Obukhov](https://www.obukhov.ai/), [Shengyu Huang](https://shengyuh.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Rodrigo Caye Daudt](https://rcdaudt.github.io/), [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en)

We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is to leverage the rich visual knowledge stored in modern generative image models. Our model, derived from Stable Diffusion and fine-tuned with synthetic data, can zero-shot transfer to unseen data, offering state-of-the-art monocular depth estimation results.

[![teaser](/prs-eth/marigold-v1-0/resolve/main/doc/teaser_collage_transparant.png)](/prs-eth/marigold-v1-0/blob/main/doc/teaser_collage_transparant.png)

[](#-citation) Citation
---------------------------

    @InProceedings{ke2023repurposing,
          title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
          author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
          booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
          year={2024}
    }
    

[](#-license) License
-------------------------

This work is licensed under the Apache License, Version 2.0 (as defined in the [LICENSE](/prs-eth/marigold-v1-0/blob/main/LICENSE.txt)).

By downloading and using the code and model you agree to the terms in the [LICENSE](/prs-eth/marigold-v1-0/blob/main/LICENSE.txt).

[![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)

## Model overview

`marigold-v1-0` is a diffusion model developed by the research team at prs-eth that has been fine-tuned for monocular depth estimation. It leverages the rich visual knowledge stored in modern generative image models, such as [Stable Diffusion](https://aimodels.fyi/models/huggingFace/stable-diffusion-v-1-4-original-compvis), to offer state-of-the-art results in this task. The team has also released a similar model, [marigold](https://aimodels.fyi/models/huggingFace/marigold-adirik), developed by adirik, which focuses on monocular depth estimation as well.

## Model inputs and outputs

The `marigold-v1-0` model takes a single image as input and outputs a depth map prediction for that image. This allows it to estimate the depth information of a scene from a single monocular image, without requiring additional sensor data.

### Inputs
- **Image**: A single image, which the model will use to estimate the depth map.

### Outputs
- **Depth Map**: A predicted depth map for the input image, which can be used to understand the 3D structure of the scene.

## Capabilities

The `marigold-v1-0` model excels at monocular depth estimation, leveraging its fine-tuning on synthetic data to achieve state-of-the-art results on unseen real-world data. By repurposing a diffusion-based image generation model, the researchers were able to tap into the rich visual knowledge encoded in these powerful models to improve depth prediction performance.

## What can I use it for?

The `marigold-v1-0` model could be useful for a variety of applications that require understanding the 3D structure of a scene from a single image, such as:

- **Robotics and autonomous systems**: Accurate depth estimation can enable robots and self-driving cars to better perceive and navigate their environments.
- **Augmented reality and virtual reality**: Depth information can be used to create more realistic and immersive experiences by properly occluding and placing virtual objects.
- **3D reconstruction**: The depth maps generated by the model can be used as input for 3D reconstruction pipelines to create 3D models of scenes.
- **Scene understanding**: Depth information can provide valuable cues for tasks like object detection, segmentation, and scene parsing.

## Things to try

One interesting aspect of the `marigold-v1-0` model is its ability to leverage the knowledge captured in diffusion-based image generation models. You could experiment with using the model to perform other vision tasks beyond depth estimation, such as:

- **Image-to-image translation**: Explore how the model's latent representation can be used to transform images in novel ways, like converting a daytime scene to nighttime.
- **Image inpainting**: Use the model's depth-aware understanding of scenes to fill in missing or occluded regions of an image in a more realistic way.
- **Multimodal applications**: Investigate how the model's depth estimation capabilities can be combined with language models to enable new multimodal applications, such as scene-aware image captioning.

The research team has also released the [marigold](https://aimodels.fyi/models/huggingFace/marigold-adirik) model, developed by adirik, which focuses on monocular depth estimation. Comparing the performance and capabilities of these two models could provide insight into the different approaches and tradeoffs in this area of research.