marigold

Maintainer: adirik

Total Score

14

Last updated 5/19/2024

  • Model Link: View on Replicate
  • API Spec: View on Replicate
  • Github Link: View on Github
  • Paper Link: View on Arxiv


Model overview

marigold is a diffusion model developed by adirik for monocular depth estimation. It uses a unique fine-tuning protocol to perform high-quality depth prediction from a single image. Compared to similar models, such as those listed under Related Models below, marigold focuses specifically on the task of monocular depth estimation.

Model inputs and outputs

marigold takes an RGB or grayscale image as input and produces two depth map outputs: one grayscale and one spectral. The depth maps represent the estimated distance of each pixel from the camera, which can be useful for a variety of computer vision and 3D applications.

Inputs

  • image: RGB or grayscale input image; use an RGB image for best results.
  • resize_input: whether to resize the input image to a maximum resolution of 768 x 768 pixels; defaults to True.
  • num_infer: number of inference passes to perform. If greater than 1, multiple depth predictions are ensembled; a higher number yields better results but runs slower.
  • denoise_steps: number of denoising steps per inference; more steps result in higher accuracy but slower inference.
  • regularizer_strength: ensembling parameter; weight of the optimization regularizer.
  • reduction_method: ensembling parameter; method used to merge the aligned depth maps. Choose between ["mean", "median"].
  • max_iter: ensembling parameter; maximum number of optimization iterations.
  • seed: (optional) seed for reproducibility; a random seed is used if left as None.

Outputs

  • Two depth map images, one grayscale and one spectral, representing the estimated distance of each pixel from the camera.
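
If you want to try these inputs programmatically, here is a minimal sketch using the Replicate Python client. The model reference string, the example filename, and the exact shape of the returned output are assumptions; copy the exact version identifier and check the output format on the model's API page.

```python
# Minimal usage sketch (assumed model reference and output shape).
# Requires the `replicate` package and a REPLICATE_API_TOKEN environment variable.
import replicate

output = replicate.run(
    "adirik/marigold:VERSION_HASH",  # placeholder: use the version shown on the API page
    input={
        "image": open("room.jpg", "rb"),  # RGB input recommended
        "resize_input": True,             # cap resolution at 768 x 768
        "num_infer": 5,                   # ensemble five predictions
        "denoise_steps": 10,
        "reduction_method": "median",
        "seed": 42,                       # fix the seed for reproducibility
    },
)

# Expected output: two depth-map images (grayscale and spectral), typically returned as URLs.
for item in output:
    print(item)
```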

Capabilities

marigold is capable of producing high-quality depth maps from a single input image. This can be useful for a variety of computer vision tasks such as 3D reconstruction, object detection and segmentation, and augmented reality applications.

What can I use it for?

The depth maps generated by marigold can be used in a wide range of applications, such as:

  • 3D reconstruction: Combine multiple depth maps to create 3D models of scenes or objects.
  • Object detection and segmentation: Use the depth information to better identify and localize objects in an image.
  • Augmented reality: Integrate the depth maps into AR applications to create more realistic and immersive experiences.
  • Robotics and autonomous vehicles: Use the depth information for tasks like obstacle avoidance, navigation, and scene understanding.

Things to try

One interesting thing to try with marigold is to experiment with the inference and ensembling parameters, such as num_infer, denoise_steps, regularizer_strength, and reduction_method. By adjusting these settings, you can find the right balance between inference speed and depth map quality for your specific use case; a small sketch of such a sweep follows.
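
The sketch below runs a few num_infer / denoise_steps combinations with a fixed seed so that only the settings change between runs. As before, the model reference is a placeholder and timings include network and queue overhead, so treat them as rough comparisons.

```python
# Hypothetical speed/quality sweep over the inference and ensembling settings.
import time

import replicate

MODEL = "adirik/marigold:VERSION_HASH"  # placeholder version

for num_infer, denoise_steps in [(1, 4), (3, 10), (10, 20)]:
    start = time.time()
    output = replicate.run(
        MODEL,
        input={
            "image": open("room.jpg", "rb"),
            "num_infer": num_infer,
            "denoise_steps": denoise_steps,
            "reduction_method": "median",
            "seed": 42,  # fixed seed so only the settings vary
        },
    )
    elapsed = time.time() - start
    print(f"num_infer={num_infer}, denoise_steps={denoise_steps}: {elapsed:.1f}s")
```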

Another idea is to combine the depth maps generated by marigold with other computer vision models, such as those for object detection or semantic segmentation. This can provide a richer understanding of the 3D structure of a scene and enable more advanced applications.
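
For example, a crude way to use the depth output downstream is to split the scene into near and far regions and hand only the near region to a detector or segmentation model. The sketch below assumes `output` holds the URLs from an earlier replicate.run() call and that darker pixels in the grayscale map are closer; check which convention the model actually uses before relying on the mask.

```python
# Toy depth-based foreground/background split (conventions assumed, see above).
import io

import numpy as np
import requests
from PIL import Image

depth_url = output[0]  # grayscale depth map URL from a previous run (assumed)
response = requests.get(depth_url, timeout=60)
depth = np.asarray(Image.open(io.BytesIO(response.content)).convert("L"), dtype=np.float32)

# Treat the darkest 30% of pixels as the near field (assumption: darker = closer).
near_mask = depth < np.percentile(depth, 30)
print(f"{near_mask.mean():.0%} of pixels classified as near-field")
```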



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


codet

adirik

Total Score

1

The codet model is an object detection AI model developed on Replicate and maintained by adirik. It is designed to detect objects in images with high accuracy. It is related to other models such as Marigold, which focuses on monocular depth estimation, and StyleMC, MaSaCtrl-Anything-v4-0, and MaSaCtrl-Stable-Diffusion-v1-4, which focus on text-guided image generation and editing.

Model inputs and outputs

The codet model takes an input image and a confidence threshold, and outputs an array of image URIs. The input image is used for object detection, and the confidence threshold filters the detected objects based on their confidence scores.

Inputs

  • Image: The input image to be processed for object detection.
  • Confidence: The confidence threshold used to filter the detected objects.
  • Show Visualisation: An optional flag to display the detection results on the input image.

Outputs

  • Array of Image URIs: An array of image URIs, where each URI represents a detected object in the input image.

Capabilities

The codet model is capable of detecting objects in images with high accuracy. It uses a novel approach called "Co-Occurrence Guided Region-Word Alignment" to improve performance on open-vocabulary object detection tasks.

What can I use it for?

The codet model can be useful in a variety of applications, such as:

  • Image analysis and understanding: Analyze and understand the contents of images, which can be valuable in fields like e-commerce, security, and robotics.
  • Visual search and retrieval: Build visual search engines or image retrieval systems where users can search for specific objects within a large collection of images.
  • Augmented reality and computer vision: Integrate the model into AR/VR applications or computer vision systems to provide real-time object detection and identification.

Things to try

Some ideas for things to try with the codet model:

  • Experiment with different confidence thresholds to see how they affect the accuracy and number of detected objects.
  • Use the model to analyze a variety of images and see how it performs on different types of objects.
  • Integrate the model into a larger system, such as an image-processing pipeline or a computer vision application.
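
A quick way to explore the confidence threshold is to run the same image at several thresholds and compare the number of detections. This is a sketch only: the model reference is a placeholder and the lowercase input keys are inferred from the inputs listed above, so verify them against the model's API schema.

```python
# Hypothetical confidence-threshold sweep for codet.
import replicate

MODEL = "adirik/codet:VERSION_HASH"  # placeholder version

for confidence in (0.3, 0.5, 0.7):
    detections = replicate.run(
        MODEL,
        input={
            "image": open("street.jpg", "rb"),
            "confidence": confidence,
            "show_visualisation": True,  # also return an annotated image, if supported
        },
    )
    print(f"confidence={confidence}: {len(detections)} output images")
```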


marigold-v1-0

prs-eth

Total Score

86

marigold-v1-0 is a diffusion model developed by the research team at prs-eth that has been fine-tuned for monocular depth estimation. It leverages the rich visual knowledge stored in modern generative image models, such as Stable Diffusion, to offer state-of-the-art results on this task. A similar model, marigold, maintained by adirik, also focuses on monocular depth estimation.

Model inputs and outputs

The marigold-v1-0 model takes a single image as input and outputs a depth map prediction for that image. This allows it to estimate the depth information of a scene from a single monocular image, without requiring additional sensor data.

Inputs

  • Image: A single image, which the model uses to estimate the depth map.

Outputs

  • Depth Map: A predicted depth map for the input image, which can be used to understand the 3D structure of the scene.

Capabilities

The marigold-v1-0 model excels at monocular depth estimation, leveraging its fine-tuning on synthetic data to achieve state-of-the-art results on unseen real-world data. By repurposing a diffusion-based image generation model, the researchers were able to tap into the rich visual knowledge encoded in these powerful models to improve depth prediction performance.

What can I use it for?

The marigold-v1-0 model could be useful for a variety of applications that require understanding the 3D structure of a scene from a single image, such as:

  • Robotics and autonomous systems: Accurate depth estimation can enable robots and self-driving cars to better perceive and navigate their environments.
  • Augmented reality and virtual reality: Depth information can be used to create more realistic and immersive experiences by properly occluding and placing virtual objects.
  • 3D reconstruction: The depth maps generated by the model can be used as input for 3D reconstruction pipelines to create 3D models of scenes.
  • Scene understanding: Depth information can provide valuable cues for tasks like object detection, segmentation, and scene parsing.

Things to try

One interesting aspect of the marigold-v1-0 model is its ability to leverage the knowledge captured in diffusion-based image generation models. You could experiment with using the model to perform other vision tasks beyond depth estimation, such as:

  • Image-to-image translation: Explore how the model's latent representation can be used to transform images in novel ways, like converting a daytime scene to nighttime.
  • Image inpainting: Use the model's depth-aware understanding of scenes to fill in missing or occluded regions of an image in a more realistic way.
  • Multimodal applications: Investigate how the model's depth estimation capabilities can be combined with language models to enable new multimodal applications, such as scene-aware image captioning.

Comparing the performance and capabilities of marigold-v1-0 and the adirik marigold model could also provide insight into the different approaches and trade-offs in this area of research.



midas

cjwbw

Total Score

80

midas is a robust monocular depth estimation model developed by researchers at the Intelligent Systems Lab (ISL) at ETH Zurich. It was trained on up to 12 diverse datasets, including ReDWeb, DIML, Movies, MegaDepth, and KITTI, using a multi-objective optimization approach. The model produces high-quality depth maps from a single input image, with several variants offering different trade-offs between accuracy, speed, and model size. This versatility makes midas a practical solution for a wide range of depth estimation applications. Compared to similar depth estimation models like depth-anything, marigold, and t2i-adapter-sdxl-depth-midas, midas stands out for its robust performance across diverse datasets and its efficient model variants suitable for embedded devices and real-time applications.

Model inputs and outputs

midas takes a single input image and outputs a depth map of the same size, where each pixel value represents the estimated depth at that location. The input image can be of varying resolutions, with the model automatically resizing it to the appropriate size for the selected variant.

Inputs

  • Image: The input image for which the depth map should be estimated.

Outputs

  • Depth map: The estimated depth map of the input image, where each pixel value represents the depth at that location.

Capabilities

midas is capable of producing high-quality depth maps from a single input image, even in challenging scenes with varying lighting, textures, and objects. The model's robustness is achieved through training on a diverse set of datasets, which allows it to generalize well to unseen environments. The available model variants offer different trade-offs between accuracy, speed, and model size, making midas suitable for a wide range of applications, from high-quality depth estimation on powerful GPUs to real-time depth sensing on embedded devices.

What can I use it for?

midas can be used in a variety of applications that require robust monocular depth estimation, such as:

  • Augmented Reality (AR): Accurate depth information can be used to enable realistic occlusion, lighting, and interaction effects in AR applications.
  • Robotics and Autonomous Vehicles: Depth maps can provide valuable input for tasks like obstacle avoidance, navigation, and scene understanding.
  • Computational Photography: Depth information can be used to enable advanced features like portrait mode, depth-of-field editing, and 3D photography.
  • 3D Reconstruction: Depth maps can be used as a starting point for 3D scene reconstruction from single images.

The maintainer, cjwbw, has also developed other models like real-esrgan and supir, showcasing their expertise in computer vision and image processing.

Things to try

One interesting aspect of midas is its ability to handle a wide range of input resolutions, from 224x224 to 512x512, with different model variants optimized for different use cases. You can experiment with different input resolutions and model variants to find the best trade-off between accuracy and inference speed for your specific application. Additionally, you can explore the model's performance on various datasets and scenarios, such as challenging outdoor environments, low-light conditions, or scenes with complex geometry. This can help you understand the model's strengths and limitations and inform your use cases.
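
One way to probe the resolution trade-off mentioned above is to resize the same photo to a few sizes before sending it to the model and time each call. The model reference is a placeholder, any variant-selection input the Replicate deployment may expose is not shown here, and timings include network and queue overhead, so treat them as rough comparisons.

```python
# Sketch: timing midas on the same image at several input resolutions (assumptions flagged).
import io
import time

import replicate
from PIL import Image

MODEL = "cjwbw/midas:VERSION_HASH"  # placeholder version

photo = Image.open("scene.jpg")
for size in (256, 384, 512):
    # Resize in memory and pass the PNG bytes as the image input.
    buf = io.BytesIO()
    photo.resize((size, size)).save(buf, format="PNG")
    buf.seek(0)

    start = time.time()
    depth = replicate.run(MODEL, input={"image": buf})
    print(f"{size}x{size}: {time.time() - start:.1f}s")
```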



t2i-adapter-sdxl-depth-midas

alaradirik

Total Score

118

The t2i-adapter-sdxl-depth-midas is a Cog model that allows you to modify images using depth maps. It is an implementation of the T2I-Adapter-SDXL model, developed by TencentARC and the Diffusers team. This model is part of a family of similar models created by alaradirik that adapt images based on different visual cues, such as line art, canny edges, and human pose.

Model inputs and outputs

The t2i-adapter-sdxl-depth-midas model takes an input image and a prompt, and generates a new image based on the provided depth map. The model also allows you to customize the output using various parameters, such as the number of samples, guidance scale, and random seed.

Inputs

  • Image: The input image to be modified.
  • Prompt: The text prompt describing the desired image.
  • Scheduler: The scheduler to use for the diffusion process.
  • Num Samples: The number of output images to generate.
  • Random Seed: The random seed for reproducibility.
  • Guidance Scale: The guidance scale to match the prompt.
  • Negative Prompt: The prompt specifying things not to see in the output.
  • Num Inference Steps: The number of diffusion steps.
  • Adapter Conditioning Scale: The conditioning scale for the adapter.
  • Adapter Conditioning Factor: The factor to scale the image by.

Outputs

  • Output Images: The generated images based on the input image and prompt.

Capabilities

The t2i-adapter-sdxl-depth-midas model can be used to modify images based on depth maps. This can be useful for tasks such as adding 3D effects, enhancing depth perception, or creating more realistic-looking images. The model can also be used in conjunction with other similar models, such as t2i-adapter-sdxl-lineart, t2i-adapter-sdxl-canny, and t2i-adapter-sdxl-openpose, to create more complex and nuanced image modifications.

What can I use it for?

The t2i-adapter-sdxl-depth-midas model can be used in a variety of applications, such as visual effects, game development, and product design. For example, you could use the model to create depth-based 3D effects for a game, or to enhance the depth perception of product images for e-commerce. The model could also be used to create more realistic-looking renders for architectural visualizations or interior design projects.

Things to try

One interesting thing to try with the t2i-adapter-sdxl-depth-midas model is to combine it with other similar models to create more complex and nuanced image modifications. For example, you could use the depth map from this model to enhance the 3D effects of an image, and then use the line art or canny edge features from the other models to add additional visual details. This can lead to some interesting and unexpected results.
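
A minimal depth-conditioned generation might look like the sketch below. The model reference is a placeholder and the snake_case input keys are inferred from the inputs listed above; confirm the exact names and defaults on the model's API page before running.

```python
# Hedged sketch of a depth-conditioned generation with t2i-adapter-sdxl-depth-midas.
import replicate

images = replicate.run(
    "alaradirik/t2i-adapter-sdxl-depth-midas:VERSION_HASH",  # placeholder version
    input={
        "image": open("living_room.jpg", "rb"),  # source image; its depth map guides the layout
        "prompt": "a cozy cabin interior, warm lighting",
        "negative_prompt": "blurry, low quality",
        "num_samples": 1,
        "num_inference_steps": 30,
        "guidance_scale": 7.5,
        "adapter_conditioning_scale": 0.9,
    },
)
print(images)  # expected: a list of output image URLs
```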
