geetu040 committed
Commit a138d14 · Parent(s): 2c50f26

update readme

Files changed (1): README.md +62 -72
README.md CHANGED
@@ -1,87 +1,77 @@
 ---
 license: apple-ascl
 pipeline_tag: depth-estimation
- library_name: depth-pro
 ---

- # Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
-
- ![Depth Pro Demo Image](https://github.com/apple/ml-depth-pro/raw/main/data/depth-pro-teaser.jpg)
-
- We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details. The predictions are metric, with absolute scale, without relying on the availability of metadata such as camera intrinsics. And the model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU. These characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction, a training protocol that combines real and synthetic datasets to achieve high metric accuracy alongside fine boundary tracing, dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image.
-
- Depth Pro was introduced in **[Depth Pro: Sharp Monocular Metric Depth in Less Than a Second](https://arxiv.org/abs/2410.02073)** by *Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun*.
-
- The checkpoint in this repository is a reference implementation, which has been re-trained. Its performance is close to the model reported in the paper but does not match it exactly.
-
- ## How to Use
-
- Please follow the steps in the [code repository](https://github.com/apple/ml-depth-pro) to set up your environment. Then you can download the checkpoint from the _Files and versions_ tab above, or use the `huggingface-hub` CLI:

 ```bash
- pip install huggingface-hub
- huggingface-cli download --local-dir checkpoints apple/DepthPro
  ```
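If you prefer to stay in Python, the same download can be done with `huggingface_hub`'s `snapshot_download`. This is a minimal sketch mirroring the CLI invocation above; the repo id and target directory are taken from that example:

```python
from huggingface_hub import snapshot_download

# download the apple/DepthPro checkpoint into ./checkpoints, as in the CLI example above
snapshot_download(repo_id="apple/DepthPro", local_dir="checkpoints")
```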

- ### Running from the command line
-
- The code repo provides a helper script to run the model on a single image:
-
- ```bash
- # Run prediction on a single image:
- depth-pro-run -i ./data/example.jpg
- # Run `depth-pro-run -h` for available options.
- ```
-
- ### Running from Python
-
- ```python
 from PIL import Image
- import depth_pro
-
- # Load model and preprocessing transform.
- model, transform = depth_pro.create_model_and_transforms()
- model.eval()
-
- # Load and preprocess an image; set image_path to your input file, e.g. "./data/example.jpg".
- image, _, f_px = depth_pro.load_rgb(image_path)
- image = transform(image)
-
- # Run inference.
- prediction = model.infer(image, f_px=f_px)
- depth = prediction["depth"]  # Depth in [m].
- focallength_px = prediction["focallength_px"]  # Focal length in pixels.
  ```
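As a possible follow-up (not part of the original README), the metric depth map and the estimated focal length can be stored for later use. This sketch assumes `depth` and `focallength_px` are torch tensors as returned above:

```python
import numpy as np

# save the metric depth (in meters) and focal length (in pixels) to a compressed archive
np.savez_compressed("depth.npz",
                    depth=depth.detach().cpu().numpy(),
                    focallength_px=float(focallength_px))
```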

- ### Evaluation (boundary metrics)
-
- Boundary metrics are implemented in `eval/boundary_metrics.py` and can be used as follows:
-
- ```python
- # for a depth-based dataset
- boundary_f1 = SI_boundary_F1(predicted_depth, target_depth)
-
- # for a mask-based dataset (image matting / segmentation)
- boundary_recall = SI_boundary_Recall(predicted_depth, target_mask)
  ```
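To report a single score over a whole dataset, the per-image values can simply be averaged. This is an illustrative sketch in which `depth_pairs`, an iterable of prediction/ground-truth arrays, is hypothetical:

```python
import numpy as np

# mean scale-invariant boundary F1 over a depth-based dataset (depth_pairs is hypothetical)
scores = [SI_boundary_F1(pred, gt) for pred, gt in depth_pairs]
print("mean SI-boundary F1:", np.mean(scores))
```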

-
- ## Citation
-
- If you find our work useful, please cite the following paper:
-
- ```bibtex
- @article{Bochkovskii2024:arxiv,
-   author  = {Aleksei Bochkovskii and Ama\"{e}l Delaunoy and Hugo Germain and Marcel Santos and
-              Yichao Zhou and Stephan R. Richter and Vladlen Koltun},
-   title   = {Depth Pro: Sharp Monocular Metric Depth in Less Than a Second},
-   journal = {arXiv},
-   year    = {2024},
- }
  ```

- ## Acknowledgements
-
- Our codebase is built using multiple open-source contributions; please see [Acknowledgements](https://github.com/apple/ml-depth-pro/blob/main/ACKNOWLEDGEMENTS.md) for more details.
-
- Please check the paper for a complete list of references and datasets used in this work.

 ---
 license: apple-ascl
 pipeline_tag: depth-estimation
 ---

+ # DepthPro: Monocular Depth Estimation

+ Install the required libraries:
 ```bash
+ pip install -q numpy pillow torch torchvision
+ pip install -q git+https://github.com/geetu040/transformers.git@depth-pro-projects#egg=transformers
 ```

+ Import the required libraries:
+ ```py
+ import requests
 from PIL import Image
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from huggingface_hub import hf_hub_download
+ import matplotlib.pyplot as plt
+
+ # custom installation from this PR: https://github.com/huggingface/transformers/pull/34583
+ # !pip install git+https://github.com/geetu040/transformers.git@depth-pro-projects#egg=transformers
+ from transformers import DepthProConfig, DepthProImageProcessorFast, DepthProForDepthEstimation
  ```

+ Load the model and image processor:
+ ```py
+ checkpoint = "geetu040/DepthPro"
+ revision = "project"
+ image_processor = DepthProImageProcessorFast.from_pretrained(checkpoint, revision=revision)
+ model = DepthProForDepthEstimation.from_pretrained(checkpoint, revision=revision)
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = model.to(device)
  ```
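Optionally, if GPU memory is limited, the weights can be loaded in half precision through `from_pretrained`'s `torch_dtype` argument. This is a sketch; fp16 inference for this checkpoint is an assumption rather than something the README states:

```py
# assumption: the checkpoint also runs in float16 on a CUDA device
model_fp16 = DepthProForDepthEstimation.from_pretrained(
    checkpoint, revision=revision, torch_dtype=torch.float16
).to(device)
# remember to cast the processed pixel values to the same dtype before calling this model
```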

+ Run inference and visualize the predicted depth:
+ ```py
+ # load an example image from the Hub
+ url = "https://huggingface.co/spaces/geetu040/DepthPro_Segmentation_Human/resolve/main/assets/examples/man_with_arms_open.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+ image = image.convert("RGB")
+
+ # prepare the image for the model
+ inputs = image_processor(images=image, return_tensors="pt")
+ inputs = {k: v.to(device) for k, v in inputs.items()}
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # interpolate the prediction to the original image size
+ post_processed_output = image_processor.post_process_depth_estimation(
+     outputs, target_sizes=[(image.height, image.width)],
+ )
+
+ # convert the predicted depth to an 8-bit image
+ depth = post_processed_output[0]["predicted_depth"]
+ depth = (depth - depth.min()) / (depth.max() - depth.min())
+ depth = depth * 255.
+ depth = depth.detach().cpu().numpy()
+ depth = Image.fromarray(depth.astype("uint8"))
+
+ # visualize the input and the prediction side by side
+ fig, axes = plt.subplots(1, 2, figsize=(20, 20))
+ axes[0].imshow(image)
+ axes[0].set_title(f'Image {image.size}')
+ axes[0].axis('off')
+ axes[1].imshow(depth)
+ axes[1].set_title(f'Depth {depth.size}')
+ axes[1].axis('off')
+ plt.subplots_adjust(wspace=0, hspace=0)
+ plt.show()
  ```
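To keep the results instead of only displaying them, the 8-bit depth visualization and the side-by-side figure can be written to disk (the file names here are arbitrary):

```py
# save the depth visualization (a PIL image) and the matplotlib figure
depth.save("depth.png")
fig.savefig("image_and_depth.png", bbox_inches="tight")
```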