Space: szymanowiczs/flash3d

lyndonzheng committed on Jun 7, 2024

Commit: 0b13fd6

Parent: a19de17

update ui

Files changed (2)

app.py +25 -21
demo_examples/re10k_04.jpg +0 -0

app.py CHANGED

@@ -75,6 +75,10 @@ def main():
         gr.Markdown(
             """
             # Flash3D
             """
             )
         with gr.Row(variant="panel"):
@@ -96,7 +100,6 @@ def main():
                             './demo_examples/bedroom_01.png',
                             './demo_examples/kitti_02.png',
                             './demo_examples/kitti_03.png',
-                            './demo_examples/re10k_04.jpg',
                             './demo_examples/re10k_05.jpg',
                             './demo_examples/re10k_06.jpg',
                         ],
@@ -118,26 +121,27 @@ def main():
                             interactive=False
                         )
-        # gr.Markdown(
-        # """
-        #     ## Comments:
-        #     1. If you run the demo online, the first example you upload should take about 4.5 seconds (with preprocessing, saving and overhead), the following take about 1.5s.
-        #     2. The 3D viewer shows a .ply mesh extracted from a mix of 3D Gaussians. This is only an approximations and artefacts might show.
-        #     3. Known limitations include:
-        #     - a black dot appearing on the model from some viewpoints
-        #     - see-through parts of objects, especially on the back: this is due to the model performing less well on more complicated shapes
-        #     - back of objects are blurry: this is a model limiation due to it being deterministic
-        #     4. Our model is of comparable quality to state-of-the-art methods, and is **much** cheaper to train and run.
-        #     ## How does it work?
-        #     Splatter Image formulates 3D reconstruction as an image-to-image translation task. It maps the input image to another image,
-        #     in which every pixel represents one 3D Gaussian and the channels of the output represent parameters of these Gaussians, including their shapes, colours and locations.
-        #     The resulting image thus represents a set of Gaussians (almost like a point cloud) which reconstruct the shape and colour of the object.
-        #     The method is very cheap: the reconstruction amounts to a single forward pass of a neural network with only 2D operators (2D convolutions and attention).
-        #     The rendering is also very fast, due to using Gaussian Splatting.
-        #     Combined, this results in very cheap training and high-quality results.
-        #     For more results see the [project page](https://szymanowiczs.github.io/splatter-image) and the [CVPR article](https://arxiv.org/abs/2312.13150).
-        #     """
-        # )
         submit.click(fn=check_input_image, inputs=[input_image]).success(
             fn=preprocess,

         gr.Markdown(
             """
             # Flash3D
+            **Flash3D** ([project page](https://www.robots.ox.ac.uk/~vgg/research/flash3d/)) is a fast, highly efficient method for 3D scene reconstruction from a single image, trainable on a single GPU in a day.
+            The model used in the demo was trained only on the **RealEstate10k dataset, on a single A6000 GPU, within 1 day**.
+            Upload an image of a scene or click on one of the provided examples to see how Flash3D performs.
+            The 3D viewer will render a .ply scene exported from the 3D Gaussians, which is only an approximation.
             """
             )
         with gr.Row(variant="panel"):
                             './demo_examples/bedroom_01.png',
                             './demo_examples/kitti_02.png',
                             './demo_examples/kitti_03.png',
                             './demo_examples/re10k_05.jpg',
                             './demo_examples/re10k_06.jpg',
                         ],
                             interactive=False
                         )
+        gr.Markdown(
+        """
+            ## Comments:
+            1. If you run the demo online, the first example you upload should take about 25 seconds (with preprocessing, saving and overhead); subsequent examples take about 14 seconds.
+            2. The 3D viewer shows a .ply mesh extracted from a mix of 3D Gaussians. This is only an approximation, and artefacts might show.
+            3. Known limitations include:
+            - a black dot appearing on the model from some viewpoints
+            - while the multiple layers of Gaussians fill in reasonable content for the invisible parts, the visual quality of those parts is still blurry
+            4. Flash3D achieves state-of-the-art results when trained and tested on RealEstate10k, and is **much** cheaper to train and run.
+            5. When transferred to unseen datasets like NYU, it outperforms competitors by a large margin.
+            6. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset.
+            ## How does it work?
+            Given a single image I as input, Flash3D first estimates the metric depth D using a frozen off-the-shelf network.
+            Then, a ResNet50-like encoder–decoder network predicts a set of shape and appearance parameters P of K layers of Gaussians for every pixel u,
+            allowing unobserved and occluded surfaces to be modelled.
+            From these predicted components, the depth of each layer is obtained by adding the predicted (positive) offsets δ_i to the predicted monocular depth D,
+            allowing the mean vector for every layer of Gaussians to be computed.
+            This strategy ensures that the layers are depth-ordered, encouraging the network to model occluded surfaces.
+            For more results see the [project page](https://www.robots.ox.ac.uk/~vgg/research/flash3d/).
+            """
+        )
         submit.click(fn=check_input_image, inputs=[input_image]).success(
             fn=preprocess,
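
The depth-ordering step described in the new "How does it work?" text can be sketched as follows. This is a minimal illustration for a single pixel, not code from the Flash3D repository: the function names (`layer_depths`, `unproject`, `gaussian_means`), the choice to accumulate the offsets layer by layer, and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made for the example.

```python
from itertools import accumulate


def layer_depths(mono_depth, offsets):
    """Return one depth per Gaussian layer for a single pixel.

    mono_depth: metric depth D from the frozen monocular depth network.
    offsets: positive per-layer offsets delta_i (accumulating them is an
             assumption of this sketch, not confirmed by the diff).
    """
    if any(o <= 0 for o in offsets):
        raise ValueError("offsets must be strictly positive")
    # Running sums D + d1, D + d1 + d2, ... are strictly increasing,
    # so the layers come out ordered from near to far.
    return list(accumulate(offsets, initial=mono_depth))[1:]


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth (assumed pinhole camera)."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def gaussian_means(u, v, mono_depth, offsets, fx, fy, cx, cy):
    """Hypothetical helper: one 3D mean per layer along the pixel's ray."""
    return [
        unproject(u, v, d, fx, fy, cx, cy)
        for d in layer_depths(mono_depth, offsets)
    ]


if __name__ == "__main__":
    # Two layers behind a pixel at the principal point, D = 2.0 m.
    print(gaussian_means(320, 240, 2.0, [0.5, 1.0],
                         fx=500, fy=500, cx=320, cy=240))
```

Because every offset is positive, each layer's mean lies further along the same camera ray than the previous one, which is the ordering property the description relies on to model occluded surfaces.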

demo_examples/re10k_04.jpg DELETED

Binary file (15.1 kB)