JianyuanWang committed
Commit 1bfa5fd
1 Parent(s): 6eae011
app.py CHANGED
@@ -185,6 +185,8 @@ cake_images = glob.glob(f'vggsfm_code/examples/cake/images/*')
 british_museum_images = glob.glob(f'vggsfm_code/examples/british_museum/images/*')
 
 
+
+
 with gr.Blocks() as demo:
     gr.Markdown("# 🎨 VGGSfM: Visual Geometry Grounded Deep Structure From Motion")
 
@@ -197,7 +199,7 @@ with gr.Blocks() as demo:
             <li>upload the images (.jpg, .png, etc.), or </li>
             <li>upload a video (.mp4, .mov, etc.) </li>
         </ul>
-        <p>The reconstruction should take <strong> up to 1 minute </strong>. If both images and videos are uploaded, the demo will only reconstruct the uploaded images. By default, we extract one image frame per second from the input video. To prevent crashes on the Hugging Face space, we currently limit reconstruction to the first 20 image frames. </p>
+        <p>The reconstruction should take <strong> up to 1 minute </strong>. If both images and videos are uploaded, the demo will only reconstruct the uploaded images. By default, we extract <strong> 1 image frame per second from the input video </strong>. To prevent crashes on the Hugging Face space, we currently limit reconstruction to the first 20 image frames. </p>
         <p>SfM methods are designed for <strong> rigid/static reconstruction </strong>. When dealing with dynamic/moving inputs, these methods may still work by focusing on the rigid parts of the scene. However, to ensure high-quality results, it is better to minimize the presence of moving objects in the input data. </p>
         <p>If you meet any problem, feel free to create an issue in our <a href="https://github.com/facebookresearch/vggsfm" target="_blank">GitHub Repo</a> ⭐</p>
         <p>(Please note that running reconstruction on Hugging Face space is slower than on a local machine.) </p>
@@ -208,7 +210,7 @@ with gr.Blocks() as demo:
         with gr.Column(scale=1):
             input_video = gr.Video(label="Input video", interactive=True)
             input_images = gr.File(file_count="multiple", label="Input Images", interactive=True)
-            num_query_images = gr.Slider(minimum=1, maximum=10, step=1, value=5, label="Number of query images",
+            num_query_images = gr.Slider(minimum=1, maximum=10, step=1, value=5, label="Number of query images (key frames)",
                                          info="More query images usually lead to better reconstruction at lower speeds. If the viewpoint differences between your images are minimal, you can set this value to 1. ")
             num_query_points = gr.Slider(minimum=512, maximum=4096, step=1, value=1024, label="Number of query points",
                                          info="More query points usually lead to denser reconstruction at lower speeds.")
@@ -218,8 +220,10 @@ with gr.Blocks() as demo:
             log_output = gr.Textbox(label="Log")
 
     with gr.Row():
+        submit_btn = gr.Button("Reconstruct", scale=1)
+
+        # submit_btn = gr.Button("Reconstruct", scale=1, elem_attributes={"style": "background-color: blue; color: white;"})
         clear_btn = gr.ClearButton([input_video, input_images, num_query_images, num_query_points, reconstruction_output, log_output], scale=1)
-        submit_btn = gr.Button("Reconstruct", scale=3)
 
 
     examples = [
@@ -232,7 +236,7 @@ with gr.Blocks() as demo:
         inputs=[input_video, input_images, num_query_images, num_query_points],
         outputs=[reconstruction_output, log_output],  # Provide outputs
         fn=vggsfm_demo,  # Provide the function
-        cache_examples=True
+        cache_examples=True,
     )
 
     submit_btn.click(
@@ -243,7 +247,7 @@ with gr.Blocks() as demo:
     )
 
 # demo.launch(debug=True, share=True)
-demo.queue(max_size=30).launch(show_error=True)
+demo.queue(max_size=20).launch(show_error=True, share=True)
 # demo.queue(max_size=20, concurrency_count=1).launch(debug=True, share=True)
 ########################################################################################################################
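For orientation, the sketch below pulls together the UI pieces this commit touches: the slider relabeled "Number of query images (key frames)", the Reconstruct button now created before the clear button inside the second gr.Row, and the queue/launch settings changed to max_size=20 with share=True. It is a minimal, self-contained sketch rather than the actual app.py: vggsfm_demo is stubbed, and reconstruction_output is assumed to be a gr.Model3D viewer since its definition lies outside this diff.

```python
import gradio as gr

def vggsfm_demo(input_video, input_images, num_query_images, num_query_points):
    # Stub: the real function runs the VGGSfM reconstruction and returns (3D model, log text).
    return None, f"Would reconstruct with {num_query_images} query images and {num_query_points} query points."

with gr.Blocks() as demo:
    gr.Markdown("# 🎨 VGGSfM: Visual Geometry Grounded Deep Structure From Motion")

    with gr.Row():
        with gr.Column(scale=1):
            input_video = gr.Video(label="Input video", interactive=True)
            input_images = gr.File(file_count="multiple", label="Input Images", interactive=True)
            num_query_images = gr.Slider(minimum=1, maximum=10, step=1, value=5,
                                         label="Number of query images (key frames)")
            num_query_points = gr.Slider(minimum=512, maximum=4096, step=1, value=1024,
                                         label="Number of query points")
        with gr.Column(scale=3):
            reconstruction_output = gr.Model3D(label="Reconstruction")  # assumed output component
            log_output = gr.Textbox(label="Log")

    with gr.Row():
        # Created first, so it renders to the left of the clear button (the change in this commit).
        submit_btn = gr.Button("Reconstruct", scale=1)
        clear_btn = gr.ClearButton(
            [input_video, input_images, num_query_images, num_query_points,
             reconstruction_output, log_output],
            scale=1,
        )

    submit_btn.click(
        fn=vggsfm_demo,
        inputs=[input_video, input_images, num_query_images, num_query_points],
        outputs=[reconstruction_output, log_output],
    )

# Queue depth lowered from 30 to 20; share=True additionally creates a public link.
demo.queue(max_size=20).launch(show_error=True, share=True)
```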
vggsfm_code/examples/videos/bonsai_video.mp4 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:627cba512d70ff1ead2ba23e9e8492104934c42c6f2263665d39b72b24ea4d82
-size 2107907
+oid sha256:fe81a91e79e96b14bfea751f61da63e32f8f4e54879c68b726468a44f7f8818a
+size 2290807
vggsfm_code/examples/videos/british_museum_video.mp4 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7672a2df58075afe5a7415190daa11cfdcd740f9890d9f2ad7e5f35ae419ce6f
-size 419807
+oid sha256:4fbbde1a54deaadb5144a3bcecdd2c404fe950312f3b8f2b9628ba49067053df
+size 407548
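The two .mp4 entries above are Git LFS pointer files rather than the videos themselves: replacing the example videos only changes the sha256 oid and the byte size recorded in each pointer. As a minimal sketch (not part of the commit; the path is illustrative), the two fields can be reproduced from a local file like this:

```python
import hashlib
import os

def lfs_pointer(path: str) -> str:
    """Return the Git LFS pointer text for a file: spec version, sha256 oid, and byte size."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{digest.hexdigest()}\n"
        f"size {os.path.getsize(path)}\n"
    )

# Example (illustrative path from this repo):
# print(lfs_pointer("vggsfm_code/examples/videos/bonsai_video.mp4"))
```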