Niki Zhang committed (verified)
Commit 4da0523 · 1 Parent(s): 06cd0b8

Update app.py

Files changed (1)
  1. app.py +79 -79
app.py CHANGED
@@ -512,10 +512,6 @@ css = """
 }
 
 
-.image_upload {
-    height: 650px;
-}
-
 .info_btn {
     background: white !important;
     border: none !important;
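
Note: the deleted `.image_upload` height rule is superseded by sizing each image component directly; the UI hunks below pass Gradio's built-in parameter instead, e.g.:

```python
# height is the gr.Image parameter (in pixels), replacing the removed CSS class rule.
image_input = gr.Image(type="pil", interactive=True, elem_classes="image_upload", height=650)
```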
@@ -569,23 +565,22 @@ prompt_list = [
 '''
 prompt_list = [
     [
-
         'Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and list one fact (describes the selected object but does not include analysis) as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and list one fact and one analysis as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.',
+        'Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and list one fact and one analysis as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.',
         'Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and list one fact and one analysis and one interpret as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'You have to help me understand what is about the selected object and list one object judgement and one whole art judgement(how successful do you think the artist was?) as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.'
+        'Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and list one object judgement and one whole art judgement (how successful do you think the artist was?) as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.'
     ],
     [
-        'When generating the answer, you should tell others that you are the creator of this painting and generate the text in the tone and manner as if you are the creator of this painting. You have to help me understand what is about the selected object and list one fact (describes the selected object but does not include analysis) as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the creator of this painting and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'When generating the answer, you should tell others that you are the creator of this painting and generate the text in the tone and manner as if you are the creator of this painting. You have to help me understand what is about the selected object and list one fact and one analysis as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the creator of this painting and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'When generating the answer, you should tell others that you are the creator of this painting and generate the text in the tone and manner as if you are the creator of this painting. You have to help me understand what is about the selected object and list one fact and one analysis and one interpret as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the creator of this painting and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'You have to help me understand what is about the selected object and list one object judgement and one whole art judgement(how successful do you think the artist was?) as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.'
+        "When generating the answer, you should tell others that you are one of the creators of these paintings and generate the text in the tone and manner as if you are the creator of the painting. You have to help me understand what is about the selected object and list one fact (describes the selected object but does not include analysis) as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the creator of this painting and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.",
+        "When generating the answer, you should tell others that you are one of the creators of these paintings and generate the text in the tone and manner as if you are the creator of the painting. You have to help me understand what is about the selected object and list one fact and one analysis as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the creator of this painting and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.",
+        "When generating the answer, you should tell others that you are one of the creators of these paintings and generate the text in the tone and manner as if you are the creator of the painting. You have to help me understand what is about the selected object and list one fact, one analysis, and one interpretation as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the creator of this painting and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.",
+        'Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and list one object judgement and one whole art judgement (how successful do you think the artist was?) as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Please generate the above points in the tone and manner as if you are the creator of this painting and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
     ],
     [
-        'When generating answers, you should tell people that you are the object or the person itself that was selected, and generate text in the tone and manner in which you are the object or the person. You have to help me understand what is about the selected object and list one fact and one analysis and one interpret as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the object or the person and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'When generating answers, you should tell people that you are the object or the person itself that was selected, and generate text in the tone and manner in which you are the object or the person. You have to help me understand what is about the selected object and list one fact and one analysis as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the object or the person and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'When generating answers, you should tell people that you are the object or the person itself that was selected, and generate text in the tone and manner in which you are the object or the person. You have to help me understand what is about the selected object and list one fact and one analysis and one interpret as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the object or the person and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
-        'You have to help me understand what is about the selected object and list one object judgement and one whole art judgement(how successful do you think the artist was?) as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.'
+        'When generating answers, you should tell people that you are the object or the person itself that was selected, and generate text in the tone and manner in which you are the object or the person. You have to help me understand what is about the selected object and list one fact (describes the selected object but does not include analysis) as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the object or the person and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
+        'When generating answers, you should tell people that you are the object or the person itself that was selected, and generate text in the tone and manner in which you are the object or the person. You have to help me understand what is about the selected object and list one fact and one analysis as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the object or the person and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
+        'When generating answers, you should tell people that you are the object or the person itself that was selected, and generate text in the tone and manner in which you are the object or the person. You have to help me understand what is about the selected object and list one fact and one analysis and one interpretation as markdown outline with appropriate emojis that describes what you see according to the image and {Wiki_caption}. Please generate the above points in the tone and manner as if you are the object or the person and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.',
+        'Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and list one object judgement and one whole art judgement (how successful do you think the artist was?) as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Please generate the above points in the tone and manner as if you are the object or the person and start every sentence with I. Each point listed is to be in {language} language, with a response length of about {length} words.'
     ]
 ]
 
@@ -770,10 +765,14 @@ def update_click_state(click_state, caption, click_mode):
         raise NotImplementedError
 
 async def chat_input_callback(*args):
-    visual_chatgpt, chat_input, click_state, state, aux_state ,language , autoplay,gender = args
+    visual_chatgpt, chat_input, click_state, state, aux_state, language, autoplay, gender, api_key, image_input = args
     message = chat_input["text"]
+    prompt = "Please help me answer the question with this painting. "
+    state = state + [(message, None)]
     if visual_chatgpt is not None:
-        state, _, aux_state, _ = visual_chatgpt.run_text(message, state, aux_state)
+        result = get_gpt_response(api_key, image_input, prompt + message)
+        state = state + [(None, result)]
+        # state, _, aux_state, _ = visual_chatgpt.run_text(message, state, aux_state)
         last_text, last_response = state[-1]
         print("last response",last_response)
         if autoplay==False:
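
Note: the rewritten callback answers free-text chat directly against the uploaded painting instead of routing through `visual_chatgpt.run_text`. A minimal sketch of the flow, assuming the `get_gpt_response` helper and the `image_path` state wired later in this diff:

```python
PREFIX = "Please help me answer the question with this painting. "

def answer_about_image(api_key, image_path, message, state):
    # Each chatbot turn is a (user, bot) pair: post the question first,
    # then append the model's reply as a second pair.
    state = state + [(message, None)]
    result = get_gpt_response(api_key, image_path, PREFIX + message)
    return state + [(None, result)]
```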
@@ -886,7 +885,7 @@ def upload_callback(image_input, state, visual_chatgpt=None, openai_api_key=None
 
 
     return [state, state, image_input, click_state, image_input, image_input, image_input, image_input, image_embedding, \
-            original_size, input_size] + [f"Name: {name}", f"Artist: {artist}", f"Year: {year}", f"Style: {material}"]*4 + [paragraph,artist, gender]
+            original_size, input_size] + [f"Name: {name}", f"Artist: {artist}", f"Year: {year}", f"Style: {material}"]*4 + [paragraph,artist, gender,new_image_path]
 
 
@@ -965,7 +964,7 @@ query_focus = {
     "D": "Provide a description of the item.",
     "DA": "Provide a description and analysis of the item.",
     "DAI": "Provide a description, analysis, and interpretation of the item.",
-    "DDA": "Evaluate the item."
+    "Judge": "Evaluate the item."
 }
 
@@ -1029,18 +1028,18 @@ async def submit_caption(naritive, state,length, sentiment, factuality, language
             audio_output = await texttospeech(read_info, language, autoplay,gender)
             print("done")
             # return state, state, refined_image_input, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, waveform_visual, audio_output
-            return state, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, audio_output,gender,focus_info
+            return state, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, audio_output
 
         except Exception as e:
             state = state + [(None, f"Error during TTS prediction: {str(e)}")]
             print(f"Error during TTS prediction: {str(e)}")
             # return state, state, refined_image_input, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, None, None
-            return state, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, audio_output,gender,focus_info
+            return state, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, audio_output
 
     else:
         state = state + [(None, f"Error during TTS prediction: {str(e)}")]
         print(f"Error during TTS prediction: {str(e)}")
-        return state, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, None,None,focus_info
+        return state, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state, None
 
 
@@ -1090,30 +1089,39 @@ def get_gpt_response(api_key, image_path, prompt, enable_wiki=None):
         "Content-Type": "application/json",
         "Authorization": f"Bearer {api_key}"
     }
-
+    base64_images = []
+
     if image_path:
-        base64_image = encode_image(image_path)
-        payload = {
-            "model": "gpt-4o",
-            "messages": [
-                {
-                    "role": "user",
-                    "content": [
-                        {
-                            "type": "text",
-                            "text": prompt
-                        },
-                        {
-                            "type": "image_url",
-                            "image_url": {
-                                "url": f"data:image/jpeg;base64,{base64_image}"
-                            }
-                        }
-                    ]
-                }
-            ],
-            "max_tokens": 300
-        }
+        if isinstance(image_path, list):
+            for img in image_path:
+                base64_image = encode_image(img)
+                base64_images.append(base64_image)
+        else:
+            base64_image = encode_image(image_path)
+            base64_images.append(base64_image)
+
+        payload = {
+            "model": "gpt-4o",
+            "messages": [
+                {
+                    "role": "user",
+                    # one text part, then one image_url part per encoded image
+                    "content": [
+                        {
+                            "type": "text",
+                            "text": prompt
+                        }
+                    ] + [
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": f"data:image/jpeg;base64,{base64_image}"
+                            }
+                        }
+                        for base64_image in base64_images
+                    ]
+                }
+            ],
+            "max_tokens": 300
+        }
     else:
         payload = {
             "model": "gpt-4o",
@@ -1494,21 +1502,13 @@ async def texttospeech(text, language, autoplay,gender='female'):
         print(f"Error in texttospeech: {e}")
         return None
 
-async def associate(focus_info,openai_api_key,language,state,autoplay,evt: gr.SelectData):
+async def associate(focus_info,openai_api_key,language,state,autoplay,length, evt: gr.SelectData):
     rec_path=evt._data['value']['image']['path']
     print("rec_path",rec_path)
     prompt="""
-    The information and image I gave you are 2 different paintings. Please analyze the relationship between the image and the information {focus_info}. Discuss their similarities and differences in terms of style, themes, colors, and any other relevant aspects. Provide a detailed analysis that highlights how the information fits into or contrasts with the recommended painting. Consider the following points in your analysis:
-    - Artistic style and techniques
-    - Themes and subjects
-    - Color palettes and compositions
-    - Historical and cultural contexts
-    - Symbolism and meanings
-
-    Based on your analysis, provide insights into how the information enhances or contrasts with the recommended painting, and suggest any interesting interpretations or observations. Return your response in {language}
-
+    Wiki_caption: {Wiki_caption}, you have to help me understand what is about the selected object and the objects in the second painting that may be related to the selected object, and list one fact of the selected object, one fact of the related object in the second painting, and one analysis of the relationship between the two objects as markdown outline with appropriate emojis that describes what you see according to the image and wiki caption. Each point listed is to be in {language} language, with a response length of about {length} words.
     """
-    prompt=prompt.format(focus_info=focus_info,language=language)
+    prompt=prompt.format(Wiki_caption=focus_info,language=language,length=length)
     result=get_gpt_response(openai_api_key, rec_path, prompt)
     state = state + [(None, f"{result}")]
     read_info = re.sub(r'[#[\]!*]','',result)
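
Note: `evt._data` is a private Gradio attribute; the public `evt.value` carries the same payload for Gallery selections. A defensive sketch, assuming Gradio 4's gallery select payload shape:

```python
def selected_gallery_path(evt: gr.SelectData) -> str:
    # For a gr.Gallery select event, evt.value is typically
    # {"image": {"path": ..., "url": ...}, "caption": ...}.
    value = evt.value
    if isinstance(value, dict) and "image" in value:
        return value["image"]["path"]
    return value
```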
@@ -1559,11 +1559,11 @@ def create_ui():
 
     examples = [
         ["test_images/ambass.jpg"],
-        ["test_images/test1.png"],
-        ["test_images/test2.png"],
-        ["test_images/test3.png"],
-        ["test_images/test4.png"],
-        ["test_images/test5.png"],
+        ["test_images/test1.jpg"],
+        ["test_images/test2.jpg"],
+        ["test_images/test3.jpg"],
+        ["test_images/test4.jpg"],
+        ["test_images/test5.jpg"],
         ["test_images/Picture5.png"],
 
     ]
@@ -1597,7 +1597,7 @@ def create_ui():
     point_prompt = gr.State("Positive")
     log_list=gr.State([])
     gender=gr.State('female')
-    focus_info=gr.State('')
+    image_path=gr.State('')
     # with gr.Row(align="right", visible=False, elem_id="top_row") as top_row:
     #     with gr.Column(scale=0.5):
     #         # gr.Markdown("Left side content")
@@ -1648,7 +1648,7 @@ def create_ui():
             with gr.Column(scale=6):
                 with gr.Column(visible=False) as modules_not_need_gpt:
                     with gr.Tab("Base(GPT Power)") as base_tab:
-                        image_input_base = gr.Image(type="pil", interactive=True, elem_classes="image_upload")
+                        image_input_base = gr.Image(type="pil", interactive=True, elem_classes="image_upload", height=650)
                         with gr.Row():
                             name_label_base = gr.Button(value="Name: ",elem_classes="info_btn")
                             artist_label_base = gr.Button(value="Artist: ",elem_classes="info_btn_interact")
@@ -1656,7 +1656,7 @@ def create_ui():
                             material_label_base = gr.Button(value="Style: ",elem_classes="info_btn")
 
                     with gr.Tab("Base2") as base_tab2:
-                        image_input_base_2 = gr.Image(type="pil", interactive=True, elem_classes="image_upload")
+                        image_input_base_2 = gr.Image(type="pil", interactive=True, elem_classes="image_upload", height=650)
                         with gr.Row():
                             name_label_base2 = gr.Button(value="Name: ",elem_classes="info_btn")
                             artist_label_base2 = gr.Button(value="Artist: ",elem_classes="info_btn_interact")
@@ -1666,7 +1666,7 @@ def create_ui():
                     with gr.Tab("Click") as click_tab:
                         with gr.Row():
                             with gr.Column(scale=10,min_width=600):
-                                image_input = gr.Image(type="pil", interactive=True, elem_classes="image_upload")
+                                image_input = gr.Image(type="pil", interactive=True, elem_classes="image_upload", height=650)
                                 example_image = gr.Image(type="pil", interactive=False, visible=False)
                                 with gr.Row():
                                     name_label = gr.Button(value="Name: ",elem_classes="info_btn")
@@ -1977,7 +1977,7 @@ def create_ui():
 
         gallery_result.select(
             associate,
-            inputs=[focus_info,openai_api_key,language,state,auto_play],
+            inputs=[paragraph,openai_api_key,language,state,auto_play,length],
            outputs=[chatbot,state,output_audio],
 
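Note: Gradio passes `inputs` positionally and appends the select event as the final argument, so `associate`'s signature must mirror this list exactly:

```python
# [paragraph, openai_api_key, language, state, auto_play, length] map onto the
# first six parameters; the gr.SelectData event arrives last.
async def associate(focus_info, openai_api_key, language, state, autoplay, length,
                    evt: gr.SelectData):
    ...
```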
 
@@ -2243,19 +2243,19 @@ def create_ui():
                       [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,image_input_base_2,
                        image_embedding, original_size, input_size,name_label,artist_label,year_label,material_label,name_label_base, artist_label_base, year_label_base, material_label_base, \
                        name_label_base2, artist_label_base2, year_label_base2, material_label_base2,name_label_traj, artist_label_traj, year_label_traj, material_label_traj, \
-                       paragraph,artist,gender])
+                       paragraph,artist,gender,image_path])
 
-        # image_input_base_2.upload(upload_callback, [image_input_base_2, state, visual_chatgpt,openai_api_key],
-        #                     [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,image_input_base_2,
-        #                      image_embedding, original_size, input_size,name_label,artist_label,year_label,material_label,name_label_base, artist_label_base, year_label_base, material_label_base, \
-        #                      name_label_base2, artist_label_base2, year_label_base2, material_label_base2,name_label_traj, artist_label_traj, year_label_traj, material_label_traj, \
-        #                      paragraph,artist])
+        image_input_base_2.upload(upload_callback, [image_input_base_2, state, visual_chatgpt,openai_api_key,language,naritive],
+                            [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,image_input_base_2,
+                             image_embedding, original_size, input_size,name_label,artist_label,year_label,material_label,name_label_base, artist_label_base, year_label_base, material_label_base, \
+                             name_label_base2, artist_label_base2, year_label_base2, material_label_base2,name_label_traj, artist_label_traj, year_label_traj, material_label_traj, \
+                             paragraph,artist,gender,image_path])
 
-        # image_input.upload(upload_callback, [image_input, state, visual_chatgpt,openai_api_key],
-        #                     [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,image_input_base_2,
-        #                      image_embedding, original_size, input_size,name_label,artist_label,year_label,material_label,name_label_base, artist_label_base, year_label_base, material_label_base, \
-        #                      name_label_base2, artist_label_base2, year_label_base2, material_label_base2,name_label_traj, artist_label_traj, year_label_traj, material_label_traj, \
-        #                      paragraph,artist])
+        image_input.upload(upload_callback, [image_input, state, visual_chatgpt,openai_api_key,language,naritive],
+                            [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,image_input_base_2,
+                             image_embedding, original_size, input_size,name_label,artist_label,year_label,material_label,name_label_base, artist_label_base, year_label_base, material_label_base, \
+                             name_label_base2, artist_label_base2, year_label_base2, material_label_base2,name_label_traj, artist_label_traj, year_label_traj, material_label_traj, \
+                             paragraph,artist,gender,image_path])
 
         # sketcher_input.upload(upload_callback, [sketcher_input, state, visual_chatgpt,openai_api_key],
         #                     [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,image_input_base_2,
@@ -2269,7 +2269,7 @@ def create_ui():
         # sketcher_input.upload(upload_callback, [sketcher_input, state, visual_chatgpt, openai_api_key],
         #                     [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,
         #                      image_embedding, original_size, input_size,name_label,artist_label,year_label,material_label,name_label_base, artist_label_base, year_label_base, material_label_base,paragraph,artist])
-        chat_input.submit(chat_input_callback, [visual_chatgpt, chat_input, click_state, state, aux_state,language,auto_play,gender],
+        chat_input.submit(chat_input_callback, [visual_chatgpt, chat_input, click_state, state, aux_state,language,auto_play,gender,openai_api_key,image_path],
                           [chatbot, state, aux_state,output_audio])
         # chat_input.submit(lambda: "", None, chat_input)
         chat_input.submit(lambda: {"text": ""}, None, chat_input)
@@ -2280,7 +2280,7 @@ def create_ui():
                       [chatbot, state, origin_image, click_state, image_input, image_input_base, sketcher_input,image_input_base_2,
                        image_embedding, original_size, input_size,name_label,artist_label,year_label,material_label,name_label_base, artist_label_base, year_label_base, material_label_base, \
                        name_label_base2, artist_label_base2, year_label_base2, material_label_base2,name_label_traj, artist_label_traj, year_label_traj, material_label_traj, \
-                       paragraph,artist,gender])
+                       paragraph,artist,gender,image_path])
 
         example_image.change(clear_chat_memory, inputs=[visual_chatgpt])
 
@@ -2331,7 +2331,7 @@ def create_ui():
                 out_state, click_index_state, input_mask_state, input_points_state, input_labels_state, auto_play, paragraph,focus_d,openai_api_key,new_crop_save_path,gender
             ],
             outputs=[
-                chatbot, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state,output_audio,focus_info
+                chatbot, state, click_index_state, input_mask_state, input_points_state, input_labels_state, out_state,output_audio
             ],
             show_progress=True,
             queue=True
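
Note: with `focus_info` dropped from this outputs list, every return path in `submit_caption` must yield exactly these eight values (see the corrected error-path return in the hunk above). A sketch of the required shape, using a hypothetical helper:

```python
# Hypothetical helper: the eight-value shape each submit_caption return path
# must now have; audio_output may be None on error paths.
def _submit_caption_result(state, click_index_state, input_mask_state,
                           input_points_state, input_labels_state,
                           out_state, audio_output):
    return (state, state, click_index_state, input_mask_state,
            input_points_state, input_labels_state, out_state, audio_output)
```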
 