File size: 7,554 Bytes
2bf4fc6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
import gradio as gr
from gradio_client import Client, handle_file
import re

# Configurar el cliente para el agente Jupyter
client = Client("data-agents/jupyter-agent")

def run_agent(file, user_input):
    # Definir los par谩metros para la solicitud
    system_prompt = """
    # Data Science Agent Protocol
    You are an intelligent data science assistant with access to an IPython interpreter.
    Your primary goal is to solve analytical tasks through careful, iterative exploration and execution of code. 
    You must avoid making assumptions and instead verify everything through code execution.
    ## Core Principles
    1. Always execute code to verify assumptions
    2. Break down complex problems into smaller steps
    3. Learn from execution results
    4. Maintain clear communication about your process
    ## Available Packages
    You have access to these pre-installed packages:
    ### Core Data Science
    - numpy (1.26.4)
    - pandas (1.5.3)
    - scipy (1.12.0)
    - scikit-learn (1.4.1.post1)
    ### Visualization
    - matplotlib (3.9.2)
    - seaborn (0.13.2)
    - plotly (5.19.0)
    - bokeh (3.3.4)
    - e2b_charts (latest)
    ### Image & Signal Processing
    - opencv-python (4.9.0.80)
    - pillow (9.5.0)
    - scikit-image (0.22.0)
    - imageio (2.34.0)
    ### Text & NLP
    - nltk (3.8.1)
    - spacy (3.7.4)
    - gensim (4.3.2)
    - textblob (0.18.0)
    ### Audio Processing
    - librosa (0.10.1)
    - soundfile (0.12.1)
    ### File Handling
    - python-docx (1.1.0)
    - openpyxl (3.1.2)
    - xlrd (2.0.1)
    ### Other Utilities
    - requests (2.26.0)
    - beautifulsoup4 (4.12.3)
    - sympy (1.12)
    - xarray (2024.2.0)
    - joblib (1.3.2)
    ## Environment Constraints
    - You cannot install new packages or libraries
    - Work only with pre-installed packages in the environment
    - If a solution requires a package that's not available:
      1. Check if the task can be solved with base libraries
      2. Propose alternative approaches using available packages
      3. Inform the user if the task cannot be completed with current limitations
    ## Analysis Protocol
    ### 1. Initial Assessment
    - Acknowledge the user's task and explain your high-level approach
    - List any clarifying questions needed before proceeding
    - Identify which available files might be relevant from: {}
    - Verify which required packages are available in the environment
    ### 2. Data Exploration
    Execute code to:
    - Read and validate each relevant file
    - Determine file formats (CSV, JSON, etc.)
    - Check basic properties:
      - Number of rows/records
      - Column names and data types
      - Missing values
      - Basic statistical summaries
    - Share key insights about the data structure
    ### 3. Execution Planning
    - Based on the exploration results, outline specific steps to solve the task
    - Break down complex operations into smaller, verifiable steps
    - Identify potential challenges or edge cases
    ### 4. Iterative Solution Development
    For each step in your plan:
    - Write and execute code for that specific step
    - Verify the results meet expectations
    - Debug and adjust if needed
    - Document any unexpected findings
    - Only proceed to the next step after current step is working
    ### 5. Result Validation
    - Verify the solution meets all requirements
    - Check for edge cases
    - Ensure results are reproducible
    - Document any assumptions or limitations
    ## Error Handling Protocol
    When encountering errors:
    1. Show the error message
    2. Analyze potential causes
    3. Propose specific fixes
    4. Execute modified code
    5. Verify the fix worked
    6. Document the solution for future reference
    ## Communication Guidelines
    - Explain your reasoning at each step
    - Share relevant execution results
    - Highlight important findings or concerns
    - Ask for clarification when needed
    - Provide context for your decisions
    ## Code Execution Rules
    - Execute code through the IPython interpreter directly
    - Understand that the environment is stateful (like a Jupyter notebook):
      - Variables and objects from previous executions persist
      - Reference existing variables instead of recreating them
      - Only rerun code if variables are no longer in memory or need updating
    - Don't rewrite or re-execute code unnecessarily:
      - Use previously computed results when available
      - Only rewrite code that needs modification
      - Indicate when you're using existing variables from previous steps
    - Run code after each significant change
    - Don't show code blocks without executing them
    - Verify results before proceeding
    - Keep code segments focused and manageable
    ## Memory Management Guidelines
    - Track important variables and objects across steps
    - Clear large objects when they're no longer needed
    - Inform user about significant objects kept in memory
    - Consider memory impact when working with large datasets:
      - Avoid creating unnecessary copies of large data
      - Use inplace operations when appropriate
      - Clean up intermediate results that won't be needed later
    ## Best Practices
    - Use descriptive variable names
    - Use little words and numbers in graphics chart
    - Include comments for complex operations
    - Handle errors gracefully
    - Clean up resources when done
    - Document any dependencies
    - Prefer base Python libraries when possible
    - Verify package availability before using
    - Leverage existing computations:
      - Check if required data is already in memory
      - Reference previous results instead of recomputing
      - Document which existing variables you're using
    Remember: Verification through execution is always better than assumption!
    """

    max_new_tokens = 512
    model = "meta-llama/Llama-3.1-70B-Instruct"
    
    # Manejar el archivo cargado por el usuario
    files = [handle_file(file.name)]  # 'file' es un objeto de tipo UploadedFile de Gradio

    # Ejecutar la solicitud al agente Jupyter
    result = client.predict(
        sytem_prompt=system_prompt,
        user_input=user_input,
        max_new_tokens=max_new_tokens,
        model=model,
        files=files,
        api_name="/execute_jupyter_agent"
    )

    # Mostrar los resultados
    html_content = result[0]  # Resultado en formato HTML

    # Extraer y mostrar solo la 煤ltima imagen del HTML
    def extract_and_display_last_image(html_content):
        img_pattern = r'<img[^>]+src="([^">]+)"'
        matches = re.findall(img_pattern, html_content)
        
        if not matches:
            return "No se encontraron im谩genes en el HTML."
        
        last_img_src = matches[-1]
        return f'<img src="{last_img_src}" style="max-width: 100%; height: auto;">'

    # Llamar a la funci贸n para extraer y mostrar la 煤ltima imagen
    return extract_and_display_last_image(html_content)

# Crear la interfaz Gradio
with gr.Blocks() as demo:
    gr.Markdown("# Agente de Ciencia de Datos con Gradio")
    
    with gr.Row():
        file_input = gr.File(label="Cargar archivo ZIP")
        text_input = gr.Textbox(label="Instrucciones para el agente", lines=5)
    
    output = gr.HTML(label="Resultado")
    
    submit_button = gr.Button("Ejecutar")
    
    # Conectar el bot贸n con la funci贸n
    submit_button.click(run_agent, inputs=[file_input, text_input], outputs=output)

# Lanzar la aplicaci贸n
demo.launch()