Spaces: galileo-ai/agent-leaderboard

Pratik Bhavsar committed 13 days ago

Commit e2809a3

1 Parent(s): 91b3c5d

updated text

Files changed (1)

data_loader.py

@@ -354,7 +354,7 @@ DESCRIPTION_HTML = """
             color: var(--text-secondary);
         ">
             <div style="display: flex; gap: 8px; align-items: center;">
-                ✅ Accuracy Performance
             </div>
             <div style="display: flex; gap: 8px; align-items: center;">
                 💰 Open Vs Closed Source
@@ -363,16 +363,6 @@ DESCRIPTION_HTML = """
                 ⚖️ Overall Effectiveness
             </div>
         </div>
-        <div style="
-            border-left: 4px solid var(--accent-color, #4F46E5);
-            padding-left: 12px;
-            margin-top: 8px;
-            color: var(--text-secondary);
-            font-style: italic;
-        ">
-            💡 Use the filters below to explore different aspects of the evaluation and compare model performance across various dimensions.
-        </div>
     </div>
 </div>
 """
@@ -726,8 +716,8 @@ METHODOLOGY = """
     <h2 class="methodology-subtitle">Overview</h2>
     <p class="methodology-text">
-        The Berkeley Function Calling Leaderboard (BFCL) evaluates language models' ability to effectively use tools
-        and maintain coherent multi-turn conversations. Our evaluation focuses on both basic functionality and edge
         cases that challenge real-world applicability.
     </p>

@@ -354,7 +354,7 @@ DESCRIPTION_HTML = """
             color: var(--text-secondary);
         ">
             <div style="display: flex; gap: 8px; align-items: center;">
+                ✅ Tool Selection Quality
             </div>
             <div style="display: flex; gap: 8px; align-items: center;">
                 💰 Open Vs Closed Source
@@ -363,16 +363,6 @@ DESCRIPTION_HTML = """
                 ⚖️ Overall Effectiveness
             </div>
         </div>
     </div>
 </div>
 """
@@ -726,8 +716,8 @@ METHODOLOGY = """
     <h2 class="methodology-subtitle">Overview</h2>
     <p class="methodology-text">
+        We evaluate language models' ability to effectively use tools
+        in single and multi-turn conversations. Our evaluation focuses on both basic functionality and edge
         cases that challenge real-world applicability.
     </p>