Pratik Bhavsar committed on
Commit 712db9d · 1 Parent(s): b9405c8

added more info

Files changed (1)
  1. data_loader.py +9 -11
data_loader.py CHANGED
@@ -1044,12 +1044,9 @@ METHODOLOGY = """
  </div>
 
 
- <div></div>
- <h1 class="section-title">How Do We Measure Agent's Performance?</h1>
- <div>
- <p>The complexity of tool calling extends far beyond simple API invocations. We developed the Tool Selection Quality metric to assess agents' tool call performance, evaluating tool selection accuracy and effectiveness of parameter usage.</p>
-
  <div class="section-divider"></div>
+ <h1 class="section-title">What Makes Tool Selection Hard?</h1>
+ <div class="section-divider"></div>
  <h2 class="subsection-title">Scenario Recognition</h2>
  <div class="explanation-block">
  <p>When an agent encounters a query, it must first determine if tool usage is warranted. Information may already exist in the conversation history, making tool calls redundant. Alternatively, available tools might be insufficient or irrelevant to the task, requiring the agent to acknowledge limitations rather than force inappropriate tool usage.</p>
@@ -1084,10 +1081,10 @@ METHODOLOGY = """
  <li>Adapt to partial results or failures</li>
  </ul>
  </div>
-
- <div class="section-divider"></div>
- <h2 class="subsection-title">Example code</h2>
- <p class="code-intro">Below is the code example of evaluating an LLM for a dataset.</p>
+
+ <div class="section-divider"></div>
+ <h1 class="section-title">How Do We Measure Agent's Performance?</h1>
+ <p class="code-intro">We developed the Tool Selection Quality metric to assess agents' tool call performance, evaluating tool selection accuracy and effectiveness of parameter usage. This is an example code for evaluating an LLM with a dataset with Galileo's Tool Selection Quality.</p>
  <div class="code-block">
 
  <pre>
@@ -1106,14 +1103,15 @@ evaluate_handler = pq.GalileoPromptCallback(
  scorers=[chainpoll_tool_selection_scorer],
  )
 
- llm = llm_handler.get_llm(model, temperature=0.0, max_tokens=4000)
+ llm = llm_handler.get_llm(model, temperature=0.0, max_tokens=4000) # llm_handler is a custom handler for LLMs
+
  system_msg = {
  "role": "system",
  "content": 'Your job is to use the given tools to answer the query of human. If there is no relevant tool then reply with "I cannot answer the question with given tools". If tool is available but sufficient information is not available, then ask human to get the same. You can call as many tools as you want. Use multiple tools if needed. If the tools need to be called in a sequence then just call the first tool.',
  }
 
  for row in df.itertuples():
- chain = llm.bind_tools(tools)
+ chain = llm.bind_tools(tools) # attach the tools
  outputs.append(
  chain.invoke(
  [system_msg, *row.conversation],
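The "Scenario Recognition" paragraph in this diff, together with the system prompt in the evaluation code, describes a triage the agent performs before calling anything. It can answer from conversation history, decline when no tool fits, ask the user when a relevant tool lacks required inputs, or call the tool, calling only the first one when tools must run in sequence. The following is a minimal Python sketch of that triage. The `Tool` and `Decision` records, the `recognize_scenario` function, and the use of keyword overlap as a stand-in for relevance are all hypothetical illustrations. In the evaluated setup the LLM makes this judgment itself, and Galileo's Tool Selection Quality metric scores it.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tool:
    """Hypothetical tool record: name, crude relevance keywords, required inputs."""
    name: str
    keywords: set[str]
    required_params: list[str]


@dataclass
class Decision:
    """Outcome of the triage: one of the four scenarios described above."""
    action: str  # "answer_from_history", "decline", "ask_user", or "call_tool"
    tool: Optional[str] = None
    missing: list[str] = field(default_factory=list)
    answer: Optional[str] = None


def recognize_scenario(query: str, tools: list[Tool],
                       known_facts: dict[str, str],
                       provided_params: dict[str, str]) -> Decision:
    # Lowercase and split into alphanumeric tokens so punctuation does not block matches.
    words = set(re.findall(r"[a-z0-9]+", query.lower()))

    # 1. The information already exists in the conversation history: no tool call needed.
    for topic, answer in known_facts.items():
        if topic in words:
            return Decision("answer_from_history", answer=answer)

    # 2. Keep only tools whose keywords overlap the query (a toy relevance check).
    relevant = [t for t in tools if t.keywords & words]
    if not relevant:
        # No suitable tool: acknowledge the limitation instead of forcing a call.
        return Decision("decline")

    # 3. Assumes `tools` is ordered by execution step, so the first relevant tool
    #    is the first call of any sequence (the prompt asks for only that call).
    tool = relevant[0]
    missing = [p for p in tool.required_params if p not in provided_params]
    if missing:
        # Relevant tool exists but inputs are missing: ask the user for them.
        return Decision("ask_user", tool=tool.name, missing=missing)

    return Decision("call_tool", tool=tool.name)
```

The history check runs first, so a redundant call is never issued when the answer is already known. Missing parameters lead to a question for the user rather than a guessed argument, which matches the instruction in the system prompt.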