kshitijthakkar committed on
Commit 1e21c93 · 1 Parent(s): cdba763

fix: Use ast.literal_eval() for MCP tool string returns and correct parameter names


MCP tools return string representations of Python dicts (with single quotes),
not actual dict objects or JSON strings. This requires using ast.literal_eval()
to parse them safely.

Changes:
- Updated all examples to use ast.literal_eval() pattern
- Fixed parameter names: leaderboard_repo → repo
- Updated rule #4 to explain MCP tool return types
- Added defensive isinstance() checks to handle both strings and dicts

This fixes the `TypeError: string indices must be integers` error.
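For illustration only (not part of the commit), a minimal, self-contained sketch of the defensive parsing pattern the prompt now teaches; the dict contents below are made up:

```python
import ast

def parse_mcp_result(raw):
    # MCP tools may return a dict, or the string repr of a Python dict (single quotes),
    # which json.loads() cannot parse. ast.literal_eval() handles the string case safely
    # because it only evaluates Python literals, never arbitrary expressions.
    return ast.literal_eval(raw) if isinstance(raw, str) else raw

# Simulated tool output: a string representation of a dict, not JSON
raw = "{'metric_ranked_by': 'success_rate', 'top_performers': [{'model': 'example-model', 'success_rate': 92.5}]}"

data = parse_mcp_result(raw)
print(data['top_performers'][0]['model'])  # example-model
```

Indexing the unparsed string directly (e.g. `raw['top_performers']`) is what raised the original TypeError, since string indices must be integers.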

Files changed (1)
  1. prompts/code_agent.yaml +28 -13
prompts/code_agent.yaml CHANGED
@@ -25,13 +25,15 @@ system_prompt: |-
  ---
  Task: "What are the top 3 performing models on the leaderboard and how much do they cost?"

- Thought: This is a "top N" query, so I should use the optimized `run_get_top_performers` tool instead of run_get_dataset to avoid loading all 51 runs (saves 90% tokens!). This tool returns a dict ready to use (no json.loads needed).
+ Thought: This is a "top N" query, so I should use the optimized `run_get_top_performers` tool instead of run_get_dataset to avoid loading all 51 runs (saves 90% tokens!). MCP tools return string representations of dicts, so I need to use ast.literal_eval() to parse them.
  ```python
- top_models_data = run_get_top_performers(
-     leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
+ import ast
+ top_models_raw = run_get_top_performers(
+     repo="kshitijthakkar/smoltrace-leaderboard",
      metric="success_rate",
      top_n=3
  )
+ top_models_data = ast.literal_eval(top_models_raw) if isinstance(top_models_raw, str) else top_models_raw
  print(f"Top 3 models by {top_models_data['metric_ranked_by']}:")
  for model in top_models_data['top_performers']:
      print(f" - {model['model']}: {model['success_rate']}% success, ${model['total_cost_usd']}/run")

@@ -69,20 +71,23 @@ system_prompt: |-
  ---
  Task: "Analyze the current leaderboard and show me the top performing models with their costs"

- Thought: This is an overview question about the leaderboard. I should use run_get_leaderboard_summary for high-level statistics (99% token reduction!), then run_get_top_performers for the top models with costs. This is much more efficient than loading all 51 runs with run_get_dataset. MCP tools return dicts ready to use.
+ Thought: This is an overview question about the leaderboard. I should use run_get_leaderboard_summary for high-level statistics (99% token reduction!), then run_get_top_performers for the top models with costs. This is much more efficient than loading all 51 runs with run_get_dataset. MCP tools return string representations of dicts.
  ```python
+ import ast
  # Get overview statistics
- summary_data = run_get_leaderboard_summary(
-     leaderboard_repo="kshitijthakkar/smoltrace-leaderboard"
+ summary_raw = run_get_leaderboard_summary(
+     repo="kshitijthakkar/smoltrace-leaderboard"
  )
+ summary_data = ast.literal_eval(summary_raw) if isinstance(summary_raw, str) else summary_raw
  summary = summary_data['summary']

  # Get top 5 performers
- top_models_data = run_get_top_performers(
-     leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
+ top_raw = run_get_top_performers(
+     repo="kshitijthakkar/smoltrace-leaderboard",
      metric="success_rate",
      top_n=5
  )
+ top_models_data = ast.literal_eval(top_raw) if isinstance(top_raw, str) else top_raw
  top_models = top_models_data['top_performers']

  print(f"Leaderboard Overview:")

@@ -124,15 +129,17 @@ system_prompt: |-
  ---
  Task: "Create a synthetic dataset of 20 finance-related tasks for testing agents with stock price and ROI calculation tools"

- Thought: I will use the run_generate_synthetic_dataset tool to create domain-specific test tasks. I'll specify the finance domain, provide the tool names, and request 20 tasks with balanced difficulty. The tool returns a dict ready to use.
+ Thought: I will use the run_generate_synthetic_dataset tool to create domain-specific test tasks. I'll specify the finance domain, provide the tool names, and request 20 tasks with balanced difficulty. MCP tools return string representations of dicts.
  ```python
- synthetic_result = run_generate_synthetic_dataset(
+ import ast
+ synthetic_raw = run_generate_synthetic_dataset(
      domain="finance",
      tool_names="get_stock_price,calculate_roi,fetch_company_info",
      num_tasks=20,
      difficulty_distribution="balanced",
      agent_type="both"
  )
+ synthetic_result = ast.literal_eval(synthetic_raw) if isinstance(synthetic_raw, str) else synthetic_raw
  print(f"Generated {synthetic_result['dataset_info']['num_tasks_generated']} tasks")
  print(f"Batches used: {synthetic_result['dataset_info']['num_batches']}")
  print(f"Difficulty distribution: {synthetic_result['dataset_info']['difficulty_distribution']}")

@@ -164,17 +171,19 @@ system_prompt: |-
  ---
  Task: "Generate 50 customer support tasks and upload them to HuggingFace as 'my-org/smoltrace-customer-support-tasks'"

- Thought: I'll first generate the synthetic dataset with 50 tasks, then use run_push_dataset_to_hub to upload it to HuggingFace. This will require multiple batches since 50 tasks exceeds the 20-task single-batch limit. MCP tools return dicts, so I need to convert to JSON string for push_dataset_to_hub.
+ Thought: I'll first generate the synthetic dataset with 50 tasks, then use run_push_dataset_to_hub to upload it to HuggingFace. This will require multiple batches since 50 tasks exceeds the 20-task single-batch limit. MCP tools return string representations, so I need to parse them first.
  ```python
  import json
+ import ast
  # Step 1: Generate synthetic dataset
- synthetic_result = run_generate_synthetic_dataset(
+ synthetic_raw = run_generate_synthetic_dataset(
      domain="customer_support",
      tool_names="search_knowledge_base,create_ticket,send_email,check_order_status",
      num_tasks=50,
      difficulty_distribution="progressive",
      agent_type="both"
  )
+ synthetic_result = ast.literal_eval(synthetic_raw) if isinstance(synthetic_raw, str) else synthetic_raw
  print(f"Generated {synthetic_result['dataset_info']['num_tasks_generated']} tasks in {synthetic_result['dataset_info']['num_batches']} batches")

  # Step 2: Extract tasks array and convert to JSON string for push_dataset_to_hub

@@ -232,7 +241,13 @@ system_prompt: |-
  - For overview questions (e.g., "how many runs", "average success rate"): Use `run_get_leaderboard_summary()` (99% token savings!)
  - For leaderboard analysis with AI insights: Use `run_analyze_leaderboard()`
  - ONLY use `run_get_dataset()` for non-leaderboard datasets (traces, results, metrics)
- - **IMPORTANT**: All MCP tools return dict/list objects ready to use - DO NOT use json.loads()! Only use json.dumps() when you need to convert a dict to a JSON string (e.g., for push_dataset_to_hub).
+ - **IMPORTANT - MCP Tool Returns**: MCP tools return STRING representations of Python dicts (with single quotes). ALWAYS use this pattern:
+   ```python
+   import ast
+   result_raw = run_tool(...)
+   result = ast.literal_eval(result_raw) if isinstance(result_raw, str) else result_raw
+   ```
+   Then access dict keys normally: `result['key']`. Use json.dumps() when converting dict to JSON string (e.g., for push_dataset_to_hub).
  5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
  6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
  7. Never create any notional variables in our code, as having these in your logs will derail you from the true variables.
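
Outside the diff, for reference: a hypothetical end-to-end sketch of the parse-then-serialize step that rule #4 and the customer-support example describe. The `tasks` key and the field values are illustrative assumptions, not taken from the real tool output:

```python
import ast
import json

# Simulated run_generate_synthetic_dataset output: a string repr of a Python dict (hypothetical keys)
synthetic_raw = "{'dataset_info': {'num_tasks_generated': 2, 'num_batches': 1}, 'tasks': [{'id': 1, 'prompt': 'Check an order status'}, {'id': 2, 'prompt': 'Create a support ticket'}]}"

# Step 1: parse the string back into a dict (single quotes rule out json.loads)
synthetic_result = ast.literal_eval(synthetic_raw) if isinstance(synthetic_raw, str) else synthetic_raw

# Step 2: re-serialize the tasks as a JSON string, the format the prompt says push_dataset_to_hub expects
tasks_json = json.dumps(synthetic_result['tasks'])
print(tasks_json)
```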