MCP: Programmatic Tool Calling (Code Mode) with OpenSandbox
Introduction
Model Context Protocol or MCP enables AI agents to access external systems they cannot reach by default, including authenticated APIs, CI/CD pipelines, live process streams, and IDE integrations. It acts as a structured bridge between the model and real-world environments, allowing controlled interaction with tools and infrastructure.
However, MCP does not automatically make interactions efficient or intelligent. Traditional MCP implementations often inject large JSON payloads into the model context, which increases token consumption and reduces efficiency. MCP also does not eliminate the need for proper tool selection and orchestration; if poorly structured, it can introduce unnecessary abstraction and overhead. In environments where agents can directly execute commands or interact with systems natively, MCP may become redundant, as it primarily serves as an access layer rather than an execution engine.
In many software products, having hundreds of APIs is normal. With a traditional MCP approach, all tool definitions are loaded into the model’s context upfront. This inflates the context window with large OpenAPI schemas (adapted into MCP tools), even when most tools are irrelevant to the current request. The result is unnecessary token consumption, higher cost, slower reasoning, and an increased risk of hallucination due to context bloat. In this post, I use the OpenAPI schemas from https://www.openbrewerydb.org/documentation and https://petstore3.swagger.io, load them into application memory, adapt them into MCP tools, and dynamically search and invoke the relevant endpoints.
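To make the "adapt into MCP tools" step concrete, here is a minimal sketch of flattening an OpenAPI document into searchable tool entries. The `openapi_to_tools` function and the entry shape are hypothetical illustrations, not the article's actual registry code; only the field names (`paths`, `operationId`, `summary`, `tags`) come from standard OpenAPI 3.x structure.

```python
# Hypothetical sketch: flatten an OpenAPI document into searchable "tool" entries.
# The entry shape is an assumption for illustration, not the real registry type.
def openapi_to_tools(spec):
    tools = []
    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            tools.append({
                "name": op.get("operationId", f"{method}_{path}"),
                "description": op.get("summary", ""),
                "method": method.upper(),
                "path": path,
                "tags": op.get("tags", []),
            })
    return tools

# A tiny slice of the Open Brewery DB spec, reduced for illustration.
spec = {
    "paths": {
        "/breweries/search": {
            "get": {
                "operationId": "search_breweries",
                "summary": "Search breweries by query",
                "tags": ["breweries"],
            }
        }
    }
}
print(openapi_to_tools(spec)[0]["name"])  # search_breweries
```

Only the lightweight entries (name, description, tags) are indexed for search; the full schema stays out of the model context until explicitly requested.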
Statistics shared by Anthropic illustrate the scale of the problem under the current MCP pattern:
Consider a five-server setup:
- GitHub: 35 tools (~26K tokens)
- Slack: 11 tools (~21K tokens)
- Sentry: 5 tools (~3K tokens)
- Grafana: 5 tools (~3K tokens)
- Splunk: 2 tools (~2K tokens)
That's 58 tools consuming approximately 55K tokens before the conversation even starts. Add more servers like Jira (which alone uses ~17K tokens) and you're quickly approaching 100K+ token overhead. At Anthropic, they've seen tool definitions consume 134K tokens before optimization.
This means that in many enterprise systems, thousands of tokens may already be consumed before any meaningful task begins, simply to describe the available tools.
To address this inefficiency, Anthropic introduced two optimization techniques: Tool Search Tool and Programmatic Tool Calling (Code Mode). These approaches avoid preloading every tool definition and instead retrieve or execute tools only when required, significantly reducing token overhead.
I have been researching and implementing these techniques in .NET/C# for enterprise use. This article focuses specifically on applying Code Mode with both a local sandbox and OpenSandbox, exploring how executable tool invocation can replace large static tool-context injection.
Code Mode with OpenSandbox and Local Sandbox runners
At a high level, the flow operates as follows. At application startup, all available OpenAPI specifications are discovered and loaded into a tool registry. When a request arrives, the system performs a search over the registered OpenAPI endpoint metadata (treated as tools). The LLM then inspects the selected tool’s schema via get_schema, enabling it to understand the available operations, parameters, and data structures.
Using this schema, the LLM generates Python code that correctly invokes the chosen endpoint. The generated code is sent to a sandbox environment—either a local sandbox or OpenSandbox—through execute. The sandbox executes the code, which issues outbound HTTP requests to the target system via a request client.
After execution, the sandbox returns the raw result to the host. The LLM analyzes this output and converts it into a structured, human-readable response for the end user.
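The loop above can be sketched from the host's point of view. Note that `mcp.call`, `llm.generate_python`, and `run_code_mode` are hypothetical names chosen for illustration; the real handlers appear later in this post.

```python
# Hypothetical sketch of the code-mode loop the host drives.
# `mcp` and `llm` are assumed client objects, not a real SDK.
def run_code_mode(mcp, llm, user_request):
    hits = mcp.call("search", query=user_request)          # 1. narrow the tool set
    schema = mcp.call("get_schema", name=hits[0]["name"])  # 2. fetch only the needed schema
    code = llm.generate_python(user_request, schema)       # 3. LLM writes the invocation code
    return mcp.call("execute", code=code)                  # 4. sandbox runs it, raw result returns
```

Only steps 1, 2, and 4 touch the MCP server; the full tool catalog never enters the model context.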
A sandbox is required because the Python script is generated dynamically by the LLM. Executing model-generated code directly in the host environment introduces security risks, including arbitrary file access, network misuse, privilege escalation, and system compromise. A sandbox isolates execution, enforces resource limits, controls outbound network access, and restricts filesystem and process permissions. This containment ensures that generated code can perform necessary API calls without exposing the host system to uncontrolled behavior.
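As a minimal illustration of one such control, outbound URLs can be checked against an allowlist before the sandbox issues a request. The function below is a simplified sketch, not the runner's actual guard; in the implementation shown later, the allowlist is derived from the loaded OpenAPI base URLs.

```python
from urllib.parse import urlparse

# Example base URLs; the article's runner derives these from OpenAPI sources.
ALLOWED_BASES = (
    "https://api.openbrewerydb.org/v1",
    "https://petstore3.swagger.io/api/v3",
)

def is_allowed(url: str) -> bool:
    """Sketch of an outbound-URL allowlist check before the sandbox sends a request."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return any(url == base or url.startswith(base.rstrip("/") + "/")
               for base in ALLOWED_BASES)

print(is_allowed("https://api.openbrewerydb.org/v1/breweries"))  # True
print(is_allowed("https://evil.example.com/steal"))              # False
```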
If you want to explore OpenSandbox, refer to the official repository by Alibaba: OpenSandbox
In short:
OpenSandbox is a general-purpose sandbox platform for AI applications, offering multi-language SDKs, unified sandbox APIs, and Docker/Kubernetes runtimes for scenarios like Coding Agents, GUI Agents, Agent Evaluation, AI Code Execution, and RL Training.
OpenSandbox is now listed in the CNCF Landscape.
Walk through Code Mode flow
See the Mermaid diagram if you want to see the flow in detail.
At the core, there are 3 main tools: search, get_schema, and execute.
```csharp
[McpServerTool(Name = "search"), Description("Discover tools by query with optional detail, tag filter, and limit. Use before get_schema and execute in a multi-step code-mode workflow.")]
public static DiscoverySearchResponse Search(
    [FromServices] DiscoveryTools discoveryTools,
    [FromServices] UserContext context,
    [FromServices] ILoggerFactory loggerFactory,
    [Description("Search query for tool discovery")] string query,
    [Description("Detail level: Brief (default), Detailed, or Full. Also accepts 0, 1, or 2.")] JsonElement? detail = null,
    [Description("Optional tag filters. When omitted, all tags are included.")] IReadOnlyList<string>? tags = null,
    [Description("Optional maximum results. Defaults to server discovery default.")] int? limit = null)
{
    SchemaDetailLevel resolvedDetail = detail.HasValue
        ? ParseSchemaDetailLevel(detail.Value, SchemaDetailLevel.Brief)
        : SchemaDetailLevel.Brief;

    DiscoverySearchResponse response = discoveryTools.Search(query, context, resolvedDetail, tags, limit);
    return response;
}
```
```csharp
[McpServerTool(Name = "get_schema"), Description("Retrieve input schemas for tool names. Pass a list via toolNames or a single name via name. Default detail is Detailed markdown, and missing tool names are reported.")]
public static SchemaLookupResponse GetSchema(
    [FromServices] DiscoveryTools discoveryTools,
    [FromServices] UserContext context,
    [FromServices] ILoggerFactory loggerFactory,
    [Description("List of tool names to retrieve schemas for")] IReadOnlyList<string>? toolNames = null,
    [Description("Single tool name shorthand - equivalent to toolNames with one entry")] string? name = null,
    [Description("Schema verbosity: Brief name-only, Detailed compact markdown, Full full JSON schema. Also accepts 0, 1, or 2.")] JsonElement? detail = null)
{
    IReadOnlyList<string> requestedToolNames = ResolveSchemaToolNames(toolNames, name);
    SchemaDetailLevel resolvedDetail = detail.HasValue
        ? ParseSchemaDetailLevel(detail.Value, SchemaDetailLevel.Detailed)
        : SchemaDetailLevel.Detailed;

    SchemaLookupResponse response = discoveryTools.GetSchema(requestedToolNames, context, resolvedDetail);
    return response;
}
```
```csharp
[McpServerTool(Name = "execute"), Description("Execute constrained code and return the final result. IMPORTANT: call search, then get_schema, then get_execute_syntax before execute. The runner will reject code written in the wrong style with an error message.")]
public static async Task<object?> Execute(
    [Description("Code string written in the syntax returned by get_execute_syntax.")] string code,
    [FromServices] ExecuteTool executeTool,
    [FromServices] ILoggerFactory loggerFactory,
    CancellationToken ct)
{
    ILogger logger = loggerFactory.CreateLogger(nameof(Execute));
    try
    {
        ExecuteResponse response = await executeTool.ExecuteAsync(code, ct);
        return response.FinalValue;
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        logger.LogWarning(ex, "[codemode] {HandlerName} failed: {Error}.", nameof(Execute), ex.Message);
        return $"[Execute error] {ex.Message}";
    }
}
```
And the sandbox interface (which supports two approaches: local sandbox and OpenSandbox) is as follows:
```csharp
public interface ISandboxRunner
{
    string SyntaxGuide { get; }

    Task<RunnerResult> RunAsync(string code, CancellationToken ct);
}
```
And the local sandbox implementation:
```csharp
public sealed class LocalConstrainedRunner : ISandboxRunner
{
    private static readonly Regex AbsoluteHttpUrlRegex = new(
        @"https?://[^\s""'\)\]]+",
        RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.CultureInvariant);

    private readonly TimeSpan timeout;
    private readonly int maxToolCalls;

    // Single source of truth for the URL allowlist. String representations are derived on use.
    private readonly IReadOnlyList<Uri> allowedBaseUris;

    private static readonly JsonSerializerOptions PayloadJsonOptions = new()
    {
        PropertyNameCaseInsensitive = true,
    };

    public LocalConstrainedRunner(
        TimeSpan timeout,
        int maxToolCalls,
        IReadOnlyList<string>? allowedBaseUrls = null)
    {
        this.timeout = timeout;
        this.maxToolCalls = maxToolCalls;
        allowedBaseUris = NormalizeAllowedBaseUris(allowedBaseUrls);
    }

    public string SyntaxGuide
    {
        get
        {
            string defaultBaseUrl = allowedBaseUris.Count > 0
                ? allowedBaseUris[0].AbsoluteUri.TrimEnd('/')
                : string.Empty;

            string baseUrlSection = defaultBaseUrl.Length > 0
                ? $"Use the injected BASE_URL variable for API calls. Default BASE_URL is \"{defaultBaseUrl}\"."
                : "Use the injected BASE_URL variable for API calls. No default URL was discovered from OpenAPI sources.";

            string allowedSection = allowedBaseUris.Count > 0
                ? $"Only call API URLs under configured OpenAPI bases via BASE_URL: {string.Join(", ", allowedBaseUris.Select(static u => $"\"{u.AbsoluteUri.TrimEnd('/')}\""))}."
                : "No API base allowlist is currently configured, so URL host restrictions are not enforced.";

            return $$"""
                Runner: local (Python)
                Write pure Python code.
                You may call HTTP APIs directly from Python code when needed.
                A lightweight requests-compatible shim is available for basic HTTP requests; set a timeout (for example: timeout=10).
                Prefer assigning the final value to result.
                If result is not set, captured stdout is returned when available.
                {{baseUrlSection}}
                {{allowedSection}}
                Code mode is isolated from tool-search tools.
                Do NOT use: SearchTools, CallTool, Search, GetSchema, or Execute in this code.
                Example:
                import requests
                response = requests.get(f"{BASE_URL}/pet/findByStatus", params={"status": "sold"}, timeout=10)
                response.raise_for_status()
                result = response.json()
                The final result value (or stdout fallback) is returned as tool output.
                """;
        }
    }

    public async Task<RunnerResult> RunAsync(string code, CancellationToken ct)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(code);

        // removed for brevity
        // ...

        using CancellationTokenSource timeoutCts = new(timeout);
        using CancellationTokenSource linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);

        object? finalValue = await ExecutePythonLocallyAsync(code, linkedCts.Token);
        return new RunnerResult(finalValue, 0);
    }

    private bool ContainsForbiddenHardcodedApiUsage(string code)
    {
        // removed for brevity
        // ...
    }

    private async Task<object?> ExecutePythonLocallyAsync(string code, CancellationToken ct)
    {
        string encodedCode = Convert.ToBase64String(Encoding.UTF8.GetBytes(code));
        string serializedDefaultBaseUrl = JsonSerializer.Serialize(
            allowedBaseUris.Count > 0 ? allowedBaseUris[0].AbsoluteUri.TrimEnd('/') : string.Empty);

        string wrapper = $$"""
            import base64
            import contextlib
            import io
            import json
            import sys
            import traceback
            import types
            import urllib.error
            import urllib.parse
            import urllib.request

            code = base64.b64decode("{{encodedCode}}".encode("ascii")).decode("utf-8")
            BASE_URL = {{serializedDefaultBaseUrl}}

            # Install a default opener with explicit headers. Some public APIs reject
            # Python's default urllib user-agent and return HTTP 403.
            _opener = urllib.request.build_opener()
            _opener.addheaders = [
                ("User-Agent", "mcp-experiments-local-runner/1.0"),
                ("Accept", "application/json"),
            ]
            urllib.request.install_opener(_opener)

            class _RequestsError(Exception):
                pass

            class _RequestsHttpError(_RequestsError):
                pass

            class _RequestsResponse:
                def __init__(self, status_code, content, headers):
                    self.status_code = status_code
                    self.content = content
                    self.headers = dict(headers.items()) if headers is not None else {}
                    self.text = content.decode("utf-8", errors="replace")

                def json(self):
                    if not self.text:
                        return None
                    return json.loads(self.text)

                def raise_for_status(self):
                    if self.status_code >= 400:
                        raise _RequestsHttpError(f"HTTP {self.status_code}")

            def _append_query(url, params):
                if not params:
                    return url
                query = urllib.parse.urlencode(params, doseq=True)
                separator = "&" if "?" in url else "?"
                return f"{url}{separator}{query}"

            def _normalize_body(data, json_payload, headers):
                if json_payload is not None:
                    headers.setdefault("Content-Type", "application/json")
                    return json.dumps(json_payload).encode("utf-8")
                if isinstance(data, str):
                    return data.encode("utf-8")
                return data

            def _requests_request(method, url, params=None, data=None, json=None, headers=None, timeout=None):
                request_headers = dict(headers or {})
                request_url = _append_query(url, params)
                request_body = _normalize_body(data, json, request_headers)
                request = urllib.request.Request(request_url, data=request_body, headers=request_headers, method=method.upper())
                try:
                    with urllib.request.urlopen(request, timeout=timeout) as response:
                        return _RequestsResponse(response.getcode(), response.read(), response.headers)
                except urllib.error.HTTPError as ex:
                    return _RequestsResponse(ex.code, ex.read(), ex.headers)

            requests = types.ModuleType("requests")
            requests.RequestException = _RequestsError
            requests.HTTPError = _RequestsHttpError
            requests.request = _requests_request
            requests.get = lambda url, **kwargs: _requests_request("GET", url, **kwargs)
            requests.post = lambda url, **kwargs: _requests_request("POST", url, **kwargs)
            requests.put = lambda url, **kwargs: _requests_request("PUT", url, **kwargs)
            requests.delete = lambda url, **kwargs: _requests_request("DELETE", url, **kwargs)
            requests.patch = lambda url, **kwargs: _requests_request("PATCH", url, **kwargs)
            sys.modules["requests"] = requests

            scope = {"BASE_URL": BASE_URL}
            captured_stdout = io.StringIO()
            captured_stderr = io.StringIO()
            try:
                with contextlib.redirect_stdout(captured_stdout), contextlib.redirect_stderr(captured_stderr):
                    exec(code, scope, scope)
                payload = {
                    "ok": True,
                    "finalValue": scope.get("result"),
                    "stdout": captured_stdout.getvalue(),
                    "stderr": captured_stderr.getvalue()
                }
            except Exception as ex:
                payload = {
                    "ok": False,
                    "error": str(ex),
                    "traceback": traceback.format_exc(),
                    "stdout": captured_stdout.getvalue(),
                    "stderr": captured_stderr.getvalue()
                }

            print(json.dumps(payload, ensure_ascii=False, default=str))
            """;

        ProcessStartInfo startInfo = new()
        {
            FileName = "python3",
            Arguments = "-",
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };

        using Process process = new() { StartInfo = startInfo };
        if (!process.Start())
        {
            throw new InvalidOperationException("Failed to start local Python process.");
        }

        await process.StandardInput.WriteAsync(wrapper.AsMemory(), ct);
        await process.StandardInput.FlushAsync();
        process.StandardInput.Close();

        Task<string> stdoutTask = process.StandardOutput.ReadToEndAsync(ct);
        Task<string> stderrTask = process.StandardError.ReadToEndAsync(ct);
        await process.WaitForExitAsync(ct);

        string stdout = await stdoutTask;
        string stderr = await stderrTask;

        if (process.ExitCode != 0)
        {
            throw new InvalidOperationException($"Local Python execution failed with exit code {process.ExitCode}: {stderr}");
        }

        string? payloadLine = stdout
            .Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .LastOrDefault();

        if (string.IsNullOrWhiteSpace(payloadLine))
        {
            throw new InvalidOperationException("Local Python execution produced no parseable output.");
        }

        LocalExecutionPayload? payload = JsonSerializer.Deserialize<LocalExecutionPayload>(payloadLine, PayloadJsonOptions);
        if (payload is null)
        {
            throw new InvalidOperationException("Local Python execution payload is empty.");
        }

        if (!payload.Ok)
        {
            // removed for brevity
            // ...
        }

        object? finalValue = payload.FinalValue;
        if (finalValue is null && !string.IsNullOrWhiteSpace(payload.Stdout))
        {
            finalValue = payload.Stdout.TrimEnd();
        }

        return finalValue;
    }

    // removed for brevity
    // ...
}
```
And the OpenSandbox implementation:
```csharp
public sealed class OpenSandboxRunner : ISandboxRunner
{
    private static readonly JsonSerializerOptions PayloadJsonOptions = new()
    {
        PropertyNameCaseInsensitive = true,
    };

    private readonly OpenSandboxRunnerOptions options;
    private readonly Func<string, CancellationToken, Task<RunnerResult>> remoteExecutor;
    private readonly ILogger logger;

    public string SyntaxGuide =>
        """
        Runner: OpenSandbox (Python)
        Write pure Python code.
        You may call HTTP APIs directly from Python code when needed.
        A lightweight requests-compatible shim is available for basic HTTP requests; set a timeout (for example: timeout=10).
        Assign the final value to a variable named exactly result (lowercase).
        If result is not set, captured stdout is returned when available.
        CRITICAL: Do NOT use Result, RESULT, or any other casing — only lowercase result is captured.
        Do NOT use bare identifiers as section dividers (e.g., Result # comment). Use only # comments.
        Code mode is fully isolated from tool-search tools.
        Do NOT use: SearchTools, CallTool, Search, GetSchema, or Execute in this code.
        HTTP example:
        import requests
        response = requests.get("https://api.example.com/data", timeout=10)
        result = response.json()
        Compute example:
        data = [1, 2, 3]
        result = sum(data)
        The final result value (or stdout fallback) is returned as tool output.
        """;

    public OpenSandboxRunner(
        OpenSandboxRunnerOptions options,
        ILoggerFactory loggerFactory,
        Func<string, CancellationToken, Task<RunnerResult>>? remoteExecutor = null)
    {
        ArgumentNullException.ThrowIfNull(options);
        ArgumentNullException.ThrowIfNull(loggerFactory);

        if (string.IsNullOrWhiteSpace(options.Domain))
        {
            throw new InvalidOperationException("OpenSandbox:Domain is required when CodeMode:Runner=opensandbox.");
        }

        this.options = options;
        logger = loggerFactory.CreateLogger<OpenSandboxRunner>();
        this.remoteExecutor = remoteExecutor ?? ((code, ct) => ExecuteInSandboxAsync(this.options, this.logger, code, ct));
    }

    public async Task<RunnerResult> RunAsync(string code, CancellationToken ct)
    {
        // Code mode must stay isolated from tool-search meta-tools.
        if (SandboxCodeGuard.ContainsForbiddenMetaToolUsage(code))
        {
            throw new InvalidOperationException(
                "Code mode is isolated from tool-search tools. " +
                "Do not call SearchTools, CallTool, Search, GetSchema, or Execute inside code mode; use pure Python compute only.");
        }

        try
        {
            using CancellationTokenSource timeoutCts = new(options.Timeout);
            using CancellationTokenSource linkedCts = CancellationTokenSource.CreateLinkedTokenSource(ct, timeoutCts.Token);

            RunnerResult remoteResult = await remoteExecutor(code, linkedCts.Token);
            return remoteResult;
        }
        catch (OperationCanceledException ex) when (!ct.IsCancellationRequested)
        {
            // removed for brevity
        }
        catch (Exception ex) when (!ct.IsCancellationRequested)
        {
            // removed for brevity
        }
    }

    private static async Task<RunnerResult> ExecuteInSandboxAsync(
        OpenSandboxRunnerOptions options,
        ILogger logger,
        string code,
        CancellationToken ct)
    {
        var config = ConnectionConfigBuilder.Build(options);
        string encodedCode = Convert.ToBase64String(Encoding.UTF8.GetBytes(code));
        string command = BuildPythonCommand(encodedCode);

        // Suppress HTTP auto-instrumentation for SDK internal polling and RPC chatter.
        using var suppressScope = SuppressInstrumentationScope.Begin();

        // Retry sandbox creation with exponential backoff
        await using Sandbox sandbox = await RetryHelper.RetryAsync(
            async () => await Sandbox.CreateAsync(new SandboxCreateOptions
            {
                ConnectionConfig = config,
                Image = options.Image,
                TimeoutSeconds = options.TimeoutSeconds,
                ReadyTimeoutSeconds = options.ReadyTimeoutSeconds,
            }).WaitAsync(ct),
            maxAttempts: 3,
            initialDelay: TimeSpan.FromSeconds(0.5),
            cancellationToken: ct);

        Execution execution;
        try
        {
            // Retry sandbox command execution
            execution = await RetryHelper.RetryAsync(
                async () => await sandbox.Commands.RunAsync(command, cancellationToken: ct).WaitAsync(ct),
                maxAttempts: 2,
                initialDelay: TimeSpan.FromSeconds(0.5),
                cancellationToken: ct);
        }
        finally
        {
            try
            {
                await sandbox.KillAsync();
                logger.LogInformation("OpenSandbox instance killed.");
            }
            catch (Exception killEx)
            {
                logger.LogWarning(killEx, "Failed to kill OpenSandbox instance cleanly.");
            }
        }

        if (execution.Error is not null)
        {
            throw new InvalidOperationException($"OpenSandbox execution failed: {execution.Error.Name}: {execution.Error.Value}");
        }

        string stdout = string.Join(
            '\n',
            execution.Logs.Stdout.Select(message => message.Text).Where(text => !string.IsNullOrWhiteSpace(text)));

        string? payloadLine = stdout
            .Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .LastOrDefault();

        if (string.IsNullOrWhiteSpace(payloadLine))
        {
            throw new InvalidOperationException("OpenSandbox execution produced no parseable output.");
        }

        SandboxExecutionPayload? payload = JsonSerializer.Deserialize<SandboxExecutionPayload>(payloadLine, PayloadJsonOptions);
        if (payload is null)
        {
            throw new InvalidOperationException("OpenSandbox execution payload is empty.");
        }

        if (!payload.Ok)
        {
            throw new InvalidOperationException(
                $"OpenSandbox script error: {payload.Error ?? "Unknown error"}. {payload.Traceback} {payload.Stderr}");
        }

        object? finalValue = payload.FinalValue;
        if (finalValue is null && !string.IsNullOrWhiteSpace(payload.Stdout))
        {
            finalValue = payload.Stdout.TrimEnd();
        }

        return new RunnerResult(finalValue, 0);
    }

    private static string BuildPythonCommand(string encodedCode)
    {
        return $$"""
            python3 - <<'PY'
            import base64
            import contextlib
            import io
            import json
            import sys
            import traceback
            import types
            import urllib.error
            import urllib.parse
            import urllib.request

            code = base64.b64decode("{{encodedCode}}".encode("ascii")).decode("utf-8")

            # Install a default opener with explicit headers. Some public APIs reject
            # Python's default urllib user-agent and return HTTP 403.
            _opener = urllib.request.build_opener()
            _opener.addheaders = [
                ("User-Agent", "mcp-experiments/1.0"),
                ("Accept", "application/json"),
            ]
            urllib.request.install_opener(_opener)

            class _RequestsError(Exception):
                pass

            class _RequestsHttpError(_RequestsError):
                pass

            class _RequestsResponse:
                def __init__(self, status_code, content, headers):
                    self.status_code = status_code
                    self.content = content
                    self.headers = dict(headers.items()) if headers is not None else {}
                    self.text = content.decode("utf-8", errors="replace")

                def json(self):
                    if not self.text:
                        return None
                    return json.loads(self.text)

                def raise_for_status(self):
                    if self.status_code >= 400:
                        raise _RequestsHttpError(f"HTTP {self.status_code}")

            def _append_query(url, params):
                if not params:
                    return url
                query = urllib.parse.urlencode(params, doseq=True)
                separator = "&" if "?" in url else "?"
                return f"{url}{separator}{query}"

            def _normalize_body(data, json_payload, headers):
                if json_payload is not None:
                    headers.setdefault("Content-Type", "application/json")
                    return json.dumps(json_payload).encode("utf-8")
                if isinstance(data, str):
                    return data.encode("utf-8")
                return data

            def _requests_request(method, url, params=None, data=None, json=None, headers=None, timeout=None):
                request_headers = dict(headers or {})
                request_url = _append_query(url, params)
                request_body = _normalize_body(data, json, request_headers)
                request = urllib.request.Request(request_url, data=request_body, headers=request_headers, method=method.upper())
                try:
                    with urllib.request.urlopen(request, timeout=timeout) as response:
                        return _RequestsResponse(response.getcode(), response.read(), response.headers)
                except urllib.error.HTTPError as ex:
                    return _RequestsResponse(ex.code, ex.read(), ex.headers)

            requests = types.ModuleType("requests")
            requests.RequestException = _RequestsError
            requests.HTTPError = _RequestsHttpError
            requests.request = _requests_request
            requests.get = lambda url, **kwargs: _requests_request("GET", url, **kwargs)
            requests.post = lambda url, **kwargs: _requests_request("POST", url, **kwargs)
            requests.put = lambda url, **kwargs: _requests_request("PUT", url, **kwargs)
            requests.delete = lambda url, **kwargs: _requests_request("DELETE", url, **kwargs)
            requests.patch = lambda url, **kwargs: _requests_request("PATCH", url, **kwargs)
            sys.modules["requests"] = requests

            scope = {}
            captured_stdout = io.StringIO()
            captured_stderr = io.StringIO()
            try:
                with contextlib.redirect_stdout(captured_stdout), contextlib.redirect_stderr(captured_stderr):
                    exec(code, scope, scope)
                payload = {
                    "ok": True,
                    "finalValue": scope.get("result"),
                    "stdout": captured_stdout.getvalue(),
                    "stderr": captured_stderr.getvalue()
                }
            except BaseException as ex:
                payload = {
                    "ok": False,
                    "error": str(ex),
                    "traceback": traceback.format_exc(),
                    "stdout": captured_stdout.getvalue(),
                    "stderr": captured_stderr.getvalue()
                }

            try:
                print(json.dumps(payload, ensure_ascii=False, default=str))
            except Exception as serialization_ex:
                fallback_payload = {
                    "ok": False,
                    "error": f"Failed to serialize execution payload: {serialization_ex}",
                    "traceback": traceback.format_exc(),
                    "stdout": captured_stdout.getvalue(),
                    "stderr": captured_stderr.getvalue()
                }
                print(json.dumps(fallback_payload, ensure_ascii=False, default=str))
            PY
            """;
    }
}
```
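Both runners recover the execution payload the same way: the wrapper prints the JSON payload as its final line, so the host takes the last non-empty stdout line and deserializes it, ignoring any earlier prints from the user code. A small Python sketch of that convention (the real hosts do this in C# with `JsonSerializer.Deserialize`):

```python
import json

def parse_payload(stdout: str):
    """Take the last non-empty stdout line and deserialize it as the payload."""
    lines = [line.strip() for line in stdout.split("\n") if line.strip()]
    if not lines:
        raise ValueError("execution produced no parseable output")
    return json.loads(lines[-1])

stdout = 'debug print from user code\n{"ok": true, "finalValue": 6, "stdout": "", "stderr": ""}\n'
print(parse_payload(stdout)["finalValue"])  # 6
```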
Test it all
The steps below run the OpenSandbox runner by Alibaba.
At the root project run:
```shell
dotnet run --project src/AppHost/AppHost.csproj
```
Open the testweb project:
Now, let's input: `Use code mode to find breweries with moon in the name, return the top 5 as name and city`
Back in the Aspire dashboard logs, we should see some important entries:
```
...
info: McpServer.CodeMode.CodeModeHandlers[0]
      [codemode] Search returned 10 tools out of 27 matches for query brewery search by name: get_user_by_name, brewery_search, search_breweries, get_single_brewery, brewery_get, brewery_list, find_pets_by_tags, get_random_brewery, brewery_random, find_pets_by_status.
...
info: McpServer.CodeMode.CodeModeHandlers[0]
      [codemode] GetSchema returned 1 schemas for tools search_breweries. Detail: Detailed. Missing: (none).
...
info: McpServer.CodeMode.CodeModeHandlers[0]
      [codemode] Execute executing code (253 chars).
...
info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
      Generated Python code for OpenSandbox Execute:

import requests

response = requests.get(
    "https://api.openbrewerydb.org/v1/breweries/search",
    params={"query": "moon", "per_page": 5},
    timeout=10
)
data = response.json()
result = [{"name": b["name"], "city": b["city"]} for b in data[:5]]

info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
      Sending generated Python code to OpenSandbox server:

import requests

response = requests.get(
    "https://api.openbrewerydb.org/v1/breweries/search",
    params={"query": "moon", "per_page": 5},
    timeout=10
)
data = response.json()
result = [{"name": b["name"], "city": b["city"]} for b in data[:5]]

info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
      Creating OpenSandbox instance for domain localhost:8080 with image python:3.12-slim.
...
OpenSandbox info: 2026-04-03 10:57:15+0000 [7af6d2f6e67b46a3b4ad911d949ea9fa] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=inspect image python:3.12-slim | duration=29.24
OpenSandbox info: 2026-04-03 10:57:16+0000 [7af6d2f6e67b46a3b4ad911d949ea9fa] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=create sandbox container | duration=108.47
OpenSandbox info: 2026-04-03 10:57:16+0000 [7af6d2f6e67b46a3b4ad911d949ea9fa] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=ensure directory /opt/opensandbox | duration=21.35
OpenSandbox info: 2026-04-03 10:57:16+0000 [7af6d2f6e67b46a3b4ad911d949ea9fa] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=copy execd archive to sandbox | duration=246.19
OpenSandbox info: 2026-04-03 10:57:16+0000 [7af6d2f6e67b46a3b4ad911d949ea9fa] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=ensure directory /opt/opensandbox | duration=19.67
OpenSandbox info: 2026-04-03 10:57:16+0000 [7af6d2f6e67b46a3b4ad911d949ea9fa] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=install bootstrap script | duration=11.49
OpenSandbox info: 2026-04-03 10:57:16+0000 [7af6d2f6e67b46a3b4ad911d949ea9fa] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=start sandbox container | duration=486.26
OpenSandbox info: 2026-04-03 10:57:16+0000 [-] uvicorn.access: 192.168.117.1:47168 - "POST /v1/sandboxes HTTP/1.1" 202
OpenSandbox info: 2026-04-03 10:57:16+0000 [-] uvicorn.access: 192.168.117.1:47168 - "GET /v1/sandboxes/50f59a2e-24c4-4eb0-9356-515359352d0f/endpoints/44772?use_server_proxy=true HTTP/1.1" 200
OpenSandbox info: 2026-04-03 10:57:16+0000 [-] uvicorn.access: 192.168.117.1:47168 - "GET /v1/sandboxes/50f59a2e-24c4-4eb0-9356-515359352d0f/endpoints/18080?use_server_proxy=true HTTP/1.1" 200
...
info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
      OpenSandbox instance is ready. Running generated Python command.
...
OpenSandbox info: 2026-04-03 10:57:16+0000 [-] uvicorn.access: 192.168.117.1:47168 - "GET /sandboxes/50f59a2e-24c4-4eb0-9356-515359352d0f/proxy/44772/ping HTTP/1.1" 200
OpenSandbox info: 2026-04-03 10:57:16+0000 [-] uvicorn.access: 192.168.117.1:42344 - "POST /sandboxes/50f59a2e-24c4-4eb0-9356-515359352d0f/proxy/44772/command HTTP/1.1" 200
...
info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
      OpenSandbox command execution completed.
...
OpenSandbox info: 2026-04-03 10:57:20+0000 [c4cc5cb1084747e9aebddae865d3bf33] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=kill sandbox container | duration=130.02
OpenSandbox info: 2026-04-03 10:57:20+0000 [c4cc5cb1084747e9aebddae865d3bf33] src.services.docker: sandbox=50f59a2e-24c4-4eb0-9356-515359352d0f | action=remove sandbox container | duration=22.65
OpenSandbox info: 2026-04-03 10:57:20+0000 [-] uvicorn.access: 192.168.117.1:47168 - "DELETE /v1/sandboxes/50f59a2e-24c4-4eb0-9356-515359352d0f HTTP/1.1" 204
...
info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
      OpenSandbox instance killed.
info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
      OpenSandbox execution raw output: {"ok": true, "finalValue": [{"name": "Moon Under Water Brewery", "city": "Victoria"}, {"name": "Moon Dog Craft Brewery", "city": "Abbotsford"}, {"name": "Moon Tower Sudworks", "city": "Houston"}, {"name": "Moon Hill Brewing Co., Inc.", "city": "Gardner"}, {"name": "Moon River Brewing Co", "city": "Savannah"}], "stdout": "", "stderr": ""}
info: McpServer.CodeMode.OpenSandbox.OpenSandboxRunner[0]
```
You can see that the LLM generates the Python code and sends it to OpenSandbox for execution:
With the prompt:

```
Use code mode to find breweries with moon in the name, return the top 5 as name and city
```

the LLM generated the following code:

```python
import requests

response = requests.get(
    "https://api.openbrewerydb.org/v1/breweries/search",
    params={"query": "moon", "per_page": 5},
    timeout=10,
)

data = response.json()
result = [{"name": b["name"], "city": b["city"]} for b in data[:5]]
```
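Under the hood, the runner wraps the generated script so that whatever the script assigns to `result` comes back as `finalValue` in the execution envelope shown in the raw output above. A minimal sketch of that capture step, assuming a hypothetical function name and error field (this is not OpenSandbox's actual API, and the real execution happens inside the isolated container, not the host process):

```python
import json


def run_generated_code(code: str) -> dict:
    """Execute a generated script and capture its `result` variable.

    Sketch only: mirrors the {"ok", "finalValue"} envelope seen in the
    runner logs; inside OpenSandbox this runs in a sandboxed container.
    """
    scope: dict = {}
    try:
        exec(code, scope)
        return {"ok": True, "finalValue": scope.get("result")}
    except Exception as e:  # surface failures instead of crashing the runner
        return {"ok": False, "error": str(e)}


envelope = run_generated_code(
    "result = [{'name': 'Moon Under Water Brewery', 'city': 'Victoria'}]"
)
print(json.dumps(envelope))
```

Because only the serialized envelope travels back to the model, the intermediate API payloads never touch the context window.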
Then on the UI:
And an even more complex query:
```
Write Python code for execute mode.

Requirements:
- Get breweries in San Diego.
- Return the total number of breweries found for San Diego.
- Search for breweries matching "moon" and return top 5 with only:
  - name
  - city
- Also compute a San Diego summary:
  - count by brewery_type
  - top 3 cities by number of breweries (if city is missing, use "Unknown")
- Deduplicate breweries by id before counting.
- Sort outputs deterministically:
  - type summary: count desc, then type asc
  - top cities: count desc, then city asc
  - moon top 5: name asc
- Return one JSON object in result with this exact shape:
  {
    "san_diego_total": number,
    "san_diego_type_summary": [{"brewery_type": string, "count": number}],
    "san_diego_top_cities": [{"city": string, "count": number}],
    "moon_top_5": [{"name": string, "city": string}]
  }
- If any tool call fails, set a top-level "errors" array with readable messages but still return partial data.
```
The code generated by the LLM:

```python
import requests
from collections import defaultdict

errors = []
BASE = "https://api.openbrewerydb.org/v1/breweries"

# --- 1. Fetch all San Diego breweries (paginate) ---
sd_breweries = {}
page = 1
while True:
    try:
        r = requests.get(BASE, params={"by_city": "san_diego", "per_page": 200, "page": page}, timeout=10)
        r.raise_for_status()
        page_data = r.json()
    except Exception as e:
        errors.append(f"San Diego fetch page {page} failed: {e}")
        break
    if not page_data:
        break
    for b in page_data:
        bid = b.get("id")
        if bid and bid not in sd_breweries:
            sd_breweries[bid] = b
    if len(page_data) < 200:
        break
    page += 1

sd_list = list(sd_breweries.values())
san_diego_total = len(sd_list)

# --- 4a. Count by brewery_type ---
type_counts = defaultdict(int)
for b in sd_list:
    type_counts[b.get("brewery_type") or "unknown"] += 1

san_diego_type_summary = sorted(
    [{"brewery_type": t, "count": c} for t, c in type_counts.items()],
    key=lambda x: (-x["count"], x["brewery_type"])
)

# --- 4b. Top 3 cities ---
city_counts = defaultdict(int)
for b in sd_list:
    city = b.get("city") or "Unknown"
    city_counts[city] += 1

all_cities_sorted = sorted(
    [{"city": c, "count": n} for c, n in city_counts.items()],
    key=lambda x: (-x["count"], x["city"])
)
san_diego_top_cities = all_cities_sorted[:3]

# --- 3. Search for "moon" breweries ---
moon_top_5 = []
try:
    r = requests.get(f"{BASE}/search", params={"query": "moon", "per_page": 50}, timeout=10)
    r.raise_for_status()
    moon_data = r.json()
    moon_deduped = {}
    for b in moon_data:
        bid = b.get("id")
        if bid and bid not in moon_deduped:
            moon_deduped[bid] = b
    moon_sorted = sorted(moon_deduped.values(), key=lambda x: x.get("name") or "")
    moon_top_5 = [{"name": b["name"], "city": b.get("city") or ""} for b in moon_sorted[:5]]
except Exception as e:
    errors.append(f"Moon search failed: {e}")

result = {
    "san_diego_total": san_diego_total,
    "san_diego_type_summary": san_diego_type_summary,
    "san_diego_top_cities": san_diego_top_cities,
    "moon_top_5": moon_top_5,
}
if errors:
    result["errors"] = errors
```
And the response UI:
The code executed in the local sandbox should mirror the behavior of OpenSandbox so that development and testing stay consistent. However, the local sandbox is strictly a development convenience; it must not be used in a production environment.
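One way to keep the two paths consistent is to hide both behind a single runner interface and choose the backend from configuration, so the same calling code is exercised in development and production. A minimal sketch, assuming a hypothetical `SANDBOX_MODE` setting and runner class names (not the project's actual API):

```python
import os
from typing import Protocol


class SandboxRunner(Protocol):
    def run(self, code: str) -> dict: ...


class LocalRunner:
    """Development only: executes in-process, with no isolation at all."""

    def run(self, code: str) -> dict:
        scope: dict = {}
        exec(code, scope)
        return {"ok": True, "finalValue": scope.get("result")}


class OpenSandboxRunner:
    """Production: would delegate to the OpenSandbox server (omitted here)."""

    def run(self, code: str) -> dict:
        raise NotImplementedError("requires a running OpenSandbox instance")


def make_runner() -> SandboxRunner:
    # Hypothetical switch; a real project would read this from proper config.
    if os.getenv("SANDBOX_MODE", "local") == "opensandbox":
        return OpenSandboxRunner()
    return LocalRunner()
```

Swapping the backend then becomes a configuration change rather than a code change, which keeps local tests honest about what production will do.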
Conclusion
Programmatic Tool Calling (Code Mode) with OpenSandbox significantly reduces token usage by avoiding large upfront tool schema injection. Instead of loading every API definition into the context window, the LLM generates a Python script that executes inside a sandbox, performs the required API calls, and returns only the relevant result. This approach is especially effective for complex queries and multi-step operations that are difficult to express through traditional MCP-style tool invocation.
However, this model introduces new risks. Prompt injection becomes more dangerous when the model is allowed to generate executable code. The generated script may attempt to access sensitive files, exfiltrate secrets, or communicate with unauthorized external hosts. Without strict controls, execution can become unpredictable. For this reason, code must run inside a sandbox with explicit network whitelisting, filesystem restrictions, resource limits, and controlled environment variables. Execution isolation is mandatory, not optional.
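Some of these controls can be approximated even at the process level during development. The sketch below (illustrative only, and no substitute for container isolation with network whitelisting as OpenSandbox provides) runs untrusted code in a child interpreter with a scrubbed environment and a hard timeout; the function name is an assumption:

```python
import os
import subprocess
import sys
import tempfile


def run_isolated(code: str, timeout_s: int = 10) -> dict:
    """Run untrusted code in a separate interpreter with minimal privileges.

    Sketch only: scrubbed env vars and a timeout, but no real filesystem
    or network restriction -- use a sandboxed container for that.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores PYTHON* env vars
            env={},                        # no inherited secrets
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    finally:
        os.unlink(path)
```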
I will continue exploring this approach in the coming weeks. I will present a deeper technical breakdown of Tool Search Tool and Code Mode at AI Tinkerers in Ho Chi Minh City on April 18, 2026.
Disclaimer:
- The referenced codebase was generated primarily using GitHub Copilot in Visual Studio Code. Minimal manual coding was required, but early generations contained confusing control flows and suboptimal structure, so manual debugging and iterative steering were needed to align the system with the intended architecture (human).
- Letting AI generate and execute code autonomously often leads to architectural drift, lack of optimization, or incorrect abstractions. Explicit guardrails are required (human).
- Embedding principles such as KISS, YAGNI, DRY, the Boy Scout Rule, and TDD (red–green cycle) into copilot-instructions.md materially improves output quality.
- AI-native development enables building ideas that were previously constrained by time or knowledge. However, overreliance without verification can distort judgment and engineering rigor. The responsibility for correctness, architecture, and security remains with the engineer (human).
At times, though, it leaves me feeling what some describe as "AI psychosis." (https://www.youtube.com/watch?v=kwSVtQ7dziU)
All source code can be found at https://github.com/thangchung/agent-engineering-experiment/tree/main/mcp-experiments
References:
- DEV Community: https://dev.to/thangchung/mcp-programmatic-tool-calling-code-mode-with-opensandbox-4n3n
