Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010
Although LLMs can generate tools for generic domains and tasks, they struggle with enterprise domains that involve proprietary APIs and data schemas. We present ToolSmith, a framework for autonomously generating and validating agent-compatible tools. Given an API specification and a Tool Specification Requirement (TSR), ToolSmith produces a tool function and verifies it through a closed-loop process: it creates natural language (NL) tests and executes the tool in a secure agent sandbox for validation. For state-changing tools, ToolSmith confirms outcomes by querying the API with parameters derived from the NL tests. If the tool fails to produce the desired output, ToolSmith generates diagnostic feedback and uses it to regenerate the tool iteratively. By ensuring both functional correctness and agent compatibility, ToolSmith enables reliable automation of enterprise workflows. We also show that our approach outperforms the standard LATM (LLM as tool maker) baseline on a generated benchmark dataset.
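The closed-loop process described in the abstract can be pictured with a minimal sketch. This is not ToolSmith's implementation: the names `closed_loop_generate`, `LoopResult`, `stub_generate`, and `stub_validate`, the `max_attempts` parameter, and the convention that validation returns `None` on success are all assumptions made for illustration. The stubs stand in for the LLM call and for running NL tests in the agent sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    tool_source: Optional[str]  # candidate that passed validation, or None
    attempts: int               # number of generation attempts made
    feedback_log: List[str]     # diagnostic feedback collected from failures


def closed_loop_generate(
    generate: Callable[[str, str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    api_spec: str,
    tsr: str,
    max_attempts: int = 3,  # assumed retry budget, not from the source
) -> LoopResult:
    """Generate a tool, validate it, and regenerate using feedback on failure."""
    feedback: Optional[str] = None
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # Hypothetical generator: sees the spec, the TSR, and the prior feedback.
        candidate = generate(api_spec, tsr, feedback)
        # Hypothetical validator: returns None on success, else a diagnostic.
        feedback = validate(candidate)
        if feedback is None:
            return LoopResult(candidate, attempt, log)
        log.append(feedback)
    return LoopResult(None, max_attempts, log)


def stub_generate(api_spec: str, tsr: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM call: produces a corrected tool once feedback exists.
    if feedback:
        return "def tool(): return 'ok'"
    return "def tool(): return 'bug'"


def stub_validate(candidate: str) -> Optional[str]:
    # Stand-in for executing NL tests against the tool in a sandbox.
    if "'ok'" in candidate:
        return None
    return "expected 'ok' but the tool returned 'bug'"


result = closed_loop_generate(stub_generate, stub_validate, "api spec", "tsr")
print(result.attempts, result.feedback_log)
```

In this toy run the first candidate fails, its diagnostic is fed back into the generator, and the second candidate passes. A check for state-changing tools (querying the API to confirm the outcome) would live inside the validator under the same interface.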
Masataro Asai, Stephen Wissow
AAAI 2026
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025
Gang Liu, Michael Sun, et al.
ICLR 2025