110 changes: 107 additions & 3 deletions README.md
@@ -141,7 +141,83 @@ Because of their special behavior of being preserved on context window overflow,

The Prompt API supports **tool use** via the `tools` option, allowing you to define external capabilities that a language model can invoke in a model-agnostic way. Each tool is represented by an object that includes an `execute` member that specifies the JavaScript function to be called. When the language model initiates a tool use request, the user agent calls the corresponding `execute` function and sends the result back to the model.

Here’s an example of how to use the `tools` option:
There are two tool use modes: with automatic execution (closed loop) and without automatic execution (open loop).

In both modes, session creation and the `append()` signature are the same. Here’s an example:

```js
const session = await LanguageModel.create({
initialPrompts: [
{
role: "system",
content: `You are a helpful assistant. You can use tools to help the user.`,
},
],
tools: [
{
name: "getWeather",
description: "Get the weather in a location.",
inputSchema: {
type: "object",
properties: {
location: {
type: "string",
description: "The city to check for the weather condition.",
},
},
required: ["location"],
},
},
],
});
```

In this example, the `tools` array defines a `getWeather` tool, specifying its name, description and input schema.

Few-shot examples of tool use can be appended like so:

```js
await session.append([
{role: "user", content: "What is the weather in Seattle?"},
{role: "tool-call", content: {type: "tool-call", value: {callID: "get_weather_1", name: "get_weather", arguments: {location: "Seattle"}}}},

Review comment:

From a technical point of view, tool-call is not a message in the conversation. `assistant` would be the role, and the content could then include tool-call tokens.
So in my opinion the assistant message should always return the actual generated tokens, but for convenience (and cross-model compatibility) also the parsed tool calls (by the way, that's also the same for thinking):

{
  role: "assistant",
  content: "...",
  toolCalls: [
    {
      callId: "get_weather_1",
      name: "get_weather",
      arguments: {
        location: "Seattle"
      }
    }
  ]
}

Review comment (Author):

It's not clear what the client can do with the actual tokens if they have the parsed tool calls anyway. Also, the actual tokens and parsers for function calling are browser- and model-specific implementations, so I'm not sure all browsers want to expose them to the client.

(How we expose thinking is a different topic that is worth its own discussion.)

{role: "tool-result", content: {type: "tool-response", value: {callID: "get_weather_1", name: "get_weather", result: [{type: "object", value: {temperature: "55F", humidity: "67%"}}]}}},

Review comment:

As mentioned above, `role: "tool-result"` + `content.type: "tool-response"` seems unnecessary.
With this structure we have a role `tool-result` and then, just like all the other messages, we have a content array where each element can have a different type, depending on what the tool wants to return.

{
  role: "tool-result",
  content:  [
    {
      callId: "get_weather_1",
      type: "object",
      value: [
        {
          type: "object",
          value: {
            temperature: "55F",
            humidity: "67%"
          }
        }
      ]
    }
  ]
}

{role: "assistant", content: "The temperature in Seattle is 55F and humidity is 67%"},
]);
Comment on lines +180 to +185

Review comment (Contributor):

Suggested change:
await session.append([
{ role: "user", content: "What is the weather in Seattle?" },
{
role: "tool-call",
content: {
type: "tool-call",
value: {
callID: "get_weather_1",
name: "get_weather",
arguments: { location: "Seattle" },
},
},
},
{
role: "tool-result",
content: {
type: "tool-response",
value: {
callID: "get_weather_1",
name: "get_weather",
result: [
{ type: "object", value: { temperature: "55F", humidity: "67%" } },
],
},
},
},
{
role: "assistant",
content: "The temperature in Seattle is 55F and humidity is 67%",
},
]);

```

Note that "role" and "type" now supports "tool-call" and "tool-result".

Review comment (Contributor):

Suggested change:
Note that `"role"` and `"type"` now support `"tool-call"` and `"tool-result"`.

`content.result` is a list of dictionaries, each with a `type` and a `value`, where `type` is one of `"text"`, `"image"`, `"audio"`, or `"object"`, and `value` is `any`.
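
As a rough illustration of the message shapes described above, the helpers below build a matching `tool-call` / `tool-result` pair. The helper names `makeToolCall` and `makeToolResult` are hypothetical and not part of the proposed API; the field names simply follow the `session.append()` example in this diff, which may still change during review.

```js
// Hypothetical helpers (not part of the Prompt API) that build the message
// shapes used in the few-shot example above.
const RESULT_TYPES = new Set(["text", "image", "audio", "object"]);

function makeToolCall(callID, name, args) {
  return {
    role: "tool-call",
    content: { type: "tool-call", value: { callID, name, arguments: args } },
  };
}

function makeToolResult(callID, name, result) {
  // Each entry of `result` is a { type, value } dictionary.
  for (const item of result) {
    if (!RESULT_TYPES.has(item.type)) {
      throw new TypeError(`Unsupported result type: ${item.type}`);
    }
  }
  return {
    role: "tool-result",
    content: { type: "tool-response", value: { callID, name, result } },
  };
}

// The same few-shot sequence as above, built with the helpers.
const fewShot = [
  { role: "user", content: "What is the weather in Seattle?" },
  makeToolCall("get_weather_1", "get_weather", { location: "Seattle" }),
  makeToolResult("get_weather_1", "get_weather", [
    { type: "object", value: { temperature: "55F", humidity: "67%" } },
  ]),
  { role: "assistant", content: "The temperature in Seattle is 55F and humidity is 67%" },
];
```

Building both messages from the same `callID` argument avoids the easy mistake of a call and its result carrying slightly different identifiers.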

#### Open Loop:

Open loop is enabled by specifying `tool-call` in `expectedOutputs` when the session is created.

When a tool needs to be called, the API returns an object with `callID` (a unique identifier for this tool call), `name` (the name of the tool), and `arguments` (the inputs to the tool). The client is expected to handle the tool execution and append the tool result back to the session. `arguments` is a dictionary that fits the JSON input schema of the tool's declaration; if the input schema is not "object", the value will be wrapped in a key.

Example:

```js
sessionOptions = structuredClone(options);

Review comment (Contributor):

Suggested change:
const sessionOptions = structuredClone(options);

sessionOptions.expectedOutputs.push(["tool-call"]);
session = await LanguageModel.create(sessionOptions);

Review comment (Contributor):

Suggested change:
const session = await LanguageModel.create(sessionOptions);


var result = await session.prompt("What is the weather in Seattle?");

Review comment (Contributor):

Suggested change:
let result = await session.prompt("What is the weather in Seattle?");

if (result.type=="tool-call") {
if (result.name == "get_weather") {
const tool_result = getWeather(result.arguments.location);
result = await session.prompt([{role: "tool-result", content: {type: "tool-result", value: {callID: result.callID, name: result.name, result: [{type: "object", value: tool_result}]}}}]);
}
} else{
console.log(result)
}
Comment on lines +200 to +212

Review comment (Contributor):

Suggested change:
sessionOptions = structuredClone(options);
sessionOptions.expectedOutputs.push(["tool-call"]);
session = await LanguageModel.create(sessionOptions);
var result = await session.prompt("What is the weather in Seattle?");
if (result.type == "tool-call") {
if (result.name == "get_weather") {
const tool_result = getWeather(result.arguments.location);
result = await session.prompt([
{
role: "tool-result",
content: {
type: "tool-result",
value: {
callID: result.callID,
name: result.name,
result: [{ type: "object", value: tool_result }],
},
},
},
]);
}
} else {
console.log(result);
}

```

Note that a tool response must always immediately follow the tool call generated by the model.
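
The open loop flow above can be sketched as a small driver that keeps prompting until the model stops requesting tools. This is only a sketch under assumptions: `runOpenLoop`, `handlers`, `maxSteps`, and the `fakeSession` stand-in are hypothetical names used for illustration, the result shape (`type`, `callID`, `name`, `arguments`) mirrors the example above rather than a finalized API, and the content type `"tool-response"` is taken from the `session.append()` example.

```js
// Drives an open loop: execute each requested tool on the client and send the
// result back, until the model returns something other than a tool call.
async function runOpenLoop(session, handlers, userPrompt, maxSteps = 5) {
  let result = await session.prompt(userPrompt);
  for (let step = 0; step < maxSteps && result?.type === "tool-call"; step++) {
    const handler = handlers[result.name];
    if (!handler) throw new Error(`No handler for tool: ${result.name}`);
    const toolOutput = await handler(result.arguments);
    // The tool response is sent right after the tool call it answers.
    result = await session.prompt([
      {
        role: "tool-result",
        content: {
          type: "tool-response",
          value: {
            callID: result.callID,
            name: result.name,
            result: [{ type: "object", value: toolOutput }],
          },
        },
      },
    ]);
  }
  return result;
}

// Stand-in for a real session so the sketch can run anywhere (assumption:
// a real session would come from LanguageModel.create()).
const fakeSession = {
  async prompt(input) {
    if (typeof input === "string") {
      return {
        type: "tool-call",
        callID: "get_weather_1",
        name: "get_weather",
        arguments: { location: "Seattle" },
      };
    }
    const { temperature } = input[0].content.value.result[0].value;
    return `The temperature in Seattle is ${temperature}`;
  },
};

const handlers = {
  get_weather: async ({ location }) => ({ temperature: "55F", location }),
};
const finalAnswer = runOpenLoop(fakeSession, handlers, "What is the weather in Seattle?");
```

If `maxSteps` is reached while the model is still requesting tools, the function returns the pending tool call, so the caller can decide whether to continue or stop.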


#### Closed Loop:

To enable automatic execution, add a `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and pose a max number of tool calls invoked in a single session generation:

Review comment (Contributor):

Suggested change:
To enable automatic execution, add an `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and pose a max number of tool calls invoked in a single session generation:


```js
const session = await LanguageModel.create({
@@ -171,13 +247,41 @@ const session = await LanguageModel.create({
return JSON.stringify(await res.json());
},
}
]
],
toolUseConfig: {enabled: true},
});

const result = await session.prompt("What is the weather in Seattle?");
```

In this example, the `tools` array defines a `getWeather` tool, specifying its name, description, input schema, and `execute` implementation. When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.

#### Do I need auto execution?

In general, automatic execution is suitable for use cases where the model quality is good enough via prompt tuning. That can mean either that you can tolerate certain mistakes the model makes when making tool calls, or that the task is simple enough for the model to handle (e.g., just a few distinct tools, short and clean tool output, a short context window).

Review comment:

Not sure if I get this right. For the user, both the closed and the open loop are executed automatically. The only difference is that in an open loop, the developer has to execute the tools and start the next generation, while in the closed loop the loop runs without any extra steps.
Also, if I don't want "automatic execution" as a developer, I could always intercept in the execute function. I would even argue that for the whole LLM conversation it is better to intercept a tool execution inside the execute function, because that allows you to return a reason why the tool was not executed, instead of letting the model generate the tool call without knowing why it was not executed.


On the other hand, open loop allows more flexibility for intercepting at various points in the planner loop (the reason->action->observation loop) where you can inject your business logic programmatically.

Here are a few patterns where open loop would be useful:

1) Context management

If your session may accumulate a long chain of content, and earlier tool results are no longer important or relevant for your use case, open loop gives you the flexibility to edit and recreate the session in the middle of a tool call. You can manually compress and modify the history, then create a new session with less content.

For example, for a shopping agent, your tool keeps track of a live shopping cart, but only the latest cart status is important. After multiple rounds of cart updates, you might need to compress the tool call history to avoid exceeding the context window and to improve latency and quality.

2) Conditional loop breaking

If your business logic requires determinism in certain critical states, open loop gives you the flexibility to exit the planner loop early and output a predetermined action.

For example, a shopping agent might be required to get explicit confirmation before placing an order. Whenever the tool `"place_order"` is called for the first time, you want to exit the planner loop immediately and display a verbatim message to the user.

3) Conditional constraints

In automatic execution, the planner loop decodes multiple times. If you need to supply constraints dynamically, use the open loop API and control the planner loop yourself. Because the closed loop API runs the entire planner loop behind the scenes, it has no natural way to supply a different constraint for each LLM step.

For example, you might want the model to always generate tool `FOO` after tool `BAR` is called; or you might want the model to always generate text only with some prefix after tool `FOO` is called.
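
Two of the patterns above, context management and conditional loop breaking, can be sketched with plain functions over the message history. Everything here is illustrative: `compressToolHistory`, `shouldBreakLoop`, the `confirmedTools` set, and the `update_cart` tool name are hypothetical, and the message shape follows the examples earlier in this README.

```js
// Pattern 1 (context management): keep only the latest result per tool name,
// dropping older tool-call/tool-result pairs by their callID.
// Assumes every tool call in the history already has a matching result.
function compressToolHistory(messages) {
  const latestCallByTool = new Map();
  for (const m of messages) {
    if (m.role === "tool-result") {
      latestCallByTool.set(m.content.value.name, m.content.value.callID);
    }
  }
  const keep = new Set(latestCallByTool.values());
  return messages.filter((m) => {
    const isToolMessage = m.role === "tool-call" || m.role === "tool-result";
    return !isToolMessage || keep.has(m.content.value.callID);
  });
}

// Pattern 2 (conditional loop breaking): stop before running a critical tool
// that the user has not confirmed yet.
function shouldBreakLoop(toolCall, confirmedTools) {
  return toolCall.name === "place_order" && !confirmedTools.has(toolCall.name);
}

// Small builders for the sample history below.
const call = (id, name) => ({
  role: "tool-call",
  content: { type: "tool-call", value: { callID: id, name, arguments: {} } },
});
const res = (id, name, value) => ({
  role: "tool-result",
  content: {
    type: "tool-response",
    value: { callID: id, name, result: [{ type: "object", value }] },
  },
});

const history = [
  { role: "user", content: "Add apples" },
  call("cart_1", "update_cart"),
  res("cart_1", "update_cart", { items: 1 }),
  { role: "user", content: "Add pears" },
  call("cart_2", "update_cart"),
  res("cart_2", "update_cart", { items: 2 }),
];
const compressed = compressToolHistory(history);
```

The compressed history could then seed a new session through `initialPrompts`, and `shouldBreakLoop` could be checked on each tool call before running its handler in an open loop.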


#### Concurrent tool use

51 changes: 44 additions & 7 deletions index.bs
@@ -39,8 +39,11 @@ interface LanguageModel : EventTarget {
static Promise<Availability> availability(optional LanguageModelCreateCoreOptions options = {});
static Promise<LanguageModelParams?> params();

// The return type from prompt() method and those alike.
typedef (DOMString or sequence<LanguageModelMessageContent>) LanguageModelPromptResult;

// These will throw "NotSupportedError" DOMExceptions if role = "system"
Promise<DOMString> prompt(
Promise<LanguageModelPromptResult> prompt(
LanguageModelPrompt input,
optional LanguageModelPromptOptions options = {}
);
@@ -80,13 +83,11 @@ interface LanguageModelParams {
callback LanguageModelToolFunction = Promise<DOMString> (any... arguments);

// A description of a tool call that a language model can invoke.
dictionary LanguageModelTool {
dictionary LanguageModelToolDeclaration {
required DOMString name;
required DOMString description;
// JSON schema for the input parameters.
required object inputSchema;
// The function to be invoked by user agent on behalf of language model.
required LanguageModelToolFunction execute;
};

dictionary LanguageModelCreateCoreOptions {
@@ -97,7 +98,7 @@ dictionary LanguageModelCreateCoreOptions {

sequence<LanguageModelExpected> expectedInputs;
sequence<LanguageModelExpected> expectedOutputs;
sequence<LanguageModelTool> tools;
sequence<LanguageModelToolDeclaration> tools;
};

dictionary LanguageModelCreateOptions : LanguageModelCreateCoreOptions {
@@ -148,16 +149,52 @@ dictionary LanguageModelMessageContent {
required LanguageModelMessageValue value;
};

enum LanguageModelMessageRole { "system", "user", "assistant" };
enum LanguageModelMessageRole { "system", "user", "assistant", "tool-call", "tool-response" };

enum LanguageModelMessageType { "text", "image", "audio" };
enum LanguageModelMessageType { "text", "image", "audio", "tool-call", "tool-response" };

Review comment:

In my opinion, the MessageType should describe the type, or in other words "the modality".
MessageRole: Where does the message come from (who is the sender?)
MessageType: What is the content type of the message (text, image, audio)

Is there a specific reason why I should return a message where the LanguageModelMessageRole = tool-call AND the LanguageModelMessageType = tool-call?

Review comment (Author):

I remember we created "tool-call" for LanguageModelMessageRole so that different browsers can easily format the role for their model-specific implementations. E.g., empty string, or "" or some different control tokens.

If only MessageType supported the "tool-call" type and we assumed the role to be "assistant", then the implementation might become more complicated than declaring it anyway and letting the implementation omit it.

It's also not without precedent that separate roles are defined for tool call and tool response: https://huggingface.co/Trelis/openchat_3.5-function-calling-v3


typedef (
ImageBitmapSource
or AudioBuffer
or BufferSource
or DOMString
or LanguageModelToolCall
or LanguageModelToolResponse
) LanguageModelMessageValue;

// The definitions of `LanguageModelToolCall` and `LanguageModelToolResponse` values
enum LanguageModelToolResultType { "text", "image", "audio", "object" };

dictionary LanguageModelToolResultContent {

Review comment:

In the end, the result of a tool call is nothing else than a new message in the conversation. But now the message is not coming from the assistant or the user, but from a tool. So I think we should align this with the LanguageModelMessageContent.

required LanguageModelToolResultType type;
required any value;
};

// Represents a tool call requested by the language model.
dictionary LanguageModelToolCall {
required DOMString callID;
required DOMString name;
object arguments;
};

// Successful tool execution result.
dictionary LanguageModelToolSuccess {
required DOMString callID;
required DOMString name;
required sequence<LanguageModelToolResultContent> result;
};

// Failed tool execution result.
dictionary LanguageModelToolError {
required DOMString callID;
required DOMString name;
required DOMString errorMessage;
};

// The response from executing a tool call - either success or error.
typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse;


</xmp>

<h3 id="prompt-processing">Prompt processing</h3>