interrupts - graph - agent based #1533
Co-authored-by: Mohammad Salehan <salehanw@amazon.com.com>
    self._interrupt_state.interrupts.update({interrupt.id: interrupt for interrupt in interrupts})
    self._interrupt_state.activate()
    if isinstance(node.executor, Agent):
Agents are not allowed to use session managers in a graph execution. Consequently, we need to store some agent state in the graph interrupt state to help persist the interrupt workflow between shutdowns. Note, this is the same behavior we have in Swarm (src).
We check if executor is an Agent instance right now because MultiAgentBase executors may not have the same context. We will figure out how to handle that case in a separate (and final) PR for graph interrupt support.
    - Agent: Data validation complete. All records verified, no anomalies detected.
    ```
    """
    if self._interrupt_state.activated:
Here is where we restore the agent node state upon resuming. We extract it from the graph interrupt state.
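The save-on-interrupt and restore-on-resume steps described in these comments can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `ToyInterruptState`, `ToyAgent`, `save_node`, and `restore_node` are hypothetical stand-ins, and the choice to snapshot `messages` and `state` is an assumption about which agent fields need to survive a shutdown.

```python
import json
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyInterruptState:
    """Hypothetical stand-in for the graph interrupt state."""

    activated: bool = False
    context: dict[str, Any] = field(default_factory=dict)

    def activate(self) -> None:
        self.activated = True


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent node executor."""

    messages: list[dict[str, str]] = field(default_factory=list)
    state: dict[str, Any] = field(default_factory=dict)


def save_node(interrupt_state: ToyInterruptState, node_id: str, agent: ToyAgent) -> None:
    """On interrupt: snapshot the agent under its node id and activate the state."""
    interrupt_state.context[node_id] = {
        "messages": deepcopy(agent.messages),
        "state": deepcopy(agent.state),
    }
    interrupt_state.activate()


def restore_node(interrupt_state: ToyInterruptState, node_id: str, agent: ToyAgent) -> None:
    """On resume: copy the snapshot back onto the agent, if one exists."""
    if not interrupt_state.activated or node_id not in interrupt_state.context:
        return
    saved = interrupt_state.context[node_id]
    agent.messages = deepcopy(saved["messages"])
    agent.state = deepcopy(saved["state"])


def roundtrip(interrupt_state: ToyInterruptState) -> ToyInterruptState:
    """Simulate a shutdown by serializing the interrupt state to JSON and back."""
    raw = json.dumps({"activated": interrupt_state.activated, "context": interrupt_state.context})
    loaded = json.loads(raw)
    return ToyInterruptState(activated=loaded["activated"], context=loaded["context"])
```

Because the agent itself carries no session manager in this sketch, everything needed to resume lives in the graph-level state, which is the only thing that has to be persisted.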
    from strands.types.tools import ToolContext

    @pytest.fixture
Here I am making the switch to interrupt on an agent node rather than through a hook. This is what we do for the swarm integ test. It is more comprehensive as it not only tests persistence of the graph interrupt state, but also the interrupt state of the agent node. Hook interrupts only require graph interrupt state persistence.
    self._interrupt_state.interrupts.update({interrupt.id: interrupt for interrupt in interrupts})
    self._interrupt_state.activate()
    if isinstance(node.executor, Agent):
Can this be AgentBase now?
Not yet, as the Agent class does not yet derive from AgentBase (src).
    self._interrupt_state.interrupts.update({interrupt.id: interrupt for interrupt in interrupts})
    self._interrupt_state.activate()
    if isinstance(node.executor, Agent):
        self._interrupt_state.context[node.node_id] = {
Can you explain how interrupts work in graph at a higher level?
I understand how this works for an interrupt that is raised in one Agent node by that Agent node.
But what happens if two nodes are executing and one of them raises an interrupt?
If two nodes are executing in parallel and one interrupts, the other node is allowed to finish. Once it is done, the call stack returns to _execute_graph. There, before preparing the next batch of nodes to execute, we check whether an interrupt has been activated (any time any node interrupts, we immediately set the graph state to INTERRUPTED). If so, we store the completed node in the interrupt state context. Context about the interrupted node has already been stored in state as part of the _execute_node call.
Upon resuming, we unpack the interrupted node and the one already completed from interrupt state (src). Within _execute_nodes_parallel, we filter the batch down to just the interrupted node (src). This means the completed node does not get executed again. However, we have the reference so that we can identify its dependent nodes to execute next after we finish resuming the interrupted node.
For the interrupted node, we pass the user interrupt responses into node.executor.stream_async (src). The actual task that the agent node is meant to execute is already stored in the agent node message history.
From here, things proceed as normal.
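The flow described above can be modeled with a small toy executor. This is only a sketch under stated assumptions, not the Graph implementation: `ToyGraph` and all of its fields are hypothetical, interrupt responses are keyed by node id rather than by interrupt id, and the list of completed nodes simply lives on the object instead of being unpacked from the interrupt state.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyGraph:
    """Hypothetical model of a graph that can interrupt and resume."""

    deps: dict[str, list[str]]  # node id -> ids of the nodes it depends on
    interrupting: set[str]  # nodes that interrupt until a response is supplied
    completed: list[str] = field(default_factory=list)
    interrupted: list[str] = field(default_factory=list)
    context: dict[str, list[str]] = field(default_factory=dict)  # survives between runs
    run_log: list[str] = field(default_factory=list)

    async def _execute_node(self, node_id: str, responses: dict[str, str]) -> None:
        self.run_log.append(node_id)
        await asyncio.sleep(0)  # yield so sibling nodes in the batch can run
        if node_id in self.interrupting and node_id not in responses:
            self.interrupted.append(node_id)  # recorded inside the node call
            return
        self.completed.append(node_id)

    def _ready(self) -> list[str]:
        """Nodes not yet completed whose dependencies are all completed."""
        done = set(self.completed)
        return [n for n, d in self.deps.items() if n not in done and all(x in done for x in d)]

    async def run(self, responses: Optional[dict[str, str]] = None) -> str:
        responses = responses or {}
        if self.context:
            # Resuming: narrow the batch to the interrupted nodes only, so
            # nodes that already completed are not executed again.
            batch = self.context["interrupted_nodes"]
            self.context = {}
            self.interrupted = []
        else:
            batch = self._ready()
        while batch:
            # Every node in the batch is allowed to finish, even if a sibling interrupts.
            await asyncio.gather(*(self._execute_node(n, responses) for n in batch))
            if self.interrupted:
                # Check for an interrupt before preparing the next batch.
                self.context = {"interrupted_nodes": list(self.interrupted)}
                return "INTERRUPTED"
            batch = self._ready()
        return "COMPLETED"
```

With `deps={"a": [], "b": [], "c": ["a", "b"]}` and `interrupting={"a"}`, the first `run()` executes `a` and `b` and stops as interrupted; a second `run({"a": "approved"})` re-executes only `a` and then moves on to `c`.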
For additional context, we explicitly test parallel node interrupts in the integ tests presented further down.
Description
Allow users to raise interrupts from an agent node in Graph. This is a follow up to #1478 and an iteration on #1350.
Usage
Follow Up
Support raising an interrupt from a multi-agent node. There are some special considerations with session management to get this working properly.
Related Issues
#204
Documentation PR
Will update https://strandsagents.com/latest/documentation/docs/user-guide/concepts/interrupts/#multi-agents in follow up. Note, swarm interrupt docs have been added but not yet released.
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare: Wrote new unit tests
hatch test tests_integ/interrupts/multiagent: Wrote new integ tests
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.