feat(bedrock): add automatic prompt caching support #1438
Conversation
Add CacheConfig with strategy="auto" for BedrockModel to automatically inject cache points at the end of assistant messages in multi-turn conversations.

- Add CacheConfig dataclass in model.py with strategy field
- Add supports_caching property to check Claude model compatibility
- Implement _inject_cache_point() for automatic cache point management
- Export CacheConfig from models/__init__.py

Closes strands-agents#1432
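A rough sketch of the shape this adds, assuming the dataclass carries only the strategy field described above (the default value and docstring are assumptions, not taken from the PR):

from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Prompt caching configuration (sketch)."""
    strategy: str = "auto"  # "auto" lets the Bedrock provider manage cachePoint placement per turn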
src/strands/models/model.py
Outdated
standardized way to configure and process requests for different AI model providers.

Attributes:
    cache_config: Optional configuration for prompt caching.
I don't think we should add this to the base model class just yet. Not every model provider will have caching support, so this may end up being a per-model-provider feature.
src/strands/models/bedrock.py
Outdated
self.update_config(**model_config)

# Set cache_config on base Model class for Agent to detect
self.cache_config = self.config.get("cache_config")
We don't need to set this on the base class if it's already in self.config.
src/strands/models/bedrock.py
Outdated
This enables prompt caching for multi-turn conversations by placing a single
cache point that covers system prompt, tools, and conversation history.
Is this true? Tools and system prompts both have their own cache points defined in the converse stream model. I thought this was just to add automatic caching for the messages array
You're right that tools and system prompts have their own cache point options. The key insight is that Anthropic sends prompts in this order: tools → system → messages. (Link) When a cachePoint is placed at the end of the last assistant message, the cached prefix automatically includes everything before it (system prompt + tools + prior conversation). So a single cachePoint in messages effectively caches the entire context without needing separate cache points for system/tools.
That said, you can still place explicit cache points on system/tools as a fallback for cases like sliding-window truncation, where message history changes might cause cache misses.
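To make that concrete, here is a hypothetical formatted messages array after two turns (contents abbreviated; the {"cachePoint": {"type": "default"}} block shape matches the output shown later in this thread). Everything before the cache point - tools, system prompt, and all prior messages - becomes the cached prefix:

messages = [
    {"role": "user", "content": [{"text": "First question"}]},
    {"role": "assistant", "content": [{"text": "First answer"}]},
    {"role": "user", "content": [{"text": "Second question"}]},
    {
        "role": "assistant",
        "content": [
            {"text": "Second answer"},
            {"cachePoint": {"type": "default"}},  # single cache point at the end of the latest assistant message
        ],
    },
    {"role": "user", "content": [{"text": "Third question"}]},
]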
Got it - your explanation and the FAQ on this page helped me understand this: https://platform.claude.com/docs/en/build-with-claude/prompt-caching#faq
That being said, this comment is a bit misleading; I would give a more general comment here instead.
Additionally, we will need to update our documentation to reflect the new caching behavior here. Would you be interested in making that update too? https://github.com/strands-agents/docs/blob/main/docs/user-guide/concepts/model-providers/amazon-bedrock.md?plain=1#L418
For the docs update, I'll create a separate PR after this one is merged.
src/strands/models/bedrock.py
Outdated
This enables prompt caching for multi-turn conversations by placing a single
cache point that covers system prompt, tools, and conversation history.
The cache point is automatically moved to the latest assistant message on each
Mentioned in a previous comment, but how does this compare to the simplified caching mentioned in the docs here: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-simplified
This implementation actually relies on that simplified caching. With simplified caching, we only need to move the cachePoint to the end of the last assistant message on each turn. Anthropic automatically matches the overlapping prefix between the previous cachePoint position and the new one. That's also why we're limiting this strategy to Claude models until other providers support it.
Awesome, thanks for the explanation! I would love for this cache_config to be applicable to both Anthropic and Nova models, with the strategy chosen based on the model ID. That's out of scope for this PR, but just something to keep in mind!
Agreed! Nova models could also be supported, though the cache checkpoint implementation would be more complex than Claude's.
src/strands/models/bedrock.py
Outdated
cache_tools: Cache point type for tools
cache_prompt: Cache point type for the system prompt (deprecated, use cache_config)
cache_config: Configuration for prompt caching. Use CacheConfig(strategy="auto") for automatic caching.
cache_tools: Cache point type for tools (deprecated, use cache_config)
cache_prompt is deprecated, but I don't think we should deprecate cache_tools unless cache_config replaces its functionality. For now I would keep cache_tools undeprecated.
Got it, I'll keep cache_tools undeprecated. Out of curiosity, why was only cache_prompt (system prompt) deprecated but not cache_tools?
I think this PR goes into more detail (ref), but we originally represented the system prompt as just a string, so you couldn't inject cache points easily. cache_prompt was our workaround for that until the PR I referenced actually fixed it.
src/strands/models/bedrock.py
Outdated
if not isinstance(content, list):
    continue
nit: Not sure we really need this check. content should always be a list
Agreed, will remove
src/strands/models/bedrock.py
Outdated
    continue

for block_idx, block in enumerate(content):
    if isinstance(block, dict) and "cachePoint" in block:
nit: We don't need to re-check objects that are already typed. Content blocks are always dicts.
Agreed, will remove
src/strands/models/bedrock.py
Outdated
if isinstance(block, dict) and "cachePoint" in block:
    cache_point_positions.append((msg_idx, block_idx))

# Step 2: If no assistant message yet, nothing to cache
If there are cache points in the messages array already, they won't get removed since we exit early.
I might refactor this function a bit to:
- Loop through the entire messages array backwards
- Once we encounter the first assistant message, add a cache point there
- Remove all other cache points along the way
  a. Let's add a warning when we remove customer-added cache points
This seems to accomplish the same thing as the proposed approach, but is a bit easier to follow. What do you think?
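A rough sketch of that proposed refactor (the free-function form, logger, and warning wording are assumptions for illustration; in the PR this logic lives on BedrockModel as _inject_cache_point):

import logging
from typing import Any

logger = logging.getLogger(__name__)

def _inject_cache_point(messages: list[dict[str, Any]]) -> None:
    """Walk the messages backwards: cache at the latest assistant message,
    strip any other cachePoint blocks along the way."""
    cache_point_added = False
    for msg in reversed(messages):
        content = msg["content"]
        existing = [block for block in content if "cachePoint" in block]
        if existing:
            logger.warning("removing %d existing cachePoint block(s); auto strategy manages cache points", len(existing))
            content[:] = [block for block in content if "cachePoint" not in block]
        if not cache_point_added and msg["role"] == "assistant":
            content.append({"cachePoint": {"type": "default"}})
            cache_point_added = True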
Good catch. I'll refactor to loop backwards as you suggested.
tests/strands/models/test_bedrock.py
Outdated
assert formatted_messages[2]["content"][1]["guardContent"]["image"]["format"] == "png"


def test_cache_config_auto_sets_model_attribute(bedrock_client):
Let's add some more comprehensive unit tests here. We should make sure that cache points are inserted and deleted as intended. I would also like to see an end-to-end test (in the /tests/strands/agent/test_agent.py file) that inserts the cache points but does not edit the actual agent.messages parameter.
Will add more unit tests for cache point insertion/deletion logic, and an e2e test.
00a7eca to 92e2a59

This is an important change, thanks for this.
🎯 Review - Automatic Prompt Caching

Excellent implementation of automatic prompt caching for Bedrock! This addresses #1432 nicely and will provide significant performance and cost benefits for multi-turn conversations.

What I Really Like ✅

Minor Suggestions 💡

1. Cache Point Detection Could Be More Explicit
2. Model Support Detection
3. Integration Test Clarity

The integration tests are great! One suggestion - add a comment explaining the cache hit verification:

# After second call, verify cache hit (cache_read_input_tokens > 0)
# This confirms the cache point strategy is working
assert result.metadata["converse_metrics"]["cache_read_input_tokens"] > 0

Questions for Discussion 🤔

CI Status

I see CI is still pending. Once it passes, this looks ready for maintainer review!

Overall Assessment

This is a high-quality PR that will be valuable for the community. The automatic cache management removes complexity from users while providing real performance benefits. Great work, @kevmyung! 🎉

🤖 This is an experimental AI agent response from the Strands team, powered by Strands Agents. We're exploring how AI agents can help with community support and development. Your feedback helps us improve! If you'd prefer human assistance, please let us know.
- Add warning when cache_config enabled but model doesn't support caching
- Make supports_caching private (_supports_caching)
- Fix log formatting to follow style guide
- Clean up tests and imports
src/strands/models/bedrock.py
Outdated
for msg_idx in range(len(messages) - 1, -1, -1):
    msg = messages[msg_idx]
nit: This might make it a bit clearer what's going on:

- for msg_idx in range(len(messages) - 1, -1, -1):
-     msg = messages[msg_idx]
+ for msg_idx, msg in reversed(list(enumerate(messages))):
Updated to use reversed(list(enumerate(...)))
    self.config.get("model_id"),
)

cleaned_messages: list[dict[str, Any]] = []
I just ran a test on this code, and it looks like cachePoint is being injected into the agent messages array, which is not what we want.
I have a Python script:
from strands import Agent
from strands.models import BedrockModel
from strands.models.model import CacheConfig
agent = Agent(model=BedrockModel(cache_config=CacheConfig(strategy="auto")))
agent("Hello!")
agent("Hello2")
print("\n")
print(agent.messages)
And when I run it I get this output:
(strands-agents) ➜ sdk-python git:(feat/prompt-caching) ✗ python test.py
Hello! It's nice to meet you. How are you doing today? Is there anything I can help you with?Hello again! I see you're saying hello for a second time - Hello2! 😊 How can I assist you today?
[{'role': 'user', 'content': [{'text': 'Hello!'}]}, {'role': 'assistant', 'content': [{'text': "Hello! It's nice to meet you. How are you doing today? Is there anything I can help you with?"},
{'cachePoint': {'type': 'default'}}]}, {'role': 'user', 'content': [{'text': 'Hello2'}]}, {'role': 'assistant', 'content': [{'text': "Hello again! I see you're saying hello for a second time - Hello2! 😊 How can I assist you today?"}]}]
You can see above that when I print out agent.messages, the cache point does show up in the agent's original messages.
It might make sense to move your _inject_cache_point function logic below this cleaned_messages assignment, since it acts as a "clean" copy of the messages that we can edit in the Bedrock model provider without injecting information into the agent messages.
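A minimal sketch of that ordering (the function name build_request_messages and the auto_cache flag are hypothetical simplifications; _inject_cache_point is the helper added in this PR, and in the real provider these steps live inside the BedrockModel request-formatting path):

import copy
from typing import Any

def build_request_messages(agent_messages: list[dict[str, Any]], auto_cache: bool) -> list[dict[str, Any]]:
    """Copy first, then inject into the copy so the caller's list is never mutated."""
    cleaned_messages = copy.deepcopy(agent_messages)
    # ... the provider's existing per-message cleaning would run on cleaned_messages here ...
    if auto_cache:
        _inject_cache_point(cleaned_messages)  # cachePoint lands in the copy, not in agent.messages
    return cleaned_messages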
Fixed. Now injecting into cleaned_messages
tests/strands/agent/test_agent.py
Outdated
assert len(agent.messages) == original_length + 2


def test_cache_config_does_not_mutate_original_messages(mock_model, agenerator):
Given my above comment, it looks like this test is not working as intended. I would update it so that it first fails with the current code, then update your code so you don't inject cachePoint into the messages array, so that this test starts passing.
One more test in this file might be useful: verify that the messages array passed into the stream method does not get a cachePoint content block added.
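A rough sketch of what such a test could look like (the test name, the _format_bedrock_messages signature, and the assumption that injection happens during message formatting are all hypothetical; the test actually added in the PR may differ):

import copy

from strands.models import BedrockModel
from strands.models.model import CacheConfig

def test_format_bedrock_messages_preserves_input(bedrock_client):
    # The caller's messages must come back unchanged; only the formatted copy
    # used for the Bedrock request should ever carry a cachePoint block.
    model = BedrockModel(cache_config=CacheConfig(strategy="auto"))
    original = [
        {"role": "user", "content": [{"text": "Hello"}]},
        {"role": "assistant", "content": [{"text": "Hi there!"}]},
    ]
    snapshot = copy.deepcopy(original)

    model._format_bedrock_messages(original)

    assert original == snapshot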
Added test_format_bedrock_messages_does_not_mutate_original.
- Inject into cleaned_messages instead of original messages to avoid mutating agent.messages
- Use reversed iteration for safe in-place deletion
- Consolidate redundant cache point tests
Summary
- CacheConfig with strategy="auto" for automatic prompt caching in BedrockModel

Usage
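A minimal usage example, mirroring the snippet run earlier in this thread:

from strands import Agent
from strands.models import BedrockModel
from strands.models.model import CacheConfig

# "auto" strategy: the Bedrock provider manages cachePoint placement on each turn
agent = Agent(model=BedrockModel(cache_config=CacheConfig(strategy="auto")))
agent("First question")
agent("Follow-up question")  # prior turns are served from the prompt cache where supported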
Test plan
Closes #1432