New Agentic AI Features Enhance Amazon OpenSearch Service for Incident Management

Amazon OpenSearch Service has unveiled three new agentic AI features designed to improve observability workflows for Site Reliability Engineering (SRE) and DevOps teams. The features aim to help teams identify root causes faster during incidents and to cut the time engineers spend on manual log analysis.

Previously, resolving incidents required deep expertise and extensive manual effort, often leading to delays in service recovery. The new agentic AI features are integrated into the OpenSearch UI, allowing users to address alerts and trace root causes in a matter of minutes.

Key Features

Ask AI Button: This feature opens a chatbot that understands the context of the current page, enabling users to ask questions about their data or initiate investigations.
Investigation Agent: This agent leverages a plan-execute-reflect model, allowing it to handle complex tasks through iterative reasoning and execution.
Hypothesis-Driven Investigations: The investigation agent generates a root cause analysis report based on the most likely hypotheses, allowing users to review and validate findings.

How It Works

To use the new features, users click the Ask AI button in the OpenSearch UI. The chatbot generates queries for them, so no expertise in the Piped Processing Language (PPL) is required. For instance, when faced with an elevated latency alert, a user can enter a question in plain language, and the chatbot will generate the appropriate PPL query to retrieve results.
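To make the query-generation step concrete, here is a minimal Python sketch of the kind of PPL query a latency question might translate into. The index name `app-traces` and the fields `latency_ms` and `service` are hypothetical, chosen only for illustration; the article does not show the queries the chatbot actually produces.

```python
import json


def build_latency_query(index, latency_field, service_field, threshold_ms, limit=10):
    """Build a PPL query that ranks services by average latency.

    All index and field names are supplied by the caller; the ones used
    below are illustrative assumptions, not names from the announcement.
    """
    return (
        f"source = {index} "
        f"| where {latency_field} > {threshold_ms} "
        f"| stats avg({latency_field}) as avg_latency by {service_field} "
        f"| sort - avg_latency "
        f"| head {limit}"
    )


# Hypothetical question: "Which services have requests slower than 500 ms?"
query = build_latency_query("app-traces", "latency_ms", "service", 500)

# In open-source OpenSearch, PPL queries are sent as a JSON body of this
# shape to the PPL endpoint (POST _plugins/_ppl). Nothing is sent here.
payload = json.dumps({"query": query})
print(query)
```

The point of the chatbot is that the user never has to write the pipeline of `where`, `stats`, `sort`, and `head` stages by hand.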

For more complex incidents, users can initiate the investigation agent by selecting the Start Investigation option. They can provide specific goals and context, such as identifying the root cause of latency issues across services. The agent then conducts a thorough analysis, correlating data from multiple indices to surface potential causes.

Real-Time Collaboration

The investigation process is collaborative, allowing users to follow the agent’s reasoning in real time. As the agent progresses, it reflects on the results and adjusts its approach based on new information. This mirrors the workflow of experienced incident responders, but because the steps are automated, the investigation can finish in minutes.
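The plan-execute-reflect cycle described above can be sketched as a simple loop. This is a generic illustration of the pattern, not the service's implementation: the `Investigation` class, the `run_investigation` function, and the toy `plan`, `execute`, and `reflect` callables are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Investigation:
    goal: str
    findings: List[str] = field(default_factory=list)


def run_investigation(
    goal: str,
    plan: Callable[[Investigation], List[str]],
    execute: Callable[[str], str],
    reflect: Callable[[Investigation], List[str]],
    max_rounds: int = 5,
) -> Investigation:
    """Plan steps, execute them, then reflect to decide on follow-up steps."""
    state = Investigation(goal)
    steps = plan(state)
    for _ in range(max_rounds):
        if not steps:
            break  # reflection produced no follow-up work
        for step in steps:
            state.findings.append(execute(step))
        steps = reflect(state)  # adjust the approach based on new findings
    return state


# Toy stand-ins so the loop can run end to end.
def toy_plan(state):
    return ["compare latency by service"]


def toy_execute(step):
    return f"done: {step}"


def toy_reflect(state):
    # After the first finding, drill into one service; then stop.
    if len(state.findings) == 1:
        return ["inspect logs for slowest service"]
    return []


result = run_investigation(
    "find root cause of latency", toy_plan, toy_execute, toy_reflect
)
print(result.findings)
```

The `max_rounds` cap is a design choice in this sketch to guarantee termination; the article does not say how the real agent decides when to stop.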

Security and Permissions

Access to the new AI features is governed by user permissions, so the assistant and the investigation agent can reach only the data sources a user is authorized to access. The results of investigations are securely stored and can be encrypted using either service-managed or customer-managed keys.

Conclusion

The introduction of agentic AI capabilities in Amazon OpenSearch Service marks a significant advancement in observability and incident management. By automating complex investigations and providing context-aware assistance, these features empower engineering teams to focus on resolving issues rather than spending excessive time on data queries.