AWS re:Invent 2025 - The incident is over: Now what? (COP216)
Sound operational practice defines how to handle inevitable incidents and recover quickly. But what about the aftermath? How do we ensure that the true root cause is tracked down and that effective preventive actions are planned and carried through? How do we turn every incident into an organization-wide learning opportunity? And how do the shared responsibility model and third-party software vendors come into play?