Prime Video is a first-stop entertainment destination offering customers a vast collection of premium programming in one app available across thousands of devices. Prime members can customize their viewing experience and find their favorite movies, series, documentaries, and live sports – including Amazon MGM Studios-produced series and movies; licensed fan favorites; and programming from Prime Video add-on subscriptions such as Apple TV+, Max, Crunchyroll and MGM+. All customers, regardless of whether they have a Prime membership or not, can rent or buy titles via the Prime Video Store, and can enjoy even more content for free with ads.
As a Prime Video technologist, you’ll have end-to-end ownership of the product, user experience, design, and technology required to deliver state-of-the-art experiences for our customers. You’ll get to work on projects that are fast-paced, challenging, and varied. You’ll also be able to experiment with new possibilities, take risks, and collaborate with remarkable people.
This is a senior-level incident management role responsible for leading the incident response function for Prime Video's video-on-demand platform. The key responsibilities include:
- Defining the strategy and operating model for the incident response team to minimize the duration and severity of customer-impacting incidents.
- Leveraging technical expertise to develop the vision for incident management tooling and capabilities to improve observability and triage.
- Owning operational metrics and goals for incident response quality, and fostering a culture of continuous improvement.
- Directly managing high-severity incidents, coordinating cross-functional teams, and driving resolution for complex/ambiguous issues.
- Serving as the point of escalation for critical customer issues, and building relationships with incident response teams across Amazon.
- Educating leaders and engineers on incident response best practices and capabilities.
The ideal candidate has 10+ years of incident management experience, including incident response for a large-scale enterprise. They have strong technical, analytical, and communication skills to liaise effectively with engineering and executive teams.
Key job responsibilities
Define the strategy for evolving Prime Video’s response to incidents that impact video on demand. Establish the operating model for how the new, dedicated function will operate within Prime Video.
As a technical lead, leverage your domain expertise to develop the vision for the incident response tooling and capabilities needed to minimize the duration of customer-impacting incidents. Increase the scope of team-maintained dashboards. Influence the roadmaps of the engineering teams developing incident management, observability, and triage tooling.
Lead the Incident Response function. Ensure the globally distributed team is ready to respond 24x7. Actively mentor and develop the junior Incident Managers.
Own operational metrics. Set clear, measurable goals for the quality of incident response and establish mechanisms to drive continual improvement. Foster a culture of continuous improvement through mentoring, feedback, and metrics.
Lead incident response for high-severity incidents. Drive towards resolution by coordinating efforts across multiple engineering and operational teams, including for ambiguous problems we might not have seen before. Decompose complex incidents into work streams that can be managed by multiple incident responders in parallel. Manage communications and be the single point of contact for executive leaders.
Drive critical, complex customer escalations, some of which are technically challenging, in collaboration with engineering teams.
Build relationships with the other incident response teams across Amazon to share best practices and enable effective collaboration during cross-organizational outages and incidents.
Educate leaders and engineers across Prime Video on advances in incident response capabilities, the role they play in enabling improved incident response, and how they can leverage READI tooling to reduce the time to mitigate lower-severity incidents.
Communicate ideas effectively, both verbally and in writing, to all types of audiences.
Perform other duties as required by the organization.
About the team
The Prime Video platform is complex and constantly changing. It consists of thousands of cloud-based services, is built and maintained by thousands of engineers, and serves hundreds of millions of customers. We are establishing a team of dedicated incident managers who will be front and center in driving down the duration of incidents that impact customers’ ability to watch video on demand. They will do this by applying their operational experience, knowledge of best practices, and effective use of incident management tools. We’re looking for an expert in incident response, who has owned operations and/or incident management for at least one large-scale enterprise, to shape the incident response function, define the operational framework, and drive delivery of incident response tooling. The team will provide incident response 24x7x365 from two locations.