AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we’re looking for talented people who want to help.
You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
The Incident Prevention team is looking for experienced software engineers who are excited about building large-scale systems spanning tens of thousands of servers across multiple data centers worldwide. These are core systems development positions where you will own the design and development of significant software components critical to our industry-leading database services architected for the cloud.
In this hands-on position you will be asked to do everything from building rock-solid components to mentoring other engineers. You need not only to be a top software developer with a good track record of delivering, but also to excel in communication, leadership, and customer focus. This is a rare opportunity to get in on the ground floor of a fast-growing business and help shape the technology, the product, and the business. A successful candidate will bring deep technical and software expertise and the ability to work in a fast-moving, startup-like environment within a large company, delivering high-quality code that has broad business impact.
The Resilience Infrastructure and Solutions team reduces the number and duration of customer-impacting availability events. As part of its mission, the team conducts disruptive testing of AWS infrastructure and services in non-production environments to identify potential availability gaps and verify that past findings have been addressed. Join a team of systems experts who operate at high velocity, bring limitless curiosity, and focus on helping AWS prevent customer-impacting availability events.
Key job responsibilities
-Build and operate a program for executing large-scale, disruptive tests on non-production Region-scale AWS infrastructure
-Partner with AWS service and infrastructure technical leaders on test planning, execution, and follow-ups
-Dive deep into technical findings, incident response, and follow-ups with AWS service teams
-Review test strategy and results with senior AWS leaders as part of quarterly reviews
-Guide automation and test strategies that maximize utilization of test infrastructure while reducing builder toil