Join the team that powers one of AWS's most critical services: Elastic Block Store (EBS)! The EBS Placement team builds systems that optimize how millions of storage volumes are distributed across AWS's vast infrastructure. Our work directly impacts every EBS customer, from startups to the world's largest enterprises, who rely on us to serve exabytes of data and trillions of I/Os daily.
We are seeking talented engineers to help evolve how we place EBS volumes on storage servers. We tackle complex technical challenges at massive scale, from optimizing storage density and improving I/O performance to ensuring data durability and availability in the face of a multitude of failure modes. You'll work on distributed systems that make real-time decisions about where to place customer data while balancing multiple competing constraints, including performance, availability, durability, and cost-effectiveness.
What makes our team unique is the opportunity to work at the intersection of infrastructure optimization and customer experience. When you join us, you'll be part of evolving how EBS delivers storage services at unprecedented scale and configurability. We're working on exciting initiatives like building predictable performance guarantees for storage operations, developing intelligent resource modeling and simulation systems, and creating next-generation placement algorithms that will enable us to serve more customers with better performance while optimizing resource utilization. If you're excited about solving complex distributed systems problems that directly impact millions of customers, we'd love to talk to you!
A day in the life
What makes this role exciting is that every day brings new challenges as customer workloads grow and storage technology evolves. You'll be at the forefront of ensuring that millions of chunks of data and workload are placed just right across the vast EBS storage fleet.
Much of your time will be hands-on with our systems. You might be:
* Designing and writing code to update the placement decision engine for any number of reasons, such as launching a new storage feature, taking advantage of a new server capability, or adding and revising optimization functions.
* Diving into data to make design decisions or to measure the effectiveness of your changes.
* Reasoning about the wide range of factors behind these decisions, such as the variability of AWS infrastructure and customer workload patterns worldwide, the complex interplay between competing optimization functions, diversifying the placement of data replicas, and staleness in the data used by the decision engine.
* Debugging complex distributed systems issues that require careful analysis and creative problem-solving.
* Reviewing proposals and code from peers on the team as well as on partner teams.
Beyond these regular activities, you might find yourself consulting with partner teams on decisions such as planning product rollouts and migrations, capacity planning, rapidly mitigating customer impact using the workload-movement capabilities of our placement systems, and root-causing performance degradations.
About the team
You’ll join a group of strong engineers who proudly own some of EBS’s most critical responsibilities! We collaborate closely to design and deploy sophisticated algorithms that make workload placement decisions while accounting for tens of different optimization objectives. We own real-time, highly available systems that make placement decisions in a fraction of a second, as well as a background fleet optimizer that continuously rebalances heat and reacts to ever-changing fleet topology and customers’ workload patterns. We move fast to deliver for our customers while maintaining operational excellence across these mission-critical services.
In this role, you will regularly collaborate with principal engineers and scientists to make high-judgment decisions backed by data. You will partner with and influence 5+ teams across the organization and develop a deep understanding of how one of Earth’s largest storage businesses operates. We value teammates who are passionate about distributed systems, bring strong analytical skills to solve complex problems, and thrive in an environment where they can see the direct impact of their work on customers and business outcomes.