AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we’re looking for talented people who want to help.
You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. You’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
Within AIS, the Science team takes on the exciting challenge of using big data and machine learning to optimize power and cooling, the most critical resources in our data centers. In short, we ensure maximum efficiency while preventing overheating and power outages. Our work helps shape future data center designs and drives exceptional cost savings to AWS customers.
As a Software Engineer on the AIS Science team, you will collaborate with scientists, program managers, and data engineers to build, operationalize, and scale machine learning workflows and platform services. Your work will directly impact how server demand is placed by modeling power and cooling load across AWS's global data centers.
You will play a critical role in building infrastructure that supports every phase of the ML model lifecycle, from R&D to production, including model retraining and iteration. Our team tackles complex challenges in data processing, model hosting, and metric monitoring. As our responsibilities grow and the number of models we manage increases, we’re seeking an innovative senior engineer with a passion for data, machine learning, and MLOps to join our mission-driven team!
If you're passionate about machine learning and model operations, enjoy working in a collaborative and dynamic team that values work-life balance, and want to make a lasting impact on AWS infrastructure worldwide, this is your opportunity. Come join us on this exciting journey!
Key job responsibilities
In this role, you will leverage your engineering background and expertise in ML to lead the development of platforms for deploying, productionizing, and scaling machine learning models, with a focus on variant retraining and ongoing model monitoring.
A day in the life
- Lead the design and implementation of a stable and efficient training and inference infrastructure that scales to support a variety of different machine learning models.
- Collaborate with tenured applied scientists and data engineers to develop improved training and inference infrastructure that accelerates innovation and promotes best-practice model scoring and model monitoring.
- Quickly learn the ins and outs of AWS infrastructure’s distributed rack planning and forecasting workflows, and engineer solutions that make these systems more robust, fault-tolerant, and efficient across input and output orgs.
About the team
The software team you’ll be joining is Lanner, part of AIS Science Engineering. We’re a tight-knit group of eight developers, including one Senior SDE, three junior engineers, and three entry-level engineers. We take pride in solving challenging problems and building impactful solutions, but we also value work-life balance. Our culture encourages healthy boundaries, and we make time to connect as a team through weekly happy hours, regular lunches, and occasional offsites and team events.
About AWS
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Why AWS?
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Work/Life Balance
We value work-life harmony. Success at work should never require sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.