AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we’re looking for talented people who want to help.
You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
Our team designs, builds and operates Amazon's fleet of Accelerated Servers using Internal Amazon design silicon or specialized purpose accelerators (EC2.TRN, INF, G, F + more instance types). We solve systemic hardware issues, and we build hardware and software systems to detect and mitigate future recurrences so that our customers can experience the highest quality of service possible!
You will architect, design, and own a new segment of accelerated servers for the AWS fleet. This includes defining board-level architecture, component selection, and managing manufacturing partnerships through development and production. You will make critical design decisions on thermal, power, signal integrity, and mechanical integration while leading cross-functional teams from concept through data center deployment. You will define requirements and conduct technical reviews to ensure designs meet AWS standards.
Your designs will power AWS infrastructure supporting internal Amazon design silicon and specialized purpose accelerators across EC2 instance types. You will design for reliability and manufacturability, incorporating lessons learned from fleet operations into your architecture decisions. Your designs will include built-in diagnostics and telemetry to enable efficient validation and operations.
Ideal candidates will have a background in server development, system design, root cause analysis, scoping complex issues, qualification, problem solving, and developing corrective actions.
Key job responsibilities
Design and Architecture: Own server architecture, board design, component selection, thermal and power design, and ODM technical reviews. Make trade-off decisions balancing performance, cost, and manufacturability. Lead design reviews with manufacturing partners and ensure designs scale to production volumes.
Fleet Operations: Monitor production quality, analyze field data to inform future designs, and drive continuous improvement. Collaborate with operations teams to ensure your designs meet reliability targets in production environments.
A day in the life
You will spend your time making design decisions—defining technical requirements, conducting design reviews with manufacturing partners, selecting components, and architecting thermal and power solutions. You will interface with customers to translate requirements into technical specifications and work with manufacturing partners to ensure your designs scale to production. You will collaborate with interdisciplinary teams including component engineers, firmware developers, test engineers, and integration specialists to deliver complete server solutions.
About the team
*Why AWS*
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
*Diverse Experiences*
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
*Work/Life Balance*
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
*Inclusive Team Culture*
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (diversity) conferences, inspire us to never stop embracing our uniqueness.
*Mentorship and Career Growth*
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
The Hardware Engineering AI / ML development team is a group of engineers and technical program managers directly responsible for launching hardware in the fleet. Based in Seattle, Cupertino, and Austin, we work on programs with global development teams (both internal and external to Amazon). Our servers are located in data centers globally.
The members of our team have a diverse set of technical backgrounds, but all share a Bias for Action and strong Ownership. We enjoy applying a startup model of delivering fully functional solutions for our customers.