Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday.
Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
At Annapurna Labs, we are at the forefront of hardware/software co-design, not just in Amazon Web Services (AWS) but across the industry. The Trainium Manufacturing, Quality and Reliability (MQR) team is the part of AWS Annapurna Labs responsible for the end-to-end manufacturing and deployment of these cutting-edge AI products and system designs for the world’s largest cloud services provider. The MQR team is looking for candidates interested in leading the manufacturing test team, which develops and deploys manufacturing test FW/SW content to our global manufacturing lines. The scope of this role includes working closely with the HW design teams to identify, define and develop test content while building scalable deployment, data automation and diagnostic mechanisms to ensure high-efficiency operations across our global manufacturing partner sites.
You’ll provide leadership in the application of new technologies to large scale deployments in a continuous effort to deliver a world-class customer experience. This is a fast-paced, intellectually challenging position, and you’ll work with thought-leaders in multiple technology areas. You’ll have high standards for yourself and everyone you work with, and you’ll be constantly looking for ways to improve our products' performance, quality and cost. We’re changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.
Key job responsibilities
As a Test Development Manager, you will work with the Lead Manufacturing Test Engineer to identify key test coverage gaps based on manufacturing and fleet performance for the current and next-generation Machine Learning Acceleration (MLA) product family. You will lead the team to deliver solutions addressing these needs. Key responsibilities include:
* Scale and manage a team of manufacturing test and data automation engineers
* Drive test coverage improvement strategies
* Develop manufacturing validation methodologies and infrastructures
* Collaborate with Manufacturing Engineering, Quality and Reliability, HW design, Fleet Operations and Infrastructure teams to ensure delivery of high-quality systems to ODM/CM manufacturing sites and AWS data centers
* Own test deployment schedules and periodic reporting of key performance indicators
* Closely monitor global high-volume manufacturing sites/vendors
* Collaborate with global teams to provide 24x7 operations
* Provide technical mentoring for the team
About the team
AWS is the world’s leading and most trusted provider of virtualized public cloud utility services. We offer our global IT customer base, which spans private, corporate and government sectors, over 100 fully featured, integrated services in Gen AI, compute, storage, database, analytics, mobile, Internet of Things (IoT) and enterprise applications. AWS operates a worldwide fleet of interconnected enterprise data centers at hyperscale to deliver the capacity that powers our customers’ IT infrastructure, enabling them to concentrate on core competencies through agility and operational efficiency. To learn more about AWS, visit https://aws.amazon.com
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs is a wholly owned subsidiary of AWS, focused on developing custom silicon and servers including the Nitro, Graviton, Inferentia, and Trainium families of processors.
Machine Learning Annapurna (MLA) functions as a vertically integrated team including software, firmware, hardware, and silicon design in a single organization. We are the Trainium Servers and Systems organization under MLA focused on Hardware Development, Software Development, Fleet Ops Systems, and Manufacturing, Quality, and Reliability. This position is in the Manufacturing, Quality and Reliability team.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and leadership development. We care about your career growth and strive to assign projects that help you develop your leadership and technical expertise so you feel empowered to take on more complex tasks in the future.