Annapurna Neuron is the team that delivers the software that powers the Inferentia and Trainium-based Inf1, Inf2, Trn1 and Trn1n families of Machine Learning Accelerated EC2 instances. We build a Compiler, Drivers, Runtime and a fully integrated suite of PyTorch and TensorFlow stacks, providing the highest-scale, distributed Training and Inference servers available. We deliver the solutions powering many of the largest Machine Learning services at Amazon and its customers.
We're seeking a Principal Program Manager for the Annapurna ML Neuron team. In this role you will own the capacity request lifecycle from intake through approval and delivery, coordinating across EC2, finance, and internal engineering stakeholders to secure and allocate Trainium instances. You will establish and maintain dashboards, utilization metrics, and recurring review mechanisms to proactively identify gaps and drive resolution. You will be responsible for scoping and delivering large projects end-to-end. Responsibilities include collecting business and systems requirements from internal and external customers, writing specifications, driving project schedules from design to release, and managing the production launch. You will lead and coordinate design and implementation efforts between Neuron, internal teams, and external customers and partners to develop optimal solutions for new features, verticals, and markets. You will be expected to make appropriate tradeoffs to optimize time-to-market, and to clearly communicate goals, roles, responsibilities, and desired outcomes to internal cross-functional and remote project teams.
The right candidate will possess a strong technical and program management background, will have demonstrated experience leading medium to large projects, and will have a well-rounded technical background in current machine learning technologies as well as in products spanning multiple technical domains, from high-performance networking and compute to drivers, compilers, and software development programs. You must be able to thrive and succeed in an entrepreneurial environment, and not be hindered by ambiguity or competing priorities. This means you are not only able to develop and drive high-level strategic initiatives, but can also roll up your sleeves, dig in and get the job done.
As a Principal TPM, you will anticipate bottlenecks, provide escalation management, make trade-offs, and balance business needs against technical constraints. You will need the ability to take large, complex projects, break them down into manageable pieces, develop functional specifications, and then deliver them in a timely manner.
In this role you will:
· Drive execution of projects
· Provide technical direction with limited assistance
· Lead cross-functional project meetings
· Lead milestone reviews
· Present project status to the executive team
· Manage project builds globally
Key job responsibilities
Work with Engineering leadership, Product Management, and Business Development to help define requirements and roadmaps. Work with Engineering leadership to drive efficiencies in engineering development and ensure the right things are delivered to Customers at the right time. We work like a startup, moving fast and building new things.
About the team
Annapurna ML Neuron is the team that delivers the software that powers the Inferentia and Trainium-based Inf1, Inf2, Trn1 and Trn2 families of Machine Learning Accelerated EC2 instances. We build a Compiler, Drivers, Runtime and a fully integrated suite of PyTorch and TensorFlow stacks, providing the highest-scale, distributed Training and Inference servers available. We deliver the solutions powering many of the largest Machine Learning services at Amazon and its customers.