Annapurna Labs, an AWS organization with development centers in the U.S. and Israel, builds custom silicon and software for AWS customers. Our team combines cloud-scale innovation with world-class expertise across silicon engineering, hardware design, verification, software, and operations to tackle technical challenges that have never been seen before.
Join our Post-Silicon Validation team to quantify and qualify the performance of AWS's custom ML training chips against architectural targets. You'll bridge the gap between silicon capabilities and real-world ML workload demands — ensuring our accelerators deliver on latency, throughput, and efficiency promises at cloud scale.
You'll work in a fast-paced, startup-like environment alongside some of the brightest minds in the industry on next-generation AI/ML hardware that powers AWS's training and inference infrastructure. Your analysis will directly shape architectural decisions for next-generation accelerators and determine when silicon is ready for production deployment.
Key job responsibilities
Design and execute performance benchmarks ranging from the micro-architecture level to full model training
Measure and analyze compute throughput, memory bandwidth, interconnect latency, and other key performance metrics
Profile real ML workloads (transformer models, LLMs, vision models) on silicon
Identify performance bottlenecks and work with architecture teams on optimization
Build automated performance regression dashboards and tracking infrastructure
Correlate silicon measurements against RTL simulation and emulation predictions
A day in the life
Your primary focus is measuring and understanding how our AI chips perform under real workloads. You'll spend mornings digging into benchmark results, figuring out where cycles are being lost and why throughput isn't hitting targets. When something looks off, you'll instrument the hardware, profile the pipeline, and work with design teams to get it fixed. Some days you'll be developing and running full model training workloads end to end; on others, you'll be building the dashboards that tell leadership whether silicon is ready to ship.
About the team
The MLA Post-Silicon Validation team owns validation of AWS's next-generation ML training accelerators from first silicon through production deployment in AWS data centers. We sit at the intersection of hardware, firmware, and ML software — ensuring every layer of the stack performs, scales, and meets the quality bar. Our team culture values deep technical ownership, data-driven decisions, and a bias for action. We operate with startup agility backed by AWS-scale resources, and our work directly enables the cloud computing infrastructure that millions of customers rely on for AI/ML workloads.