Do you want to build the networking infrastructure that powers every virtual machine in the world's largest cloud? Are you excited by the challenge of writing systems software that runs on millions of custom networking cards across millions of hosts?
The Amazon Elastic Compute Cloud (EC2) Instance Networking team builds and operates the software-defined networking stack at the heart of AWS. Our code is what makes Virtual Private Clouds (VPC) work, what enforces customer security policies, and what enables workloads to scale from a single instance to clusters of tens of thousands. Every packet that enters or leaves an EC2 instance flows through software we write, running on custom silicon designed in-house.
This work sits at the intersection of cloud infrastructure and the AI revolution. Machine learning workloads are growing to unprecedented scale: training runs spanning thousands of accelerators, clusters demanding millions of network endpoints, and fabrics pushing hundreds of gigabits per second. We build the networking primitives that make these workloads possible: high-performance interfaces, low-latency data plane protocols, and scalable network virtualization that operates predictably even under heavy churn. When the world's most ambitious AI companies scale on AWS, they're relying on the software we build.
You'll write low-level systems code in C and Rust running on custom hardware in a real-time embedded environment. You'll work on network virtualization protocols, packet processing pipelines, and availability mechanisms that must perform correctly at massive scale with sub-second latency requirements. You'll solve problems that span hardware, firmware, and software, and see the direct impact of your work on customers running some of the largest and most demanding workloads in the cloud.
Key job responsibilities
- Design and implement data plane networking software on custom hardware, working across the full software lifecycle from design through deployment and operations
- Build and improve network virtualization systems that deliver VPC connectivity, security policy enforcement, and performance isolation for EC2 customers
- Investigate and resolve complex production issues across distributed embedded systems, driving root cause analysis and permanent fixes
- Collaborate with partner teams across hardware, control plane, and ML platforms to deliver end-to-end solutions