If building the platforms that enable hundreds of engineers to ship safely to one of the world's largest storage services excites you, the EBS Server Agility team is the place to do it. Our team builds and operates the continuous integration infrastructure that validates every code change to EBS server software before it reaches production, where that software serves millions of AWS customers across 400,000 bare-metal servers.
We're looking for a Senior Software Development Engineer to drive the technical direction of our CI systems — from distributed test orchestration that runs thousands of test suites daily across heterogeneous hardware, to AI-powered developer tools that generate tests, predict failures, and surface regressions before they reach production. You'll lead the design of solutions to complex problems involving testing at massive scale, performance and memory profiling across multiple CPU architectures (x86 and ARM/Graviton), secure automated operations for CI infrastructure, and intelligent qualification systems that balance speed, cost, and coverage.
CI at EBS sits in the critical path for developer velocity and production quality. We shape how hundreds of EBS engineers develop, test, and qualify software for a storage engine that powers AWS. We regularly partner with principal engineers and teams across EC2 and EBS to deliver integrations that make development faster and safer. If you want your technical leadership to have real impact at AWS scale, and you're excited by building secure, intelligent systems that make engineers more productive, we'd love to talk to you.
Key job responsibilities
- Drive technical design and architecture for CI systems that validate EBS server code changes across multiple hardware architectures with fast, reliable signal
- Architect distributed test orchestration infrastructure that runs thousands of test suites daily, giving developers confidence in what they ship
- Drive shift-left testing strategies and quality frameworks that catch defects earlier in the development cycle — emphasizing quality, agility, frugality, and engineering efficiency across the CI pipeline
- Design intelligent test selection and qualification systems using AI/ML — cutting qualification time without sacrificing coverage
- Build secure, automated operational infrastructure for CI services — ensuring availability, observability, and autonomous recovery
- Design memory safety and performance profiling infrastructure that detects spatial and temporal memory errors, race conditions, and performance regressions at scale
- Drive toward AI-native CI — automated test generation, intelligent code review, predictive failure analysis, and debugging assistants
- Optimize CI infrastructure costs, which run to thousands of server-hours per release cycle, while maintaining quality
- Lead design reviews, raise the engineering bar through code reviews, and establish best practices for CI reliability and developer experience
- Mentor engineers across levels and contribute to a team culture of technical excellence and operational rigor
A day in the life
Your work spans the full range of CI engineering — from investigating flaky test patterns and building automated detection systems, to designing intelligent test selection models that cut qualification time in half. You'll work on new compliance enforcement mechanisms that keep the development bar high, analyze CI data to identify systemic quality trends, and build AI-powered tooling that accelerates how engineers write and validate code.
This is a high-visibility role. CI decisions affect the velocity and confidence of every EBS server engineer, and your work is regularly reviewed by senior leadership. You'll partner with principal engineers and teams across EBS and EC2 on cross-service integrations, influence technical direction beyond your immediate team, and represent CI quality in discussions that shape how EBS develops software at scale.
We value senior engineers who lead through ambiguity, reduce toil through smart automation, make principled tradeoffs between speed and safety, and elevate those around them.
About the team
We design and operate distributed testing systems that qualify every code change for production deployment, build AI-powered developer tools that accelerate the development workflow, maintain secure automated operations for CI services, and provide observability infrastructure that surfaces quality trends across the entire codebase.
We're a 10-person team in Boston building toward a future where CI is intelligent, fast, and self-healing, catching bugs before production with minimal developer friction. There's significant greenfield opportunity: almost no off-the-shelf solutions exist for testing bare-metal storage software at this scale, so you'll be designing from first principles.