Do you want to push the boundaries of AI inference speed and accuracy at global scale?
Are you passionate about optimizing how models perform in production serving environments?
Join the Akamai Inference Cloud Team!
The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design and operate AI platforms that enable customers to run models with unmatched performance, compliance, and economics. The Model Intelligence & Lifecycle team owns the end-to-end model lifecycle, from validation and security scanning through quantization, optimization, and monitoring. We ensure every model meets rigorous standards for quality, safety, and performance.
Partner with the best
As an ML Performance Engineer, you will optimize inference performance across the Akamai Inference Cloud. Your focus will be at the intersection of speed and accuracy, applying techniques such as quantization, speculative decoding, and hardware-aware scheduling to maximize throughput and minimize latency.
You will collaborate closely with hardware performance engineers to deliver end-to-end optimization.
As an ML Performance Engineer, you will be responsible for:
- Applying and evaluating quantization, distillation, and pruning techniques to optimize model performance while preserving accuracy
- Designing hardware-aware model placement and scheduling strategies to match models with optimal compute resources
- Implementing and tuning speculative decoding, KV-cache optimization, and batching strategies to improve inference throughput and latency
- Building benchmarking and profiling pipelines to measure model-layer performance across architectures, hardware, and serving configurations
- Mentoring and guiding engineers on the team through code reviews, design discussions, and technical problem-solving
- Collaborating with hardware performance engineers to identify and resolve end-to-end performance bottlenecks across the inference stack
Do what you love
To be successful in this role you will:
- Have 12+ years of relevant experience with a Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field
- Possess hands-on experience optimizing LLM inference performance (quantization, speculative decoding, model compression, etc.)
- Have a solid understanding of transformer architectures and how design choices impact latency, throughput, and accuracy
- Possess experience with inference serving frameworks such as vLLM, TensorRT-LLM, Triton, or similar systems
- Be proficient in Python and C++ with experience profiling and optimizing compute-intensive workloads
- Have familiarity with hardware-aware optimization, including GPU/accelerator scheduling and memory management trade-offs
Work in a way that works for you
FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.