Do you enjoy collaborating with teams to solve complex challenges?
Do you have a passion for automation and building systems that scale?
Join our highly skilled Site Reliability Engineering team!
Our team designs, develops, and manages applications and infrastructure that support Akamai Cloud's products and services. Our SRE teams solve reliability, security, and usability challenges at scale for our global fleet while keeping Akamai's mission at the forefront of what we do: make life better for billions of people, billions of times a day.
Partner with the best
In this role, you will focus on configuration management, Infrastructure as Code (IaC), and CI/CD. You will design, develop, and operate infrastructure deployment for the Akamai Cloud.
As a Site Reliability Engineer, you will be responsible for:
- Designing, developing, testing, and operating critical services that support the reliability, scalability, and performance of our infrastructure.
- Designing and implementing observability solutions, including monitoring, logging, alerting, and telemetry capabilities, to proactively detect and resolve issues.
- Driving reliability improvements through automation, reducing operational toil and increasing the resilience of engineering processes.
- Developing technical expertise in IaC systems and serving as a trusted technical resource, mentoring engineers and sharing best practices.
- Collaborating with software engineering, infrastructure, and platform teams to investigate complex production issues, identify root causes, and implement long-term corrective actions.
- Participating in an on-call rotation and providing leadership during incident response, driving timely service restoration, effective communication, and post-incident improvement efforts.
Do what you love
To be successful in this role you will:
- Have relevant experience and a Bachelor's degree in Computer Engineering, Computer Science, or an equivalent field.
- Demonstrate experience in a Site Reliability or Software Engineering role, working with large-scale distributed systems.
- Have experience with Terraform, including module development, state management, workspace design, policy enforcement, and enterprise-scale Infrastructure as Code implementations.
- Have experience managing Infrastructure as Code solutions using tools such as Terraform, SaltStack, Ansible, Chef, Puppet, or similar technologies.
- Have experience with designing, developing, and deploying software and infrastructure at scale in a Linux environment.
- Have strong communication and interpersonal skills.
About us
At Akamai, we make life better for billions of people, trillions of times a day.
Whether you're streaming live events, scrolling social media, watching your favorite series, or managing your savings, we're the engine behind the scenes. We provide the world's most distributed platform from Cloud to Edge to help the giants of the digital world work faster and stay more secure, making the internet a better experience for everyone.
Our focus is simple:
Cloud and Edge: Running apps closer to users for instant performance.
Security: Neutralizing threats before they ever reach your data.
Content Delivery: Scaling the world's biggest moments without a glitch.
AI: Enabling our customers to build, secure, and scale AI apps on the world's most distributed cloud platform.