Would you enjoy improving the stability and safety of one of the largest global networks?
Would you enjoy hands-on network operations work on a global scale to improve our operational efficiency?
Join our Platform Security Engineering Team
The Platform Security Engineering team is a group of engineers that support and secure Akamai's global network and Linode cloud systems. Our systems provide data security, server integrity, network access, and secure communications infrastructure. This is an opportunity to build software that enables one of the largest platforms in the world!
Partner with the best
As a Site Reliability Engineer, you will collaborate across software development, operations and network infrastructure teams in order to improve and measure the reliability of our systems. You will leverage your programming, distributed systems, troubleshooting and analytical skills to improve reliability and capacity of the system.
As a Senior Lead Site Reliability Engineer you will be responsible for:
- Leading and mentoring a team of experienced Site Reliability Engineers
- Designing and implementing cloud strategies and policies that meet the organization's needs
- Partnering across teams to ensure the reliability, scalability and usability of our products and services
- Designing and overseeing the implementation of complex cloud-based architectures and solutions
- Defining requirements as part of the product lifecycle to influence new designs and standards
- Developing new tools and automation pipelines to support development, testing, and deployment workflows
- Collaborating with our teams during application issues or service incidents to investigate and troubleshoot complex network-related problems
- Maintaining and exceeding high SLA targets, ensuring robust system uptime and performance
Do what you love
To be successful in this role you will:
- Have 5 years of relevant experience and a Bachelor's degree in Computer Science or its equivalent
- Have professional experience in a Site Reliability, Development, or Systems Engineering role, with large scale distributed systems
- Have an in-depth understanding of computer networking concepts, security concepts, Unix/Linux internals, distributed systems, and systems design
- Demonstrate experience with programming or scripting languages such as Python or Bash
- Have experience using automation tools such as Terraform, Ansible, Jenkins, or SaltStack
- Possess an understanding of monitoring and logging tools such as Prometheus, Grafana, Loki, or similar
Work in a way that works for you
FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.
We power and protect life online, by solving the toughest challenges, together.
At Akamai, we're curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.
About us
Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.