Do you like collaborating across teams to solve complex problems?
Do you enjoy solving large-scale distributed systems problems?
Join the Mapping SRE team!
The Mapping SRE team is responsible for overseeing and improving the availability, reliability, performance and change management procedures of Akamai's mapping system. Our system routes trillions of client requests per day, controlling tens of terabits per second of content traffic served to clients worldwide. Our team defines KPIs, advances the state of measurements, monitoring dashboards and alerts, and investigates complex production issues.
Partner with the best
In this role, you'll work closely with cross-functional teams to understand and improve the performance, availability and reliability of Akamai's Mapping Service. You'll define key performance indicators (KPIs), advance the state of monitoring, alerting and operational responses, and investigate complex performance issues.
As a Senior Site Reliability Engineer, you will be responsible for:
- Monitoring, investigating, and analyzing performance and availability by (co)designing, managing, and tracking product-related SLIs/SLOs
- Solving problems and avoiding recurrence by developing tools / prototypes to proactively monitor service performance and availability
- Working closely with product engineers to advocate for reliable and scalable system design for supportability, resilience and reliability
- Leveraging skills in data analysis, network diagnostics and debugging tools to characterize performance and recommend improvements
- Engaging with our support, operations and engineering teams to investigate and troubleshoot complex problems, including incident management and post-mortem reviews
- Collaborating with internal teams to help troubleshoot and resolve escalations and incidents for our customers
Do what you love
To be successful in this role you will:
- Have 5 years of relevant experience and a Master's degree in Computer Science or its equivalent
- Demonstrate experience in one of the scripting or procedural languages (Python, Perl, shell, C/C++, Java, etc.)
- Possess experience working in a UNIX/Linux computing environment
- Have experience with monitoring, alerting, and logging platforms such as Grafana
- Have in-depth understanding of computer networking concepts, Unix/Linux internals, distributed systems, and system design
- Have excellent communication and organizational skills, and be able to articulate technical information in an easy-to-understand manner
Work in a way that works for you
FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.