Job Description: ML Ops Engineer
Position Overview
The ML Ops Engineer will design and operate the production backbone for Southern Company’s AI Hub, ensuring AI and machine learning systems are deployed, monitored, and governed at scale. This role drives the enterprise-wide MLOps framework—establishing standards, lifecycle governance, and observability—while delivering secure, resilient production services and reusable AI products that accelerate innovation across operating companies. Success requires balancing rapid iteration with the reliability, safety, and compliance expected of a critical infrastructure enterprise.
Key Responsibilities
Operationalize AI and agentic systems. Build and maintain CI/CD pipelines for models, prompts, tools, and multi-agent workflows, enabling consistent promotion from experimentation to production.
Implement AI observability and reliability. Establish monitoring for agent behavior, model performance, drift, cost, and safety outcomes using logs, traces, metrics, and evaluators.
Enforce governance through automation. Embed guardrails, approvals, and policy-as-code into deployment pipelines, enabling compliant AI delivery without manual bottlenecks.
Manage model and agent lifecycle. Own versioning, rollout strategies (canary, shadow, rollback), and decommissioning for models, agents, and supporting tools.
Ensure platform resilience and scalability. Design runtime patterns that meet availability, latency, and fail-safe requirements, including degraded-mode and read-only behaviors for sensitive use cases.
Support multi-vendor and multi-cloud execution. Enable portable deployments across hyperscalers and model providers, minimizing lock-in while maintaining consistent operational controls.
Partner with engineering and data teams. Work closely with AI Architects, data engineers, and product squads to resolve production issues and continuously improve the developer experience.
Qualifications
Educational Background: Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
Experience: Proven experience (5+ years) in cloud engineering or DevOps, including 2+ years in MLOps, AI infrastructure, data engineering, ML engineering, or a similar role.
Domain Expertise
Experience operating machine learning and AI systems in regulated or mission-critical environments.
Strong understanding of ML lifecycle management, including experimentation, validation, deployment, monitoring, and retirement.
Familiarity with agentic AI runtime patterns, including orchestration, tool execution, and human-in-the-loop controls.
Knowledge of enterprise AI governance, observability, and maturity models.
Individual Skills
Operational mindset with strong ownership and a bias toward reliability and automation.
Ability to troubleshoot complex, distributed AI systems under production constraints.
Clear communicator who can translate operational risks into actionable improvements.
Continuous improvement orientation, balancing speed, safety, and cost.
Technical Expertise
Hands-on expertise with CI/CD and MLOps tooling (e.g., GitHub Actions, Azure DevOps, Terraform).
Experience deploying and operating LLMs, agents, and inference services using containers and orchestration platforms (e.g., Kubernetes).
Proficiency in observability stacks for AI systems (logging, tracing, metrics, evaluation pipelines).
Strong grounding in cloud security and identity, including secrets management, network isolation, and least-privilege access.
Experience with enterprise model registries, feature stores, vector databases, and automated testing for AI workflows.
Deep expertise in Python. Experience with machine learning frameworks and libraries such as PyTorch or scikit-learn.
Experience with ML lifecycle tools such as MLflow.
Cloud Platforms: Experience with cloud computing services (Azure and GCP preferred) and their machine learning tools.
Preferred Qualifications
Certifications: Relevant certifications in AI, ML, or data engineering.
Industry Experience: Experience in the energy sector is a plus.
Experience in multi-cloud environments is a plus.
Experience designing reusable AI products, agents, and services in a multi-business environment.
About Southern Company
Southern Company (NYSE: SO) is a leading energy provider serving 9 million customers across the Southeast and beyond through its family of companies. Providing clean, safe, reliable and affordable energy with excellent service is our mission. The company has electric operating companies in three states, natural gas distribution companies in four states, a competitive generation company, a leading distributed energy solutions provider with national capabilities, a fiber optics network and telecommunications services. Through an industry-leading commitment to innovation, resilience and sustainability, we are taking action to meet customers' and communities' needs while advancing our goal of net-zero greenhouse gas emissions by 2050. Our uncompromising values ensure we put the needs of those we serve at the center of everything we do and are the key to our sustained success. We are transforming energy into economic, environmental and social progress for tomorrow. Our corporate culture has been recognized by a variety of organizations, earning the company awards and recognitions that reflect Our Values and dedication to service. To learn more, visit www.southerncompany.com.
Southern Company invests in the well-being of its employees and their families through a comprehensive total rewards strategy that includes competitive base salary, annual incentive awards for eligible employees and health, welfare and retirement benefits designed to support physical, financial, and emotional/social well-being. This position may also be eligible for additional compensation, such as an incentive program, with the amount of any bonus/awards subject to the terms and conditions of the applicable incentive plan(s). A summary of the benefits offered for this position is available at https://seo.nlx.org/southernco/pdf/SOCO-Benefits.pdf. Additional and specific details about total compensation and benefits will also be provided during the hiring process.
Southern Company is an equal opportunity employer where an applicant's qualifications are considered without regard to race, color, religion, sex, national origin, age, disability, veteran status, genetic information, sexual orientation, gender identity or expression, or any other basis prohibited by law.
Job Identification: 15047
Job Category: Information Technology
Job Schedule: Full time
Company: Southern Company Services