Lead DevOps Engineer
Permanent
Location: UK - Remote
Salary: £70,000 - £75,000 (+ benefits)
Skills: AWS, Terraform, CI/CD, Production SaaS experience
We are looking to recruit a Lead DevOps Engineer for a leading software company. This is a hands-on technical leadership role, ideal for someone who enjoys owning AWS infrastructure strategy while remaining close to engineering delivery.
You’ll play a key role in shaping platform standards, improving reliability, embedding security best practice, and driving automation across the organisation.
This is a fully remote, UK-based role.
The Role
Platform Architecture & Cloud Engineering
- Own AWS multi-account infrastructure architecture (secure-by-design)
- Define infrastructure standards across networking, IAM, logging and disaster recovery
- Lead Infrastructure-as-Code strategy (Terraform preferred)
- Ensure scalability, resilience and high availability across production environments
CI/CD & Release Automation
- Design and optimise CI/CD pipelines
- Improve deployment reliability and reduce rollback frequency
- Standardise release processes across engineering teams
- Implement progressive delivery practices
Reliability & Observability
- Define and track SLIs/SLOs
- Enhance monitoring, alerting and incident response processes
- Lead post-incident reviews and root cause analysis
- Drive reduction of operational toil
Security & Compliance
- Embed DevSecOps controls into pipelines
- Implement least-privilege IAM models
- Support ISO 27001 compliance and automate compliance evidence collection
FinOps & Cost Optimisation
- Partner on cloud cost optimisation strategy
- Improve tagging standards and cost allocation models
- Implement rightsizing and automation policies
About You
- 5+ years’ experience in DevOps / Cloud Engineering
- Strong AWS expertise (VPC, IAM, EC2, RDS, EKS, Lambda)
- Proven Infrastructure-as-Code experience (Terraform preferred)
- CI/CD tooling experience (GitHub Actions, GitLab CI, Jenkins)
- Experience operating production SaaS environments
- Strong observability tooling knowledge (Datadog, Prometheus, ELK, etc.)
- Incident management and root cause analysis experience
- Experience in regulated or security-conscious environments is highly desirable