What jobs are available for DevOps Engineers in the United Kingdom?

Showing 2278 DevOps Engineer jobs in the United Kingdom

Site Reliability Engineer

Sutton, London £450 - £550 Daily IntaPeople

Posted today

Job Description

contract

IntaPeople are partnering with a long-standing client in the data and analytics space, an organisation known for its technical excellence, collaborative culture, and meaningful impact across sectors. They’re scaling their SRE function and looking for a seasoned engineer to join a high-performing team delivering internal applications that power critical operations.

This is a 6-month contract (with strong potential to extend), starting ASAP. You’ll be based 2 days a week in either Southampton or Sutton, whichever suits you best. Most of the SREs sit in Southampton, so if you enjoy bouncing ideas off peers in person, that’s the spot.

What you’ll be doing:

Collaborating with cross-functional teams to implement and optimise internal applications
Diagnosing complex system performance issues and refining monitoring/reporting
Championing architectural best practices across engineering teams
Driving innovation and continuous improvement in infrastructure and tooling
Working hands-on with IaaS, CI/CD pipelines, and containerised environments

What you bring:

Deep Linux expertise and fluency in at least one high-level programming language (Python preferred)
Strong experience with AWS (VPCs, EC2, ECS/EKS, RDS, S3, etc.)
Solid understanding of database systems (Postgres, SQL Server)
IaC mastery (Terraform, CloudFormation, Ansible)
Passion for monitoring and observability (Grafana, Elastic, PagerDuty, etc.)
Familiarity with configuration management tools (Puppet, etc.)
Git, Docker, and scripting skills (bash or similar)
A collaborative mindset and the ability to communicate technical concepts clearly

Bonus points for:

Agile delivery experience
Security engineering know-how
Ability to explain complex issues to non-technical audiences
Experience with scalable, cloud-native applications

This is a brilliant opportunity to work with a forward-thinking tech team that values autonomy, innovation, and impact. If you’re an SRE who thrives in fast-paced environments and loves solving complex problems, I’d love to hear from you. Apply now to learn more.

Site Reliability Engineer

Cambridge, Eastern Speechmatics

Posted today

Job Description

The Role

Speechmatics are seeking a Site Reliability Engineer (SRE) whose focus will be improving the reliability of our products, systems and infrastructure. You will work across teams to improve availability, scalability, performance and efficiency of our real-time AI inference APIs.

You will get to work with high-scale GPU deployments spread across the world. Our customers expect low-latency responses, making this a really interesting problem space to learn about.

What you'll be doing:

Working with a diverse group of engineers across Speechmatics to improve reliability of our products and systems, from design through to operation in production.
Taking part in incident response, postmortems and ensuring the same incident doesn't happen twice.
Managing and improving GitOps release workflows and CI/CD pipelines.
Monitoring system performance and troubleshooting production environments.
Implementing observability improvements using OpenTelemetry tooling.
Automating processes to reduce manual effort and create self-healing systems.
Taking part in the on-call rota for production systems, which comes with a generous daily pay rate and a mentorship programme to build your confidence.

Who we are looking for:

Comfortable navigating and troubleshooting Linux systems directly from the command line.
Hands-on experience with major cloud platforms such as AWS, Azure, and GCP.
Skilled in managing containerised applications using tools like Docker, Kubernetes, and Helm.
Proficient in Infrastructure-as-Code practices — our production stack includes Terraform, Datadog, ArgoCD, and GitLab.
Strong focus on automation; you streamline workflows with Bash scripts and turn to Python when things get more complex.
Curious about the entire technology landscape — from bare-metal servers to cloud abstractions — and motivated to understand how each layer fits together.
Naturally inquisitive and eager to dive deep into new technologies; you thrive on learning as you go.
Prior experience with on-call rotations and incident response is a plus.
Familiarity with OpenTelemetry and related observability tooling is advantageous.

We encourage you to apply even if you do not feel you match all of the requirements exactly. The list of requirements is intended to show the kinds of experience and qualities we're looking for, but it is not exhaustive. If you are interested in the role, the team, and our mission, we would love to consider your application. We are always open to conversations and look forward to hearing from you.

Who we are:

Speechmatics is the leading expert in Speech Intelligence, and uses AI and Machine Learning to unlock business value in human speech worldwide. We work with an amazing mix of global companies, and our technology can integrate into our customers' stack irrespective of their industry or use case – making it the go-to solution to harness useful information from speech.

Joining us means working with some of the smartest minds around the world, focused on cutting-edge projects and deploying the latest techniques to disrupt the market. We believe in putting people first; we'll do all we can to help you develop your skills and give you the tools you need to thrive. Our Focus Fridays give you an undisturbed day of focus, offset with Together Tuesdays when we have our team meetings, so you've always got the right balance.

We have structured a hybrid approach that includes 2-3 designated office days each week. This arrangement ensures that while we embrace the advantages of remote work, we also maintain the vital connection and synergy that only in-person interactions can foster.

This is only the beginning; we're looking for amazing people like you to continue our journey…

What we can offer you:

No matter what stage of your career you're at - from paid internships and first-job opportunities through to management and senior positions - we'll support you with the training and development needed to reach your career aspirations with us. There really is no shortage of opportunities here for you to get involved and collaborate with those around you to deliver your best work.

We offer incredibly flexible working, regular company lunches, and birthday celebrations. But that's not all. We've spoken to our teams to find out what they want. From Private Medical, and Dental for you and your family, through to global working opportunities, a generous holiday allowance and pension/401K matching, we want to make sure our employees and their families are looked after. Every employee will receive a working from home allowance for tech or home office equipment (on top of your choice of laptop and accessories of course). Our approach to parental leave is designed to support employees globally. While this varies by geo, we have support in place for parents (including adoption assistance and reproductive health services) to ensure they have the time and financial resources needed to care for their growing families.

At Speechmatics, our mission is simple: Understand Every Voice out there.

That's not just about our tech – it's the heart and soul of who we are. We welcome different experiences, viewpoints, and identities. For us, it's not just the right thing to do; it's our catalyst for sparking innovation and creativity. Our teams thrive in an environment that celebrates and supports everyone – no matter their gender, identity or expression, race, disability, age, sexual orientation, religion, belief, marital status, national origin, veteran status, pregnancy, or maternity status.

But we don't just open the door to diversity – we actively welcome it. Why? Because we believe every unique voice adds something special to our team, leading us to smarter solutions and a better workplace.

So, come as you are and join our Speechling community. We're building a place where every voice not only gets heard but is also respected and valued.

For more information on us, please visit our website and follow Speechmatics on our social channels via Twitter, Facebook, LinkedIn, and YouTube.

We rely on legitimate interest as a legal basis for processing personal information under the GDPR for purposes of recruitment and applications for employment.

Site Reliability Engineer

C3 AI

Posted today

Job Description

C3 AI (NYSE: AI), is the Enterprise AI application software company. C3 AI delivers a family of fully integrated products including the C3 Agentic AI Platform, an end-to-end platform for developing, deploying, and operating enterprise AI applications, C3 AI applications, a portfolio of industry-specific SaaS enterprise AI applications that enable the digital transformation of organizations globally, and C3 Generative AI, a suite of domain-specific generative AI offerings for the enterprise. Learn more at:
C3 AI
We are looking for a Site Reliability Engineer to join our team in London.

Responsibilities

Maximize system uptime and availability, ensuring functional and performance SLAs.
Establish end-to-end monitoring and alerting on all critical aspects.
Solve complex problems for critical services and build automation to prevent problem recurrence.
Influence and create new designs, architectures, standards, and methods for supporting the platform.
Initiate and lead scripting and automation to streamline system updates and upgrades.
Set up critical infrastructure, tools, and framework to streamline the deployment cycle.
Work cross-functionally with Services and Engineering teams.

Qualifications

Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.
Expertise in Linux Operating Systems, Networking, and Database concepts.
Experience with Cassandra (or another NoSQL alternative).
Expertise in cloud providers, such as Amazon Web Services, Azure, and GCP.
Experience with configuration management systems such as Ansible or Puppet.
Experience in Ruby or Python to automate and monitor systems.
Excellent problem-solving, critical thinking, and communication skills.
Experience as a DevOps engineer or sysadmin supporting commercial SaaS solutions.
BS or MS in Computer Science, related field, or equivalent professional experience.

C3 AI provides excellent benefits and a competitive compensation package.
C3 AI is proud to be an Equal Opportunity and Affirmative Action Employer. We do not discriminate on the basis of any legally protected characteristics, including disabled and veteran status.

Site Reliability Engineer

BAE Systems Digital Intelligence

Posted today

Job Description

BAE Systems Digital Intelligence is home to 4,500 digital, cyber and intelligence experts. We work collaboratively across 10 countries to collect, connect and understand complex data, so that governments, nation states, armed forces and commercial businesses can unlock digital advantage in the most demanding environments.

Site Reliability Engineering is a rapidly growing concept in industry, with a remit to drive the quality, reliability and performance of essential systems. As a Site Reliability Engineer you'll be part of a team in BAE Systems at the forefront of this, delivering these benefits to a key national security customer. We are in the process of building our team and tools, and with your help will create a culture of continual improvement to revolutionise the way our customer's systems are built and maintained. This role blends operational product support with software engineering to create applications to understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission.

The Role Holder
As an SRE, fundamentally you will be doing work that has historically been done by an operations team, but using software and systems engineering expertise to substitute automation for human labour, with the objective of limiting traditional manual operations work (incident tickets, on-call etc.) to no more than half of the SRE team's time (and aiming for considerably less). You will have an enthusiasm to learn and experiment, to develop tools to understand application health and improve their reliability to support the customer mission.

Role Accountabilities Include
Supporting and maintaining essential services that support core mission applications, proactively enhancing their availability, performance and stability.

Being part of the 24/7 on-call rota, supporting critical production systems out of business hours, for which additional on-call allowances and overtime benefits will be paid.

Finding innovative solutions to problems rather than undertaking repetitive work, automating everything you can. You will work alongside development teams, advising them of good practice in how to design and build systems, learning from what you know works well.

You will design and deploy monitoring products, creating bespoke tools where required, to provide comprehensive and intelligent observations to meet the customer requirements and demonstrate the improvements the team are making on a daily basis. You will be well versed in the relationship between software and infrastructure, understanding the characteristics of systems that enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to.

Participating in the wider DevOps/SRE community within the organisation.

Competencies
It is desirable for you to have experience in the areas below. However, more valued for this role is that you have excitement and enthusiasm to learn new technologies, and to deal with hard problems. Training, knowledge sharing and on-the-job development will enable you to plug any knowledge gaps.

Software development in web technologies and object oriented programming
Database technologies such as Oracle SQL, Mongo, Postgres
Know your way around Linux and Windows command lines, e.g. Bash and PowerShell
Monitoring large systems using technologies such as Grafana, Prometheus, ELK, Splunk
Experience of working in Agile teams, and the tooling that supports it, e.g. Atlassian
Diagnosing and troubleshooting application issues resulting in service outages
Troubleshooting skills across different levels of the stack
Understanding of ITIL
Micro-services architectures, Docker and container platforms such as Openshift, Kubernetes

Awareness and insight into technology trends to adopt new cutting-edge tools

Security Clearance
Due to the nature of our work, successful candidates for this role will be required to hold an active eDV before applying for this opportunity.

Life at BAE Systems Digital Intelligence
We are embracing Hybrid Working. This means you and your colleagues may be working in different locations, such as from home, another BAE Systems office or client site, some or all of the time, and work might be going on at different times of the day.

By embracing technology, we can interact, collaborate and create together, even when we're working remotely from one another. Hybrid Working allows for increased flexibility in when and where we work, helping us to balance our work and personal life more effectively, and enhance well-being.

Diversity and inclusion are integral to the success of BAE Systems Digital Intelligence. We are proud to have an organisational culture where employees with varying perspectives, skills, life experiences and backgrounds – the best and brightest minds – can work together to achieve excellence and realise individual and organisational potential.

Division overview: Capabilities
At BAE Systems Digital Intelligence, we pride ourselves in being a leader in the cyber defence industry, and Capabilities is the engine that keeps the business moving forward. It is the largest area of Digital Intelligence, containing our Engineering, Consulting and Project Management teams that design and implement the defence solutions and digital transformation projects that make us a globally recognised brand in both the public and private sector.

As a member of the Capabilities team, you will be creating and managing the solutions that earn us our place in an ever changing digital world. We all have a role to play in defending our clients, and this is yours.

Site Reliability Engineer

Mercor

Posted today

Job Description

Company Introduction
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Role Overview

Position: Site Reliability Engineer (SRE) – Full-Time, San Francisco
Commitment: 40 hours per week
As an SRE at Mercor, you'll build and automate systems to keep our platform reliable, scalable, and fast. You will work across every layer of the stack to drive measurable reliability improvements.

Responsibilities

Mentor engineers on best practices for observability, alert management, and instrumentation.
Lead incident response from triage through post-mortem and remediation.
Own and improve load-testing, disaster-recovery, and chaos-engineering programs.
Automate reliability checks, capacity planning, and service-level monitoring.
Partner with product and platform teams to design for reliability and scalability from the start.

Requirements / Qualifications
Must-Have Qualifications

Background in SRE
Proficiency in Terraform, Python, Go
Experience working with AWS

Preferred Qualifications

Experience with RDBMS (MySQL)
Experience with document storage systems (MongoDB)
Experience with caching systems (Redis)
Exposure to data warehousing (Snowflake)
Previous work in a high-growth startup environment

Engagement Details

Full-Time position
Location: San Francisco
Remote work flexibility
Competitive compensation

Application Process (Takes 20-30 mins to complete)

Upload resume
AI interview based on your resume
Submit form

Resources & Support

For details about the interview process and platform information, please check:
For any help or support, reach out to:

PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.

Site Reliability Engineer

Burgess Hill, South East HCLTech

Posted today

Job Description

HCLTech is a global technology company, home to more than 220,000 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending December 2024 totaled $13.8 billion.

SRE for production support of a mission-critical tokenization platform. The candidate should be strong in ITSM processes and hands-on with automation scripting and cloud technologies.

Good to have proficiency with:

Programming: Java, Vert.x, Python, Shell scripting, Go, REST
SRE: Kubernetes, Splunk/ELK, OpenShift, CI/CD
DB: Postgres, Couchbase, Oracle

Technical Skill

Managing production support for mission-critical platforms

Implementing and following ITSM processes for incident handling

Writing automation scripts using Shell, Python, or Go

Deploying and managing Kubernetes clusters in production

Operating and troubleshooting OpenShift environments

Building and maintaining CI/CD pipelines for cloud-native apps

Monitoring and alerting using Splunk or ELK

Querying and tuning using Postgres or Oracle databases

Developing and debugging REST APIs for platform integration

Supporting Java and Vert.x based microservices in production

Managing Couchbase clusters and optimizing performance

Monitoring and resolving issues in Postgres/Oracle databases

Site Reliability Engineer

Arrows

Posted 11 days ago

Job Description

Site Reliability Engineer (Lead Level) | London | Up to £600 Inside IR35 | Hybrid (2 Days Onsite) | 6 months

I’m partnered with a major media and tech company looking for a Lead Site Reliability Engineer to support and scale their Video on Demand (VOD) infrastructure. You’ll work across modern tech stacks including AWS, GCP, Cassandra, and Kafka, helping deliver reliable, high-performance systems used by millions.

What you’ll do

Lead project delivery while supporting day-to-day operations and incident management
Build and manage infrastructure as code to improve reliability, scalability, and performance
Design and implement new architectures and best practices for infrastructure and delivery
Drive automation across monitoring, CI/CD, and deployment pipelines
Mentor engineers and guide technical decisions within a fast-paced, cross-functional environment

What you’ll bring

Strong Linux administration skills (Ubuntu preferred)
Hands-on experience with AWS and GCP
Proficiency in Terraform, Ansible, Jenkins, or GitLab CI
Knowledge of Kafka, Cassandra, and relational or NoSQL databases
Scripting skills in Python, Bash, Go, or Java
Familiarity with monitoring tools like Prometheus, Nagios, or Icinga
Understanding of networking fundamentals and virtualisation (e.g. VMware)
Comfortable with on-call rotations and troubleshooting in live environments

Up to £600 per day (Inside IR35)

London | Hybrid (2 days onsite)

6-month contract, with strong potential to extend

If you’re an experienced SRE who enjoys taking ownership, leading technical delivery, and working on large-scale content platforms, I’d love to chat.

Apply or message me if you’d like to hear more.

Site Reliability Engineer

London, London WALT Labs

Posted 14 days ago

Job Description

Company Description

WALT Labs, a leading managed service provider, is dedicated to empowering businesses by harnessing the power of cloud technology. Our team specializes in delivering customized solutions tailored to meet the unique needs of our clients, driving growth and operational efficiency across industries. From supporting small businesses with seamless data migration to enabling large corporations to manage complex infrastructure projects, we provide exceptional service while staying at the forefront of cloud technology advancements.

Role Description

This is a full-time role, on-site a minimum of 3 days a week in King's Cross, London. We are seeking a skilled Site Reliability Engineer with a strong focus on Google Cloud Platform (GCP) to join our dynamic team. In this role, you’ll be responsible for maintaining cloud infrastructure, managing incidents, and ensuring seamless operations for our clients. You’ll use tools like incident.io and JIRA to manage and resolve support requests efficiently.

Qualifications

8-10 years of experience managing applications and infrastructure performance.
Proven experience with Google Cloud Platform (GCP) services.
Familiarity with incident.io for incident tracking and management (or equivalent).
Proficiency in using JIRA for task management and support workflows.
Strong experience working with observability tools (Grafana)
Strong troubleshooting and problem-solving skills in cloud environments.
Understanding of cloud security and performance optimisation best practices.
Knowledge of scripting or automation tools (e.g., Python, Terraform) is a plus.
Excellent communication and customer service skills.
Certifications in GCP (Professional certifications) are highly desirable.
Ability to work under pressure and prioritise tasks effectively.
Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience).

Responsibilities

Provide technical support and resolve issues related to Google Cloud Platform (GCP) services and AWS.
Manage and respond to cloud incidents using incident.io, ensuring timely resolution.
Use JIRA to log, track, and prioritize support tickets and workflow tasks.
Monitor and maintain cloud infrastructure for performance, reliability, and security.
Collaborate with teams to identify and implement solutions to technical challenges.
Assist in deploying, configuring, and optimising GCP resources.
Create and maintain documentation for troubleshooting processes and best practices.
Proactively identify opportunities to improve cloud environments and support processes.
Support clients and stakeholders by providing clear communication and updates during incident resolution.
Stay up-to-date with the latest GCP developments and contribute to team knowledge sharing.

Benefits

20 holiday days + bank holidays (earn 1.5 days every 3 years)
Private health insurance
