104 Site Reliability Engineer jobs in the United Kingdom
Site Reliability Engineer
Posted 7 days ago
Job Description
UKIC DV Cleared Site Reliability / DevOps Engineer
London - 5 Days Onsite
Up to £550 per day (Umbrella, Inside IR35)
12-Month Contract
Must hold UKIC DV Clearance
Are you passionate about reliability, automation, and supporting mission-critical systems? Join this global defence organisation as a Site Reliability Engineer (SRE) and help shape the future of one of the UK's most vital national security platforms.
You'll be joining a growing SRE team at the heart of the customer's mission, focused on ensuring performance, availability, and scalability, while driving continuous improvement and innovation.
About the Role
As an SRE, you'll combine your operational expertise with software engineering skills to minimise manual effort and drive automation across complex systems. This role is perfect for someone who thrives on solving hard problems, automating the mundane, and building intelligent tools to enhance system reliability.
Key Responsibilities
- Support and maintain essential services behind critical applications.
- Participate in a 24/7 on-call rota (1 week in 5), with extra allowance and overtime.
- Proactively enhance system availability, performance, and resilience.
- Develop tools and solutions to automate repetitive tasks and reduce operational toil.
- Collaborate with development teams to embed best practices and SRE principles.
- Deploy and manage monitoring systems to provide intelligent observability.
- Engage with the wider DevOps/SRE community within the organisation.
Ideal Skills & Experience
We're more interested in your curiosity, enthusiasm, and problem-solving ability than ticking every box. However, experience in any of the following areas would be advantageous:
- Software development in web technologies or OOP (e.g., Python, Java, etc.)
- Database tech: Oracle SQL, PostgreSQL, MongoDB
- Proficient with Linux/Windows command line (Bash, PowerShell)
- Monitoring: Grafana, Prometheus, ELK, Splunk
- Agile working and tooling (e.g., Jira, Confluence)
- Diagnosing and resolving complex system issues
- ITIL knowledge or exposure to IT service operations
- Containerisation: Docker, Kubernetes, OpenShift
- Awareness of modern tech trends and tooling
Security Requirements
UKIC DV clearance holder only
Why Apply?
- Join a forward-thinking SRE team in an environment where your work directly supports UK national security.
- Help shape tooling, practices, and culture from the ground up.
- Work alongside brilliant minds on meaningful problems.
- Receive ongoing training and professional development.
If you're excited about automation, resilient systems, and the opportunity to work on a high-impact project, this is your chance to make a difference.
Site Reliability Engineer
Posted 2 days ago
Job Description
UKIC DV Cleared Site Reliability / DevOps Engineer
London - 5 Days Onsite
Up to £550 per day (Umbrella, Inside IR35)
12-Month Contract
Must hold UKIC DV Clearance
Are you passionate about reliability, automation, and supporting mission-critical systems? Join this global defence organisation as a Site Reliability Engineer (SRE) and help shape the future of one of the UK's most vital national security platforms.
You'll be joining a growing SRE team at the heart of the customer's mission, focused on ensuring performance, availability, and scalability, while driving continuous improvement and innovation.
About the Role
As an SRE, you'll combine your operational expertise with software engineering skills to minimise manual effort and drive automation across complex systems. This role is perfect for someone who thrives on solving hard problems, automating the mundane, and building intelligent tools to enhance system reliability.
Key Responsibilities
- Support and maintain essential services behind critical applications.
- Participate in a 24/7 on-call rota (1 week in 5), with extra allowance and overtime.
- Proactively enhance system availability, performance, and resilience.
- Develop tools and solutions to automate repetitive tasks and reduce operational toil.
- Collaborate with development teams to embed best practices and SRE principles.
- Deploy and manage monitoring systems to provide intelligent observability.
- Engage with the wider DevOps/SRE community within the organisation.
Ideal Skills & Experience
We're more interested in your curiosity, enthusiasm, and problem-solving ability than ticking every box. However, experience in any of the following areas would be advantageous:
- Software development in web technologies or OOP (e.g., Python, Java, etc.)
- Database tech: Oracle SQL, PostgreSQL, MongoDB
- Proficient with Linux/Windows command line (Bash, PowerShell)
- Monitoring: Grafana, Prometheus, ELK, Splunk
- Agile working and tooling (e.g., Jira, Confluence)
- Diagnosing and resolving complex system issues
- ITIL knowledge or exposure to IT service operations
- Containerisation: Docker, Kubernetes, OpenShift
- Awareness of modern tech trends and tooling
Security Requirements
UKIC DV clearance holder only
Why Apply?
- Join a forward-thinking SRE team in an environment where your work directly supports UK national security.
- Help shape tooling, practices, and culture from the ground up.
- Work alongside brilliant minds on meaningful problems.
- Receive ongoing training and professional development.
If you're excited about automation, resilient systems, and the opportunity to work on a high-impact project, this is your chance to make a difference.
Site Reliability Engineer
Posted today
Job Description
Site Reliability Engineer
UK (Remote)
6 Month Contract
An excellent contract opportunity for a skilled Site Reliability Engineer to join a forward-thinking technology company. This is a trusted, high-impact engineering environment where a Site Reliability Engineer will be given real ownership, solving complex problems and building systems that directly accelerate software delivery at scale.
As a Site .
Site Reliability Engineer
Posted 2 days ago
Job Description
Site Reliability Engineer (Contract - Outside IR35)
Location: Onsite 5 days a week in Leeds, West Yorkshire
Contract Type: 6 months (initial) - Outside IR35
We're looking for an experienced Site Reliability Engineer to join on a fast-paced contract, driving down technical debt and delivering urgent, high-impact fixes across a mixed environment of Linux, Windows, and Apache-based systems.
This isn't a greenfield build. It's a high-urgency, hands-on role where you'll patch vulnerabilities, remediate legacy issues, and stabilise infrastructure and applications, owning the end-to-end delivery of fixes with minimal oversight.
What You'll Be Doing
- Patching & hardening: Triage vulnerabilities across servers, apps, and services, applying fixes safely and quickly.
- Technical debt remediation: Refactor brittle components, clean up legacy systems, and stabilise critical services.
- Full life cycle fixes: Take issues from diagnosis through to implementation, testing, and release.
- Operate at pace: Balance urgency with safety to deliver resilient solutions in production environments.
- 5+ years of engineering experience in back-end, DevOps, SRE, or similar roles.
- Proven record in security patching, hotfixes, and incident remediation.
- Strong scripting and coding skills (Python, Bash, PowerShell; plus exposure to Java/C# or similar back-end languages).
- CI/CD fluency: experience building and maintaining pipelines (Jenkins, GitLab, GitHub Actions, Azure DevOps).
- Infrastructure expertise: Linux (servers/hosting), Windows (enterprise/user-facing apps), Apache/web services.
- Automation-first mindset: comfortable with IaC (Terraform, Ansible), containerisation (Docker, Kubernetes), and monitoring tools.
- Able to work independently, prioritise rapidly, and ship safely under pressure.
If you're an experienced Site Reliability Engineer with a track record in patching, remediation, and technical debt reduction, we'd love to hear from you.
Apply now with your CV and we'll follow up.
Site Reliability Engineer
Posted 10 days ago
Job Description
Site Reliability Engineer
Work From Home (WFH) + Quarterly Visits to Bath
Full Time, Initial 12 Month Fixed Term Contract
Salary DOE (£45k - £60k) + Benefits + Bonus
Deerfoot Recruitment is working with an established FCA-authorised outsourced service provider in the financial services sector, seeking a talented Site Reliability Engineer to join their IT Operations team. This role offers th.
Site Reliability Engineer
Posted 10 days ago
Job Description
Site Reliability Engineer | £65,000–£95,000 DOE | Hybrid (Bristol-based, occasional site visits)
Clearance: Must be eligible for DV Clearance
Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site and remotely, supporting mission-critical sy.
Site Reliability Engineer
Posted 10 days ago
Job Description
Position Title: Site Reliability Engineer, Interconnection Service and Network Delivery
Location: Hybrid: Austin, Dallas, Boston, Ashburn, Atlanta, London, or Amsterdam
Your role
In this role, you will be responsible for deploying and maintaining all Digital Realty interconnection fabric network infrastructure. The ideal candidate can demonstrate a unique blend of network engineering, network operati.
Site Reliability Engineer
Posted today
Job Description
Formed in 2014 by a team of proven FinTech entrepreneurs, we are an FCA-regulated business providing global claim funds management and payment solutions. Operating one of the largest banking and payment settlement networks in the world, we give our customers direct access to 200 countries and currencies. Through a single integration, insurers can use this network to pay claims in as little as 45 seconds and deliver a superior claimant experience. Our market-leading treasury proposition provides insurers with transparency and control over their claim funds, even when delegated to third parties, allowing them to have their money in the right place, at the right time, to make that all-important payment when customers need it most.
With over 260 employees across our London headquarters, Europe, and the US, $93m Series C funding secured, and exceeding £15bn in processed transactions, we are only just getting started.
We are collaborative and customer-centric and work with integrity, whilst partnering with some of the biggest insurance leaders, including Lloyd's of London and ManyPets. We take huge pride in our company culture, ensuring that everyone has a part to play, an opportunity to be heard and be involved, and the ability to make a real difference. As we continue to scale up, we want like-minded humans to join us on this exciting journey.
Are you ready?
Your mission:
As a Site Reliability Engineer (SRE), you will play an important role in designing, building, and maintaining the infrastructure and tools necessary to support our software applications and services. You will collaborate closely with the product engineering squads, technical operations, and security teams to ensure the reliability, scalability, and security of our platform. Your responsibilities will include automating infrastructure provisioning, configuration management, and deployment pipelines, utilizing best practices and modern technologies to streamline processes and improve efficiency. You will also be responsible for monitoring system performance, identifying bottlenecks, and implementing solutions to enhance system reliability and performance.
Your responsibilities
- Cloud Platform Management: Using Azure/AWS to manage and optimize infrastructure components, ensuring scalability, reliability, and cost management.
- Infrastructure Design and Implementation: Designing, building and maintaining the cloud-based infrastructure that supports our software applications and services
- System Reliability: Ensuring the reliability, availability, and performance of systems and services by designing, implementing, and maintaining robust infrastructure.
- Infrastructure as Code (IaC): Implementing and maintaining tools for automation, monitoring, and deployment to improve efficiency and reduce manual intervention.
- Collaboration and Support: Working closely with product engineering to ensure efficient workflows and support continuous integration and delivery pipelines (CI/CD).
- Capacity Planning and Scalability: Assessing system capacity requirements and planning for future growth to ensure the system can scale and is cost efficient.
- Incident Response and Management: Monitoring system health, promptly responding to incidents, and assisting with the resolution process.
- Risk Management: Identifying potential risks and vulnerabilities in systems and implementing measures to mitigate these risks effectively.
- Monitoring and Observability: Implement and oversee monitoring tools to proactively detect and mitigate issues, ensuring high application and system availability.
- Documentation and Knowledge Sharing: Maintaining documentation and sharing knowledge with the team to ensure transparency and facilitate cross-functional collaboration.
Requirements
- 3+ years of experience as an SRE or Platform/Cloud Engineer, or in a similar role.
- Strong knowledge of and experience with cloud platforms; we primarily host in Azure and AWS but recognize that skills are transferable.
- Experience in running and maintaining highly available and scalable platforms.
- Expertise in containerisation tools like Docker and orchestration tools such as Kubernetes.
- Experience with infrastructure as code (IaC) tools such as Terraform, Ansible, or Chef for automation and configuration management.
- Strong understanding of monitoring and observability tools.
- Knowledge of networking, security principles, and best practices in a cloud environment. Cloudflare experience would be a bonus.
- Demonstrated experience of CI/CD tools like GitHub Actions, GitLab CI/CD, or Azure DevOps for continuous integration and delivery.
- Problem-solving mindset and meticulous attention to detail.
- Strong collaboration and communication skills to work effectively with cross-functional, internationally distributed teams.
- Comfortable working in a fast-paced environment, handling incidents, and participating in on-call rotations.
- Adaptability to evolving technologies and eagerness to learn new tools and methodologies.
Benefits
- 25 days Holiday per year + Bank Holidays
- Hybrid working arrangements.
- Contributory pension scheme
- Enhanced parental leave.
- Cycle to Work Scheme
- Private Medical Insurance through Vitality
- Access to Oliva, our mental health therapy partner
- Discounted Gym membership
- Financial Coaching with Octopus Wealth
- 2 days of volunteering leave per year
- Sabbatical after 5 years’ service
- Ongoing Learning and Development to help you reach your career goals.
WE ARE AN EQUAL OPPORTUNITY EMPLOYER
We are committed to creating an inclusive environment that enables everyone to perform at their best, where we recognise the rights of all individuals to mutual respect and where there is an unbiased acceptance of others. Our policies and practices aim to promote an environment that is free from all forms of unfair discrimination and values the diversity of all people. At the heart of our policy, we seek to treat people fairly and with dignity and respect.
Site Reliability Engineer
Posted 1 day ago
Job Description
ABOUT THE COMPANY
Infleqtion is a global quantum technology company solving the world’s most challenging problems. The company harnesses quantum mechanics to build and integrate quantum computers, sensors, and networks. From fundamental physics to leading edge commercial products, Infleqtion enables “quantum everywhere” through our ecosystem of devices and platforms. We are recruiting for a Site Reliability Engineer with DevOps skills for our quantum computing platform.
LOCATION
Infleqtion has offices in the USA, United Kingdom and Australia. This is a full-time position split between our Kidlington, Oxford office and the National Quantum Computing Centre, Harwell. Our flexible working policy enables all full-time employees to work up to 2 days a week from home if work permits. Candidates will need to be able to travel independently to these locations, as required.
POSITION SUMMARY
As part of our strategy for growth at Infleqtion UK, we are expanding our engineering team and recruiting a Site Reliability Engineer with DevOps skills.
For this role, you will bring a mix of technical skills, problem-solving ability and effective communication, playing a pivotal role in operational reliability and code velocity for our quantum research and device development.
This role involves a combination of installation and maintenance of physical on-site hardware, network management, server administration, and proactive involvement in our continuous integration and deployment processes. The successful candidate will ideally possess a solid technical background in both systems engineering and software development.
JOB RESPONSIBILITIES
- Network Management: Design, implement, and manage robust networks, including configuring switches and managing network-connected devices.
- System administration to manage users and ensure systems are kept secure and updated.
- 'System-down' planning and response. Own system management and coordinate response to efficiently maintain high up-times. Plan and implement system backup policies to ensure swift recovery.
- Collaboration and Documentation: Collaborate with both software developers and hardware engineers in a lab environment, documenting processes and system configurations for ongoing projects.
- Software Deployment: Manage Docker containers and orchestrate CI/CD pipelines for efficient software deployment and updates.
- Infrastructure as code: Design, develop, and maintain IaC solutions and implement CI/CD pipelines to automate deployment processes.
Requirements
- Excellent understanding of networking technologies, connecting computers and embedded devices
- Experience of working with an on-premises physical network in a setting such as a data centre, research laboratory, or critical infrastructure.
- Proficiency with:
  - Installing hardware
  - Deployment and configuration management frameworks, e.g. Ansible
  - Linux systems
  - Docker and/or Kubernetes
  - Git and version control-centric workflows
- Good collaboration skills, able to work in a team environment where engagement and participation are an expected part of successful job performance.
Experience
- A minimum of 2 years’ experience working with networked Linux devices.
- Proven experience working with complex systems consisting of a number of networked nodes, including embedded systems (such as Raspberry Pi or IoT devices).
- Experience with production grade systems.
- Professional-level verbal and written communication skills, able to effectively share information with technical and non-technical staff.
Qualifications:
- Bachelor’s degree in computer science, engineering, or another related field (or equivalent), or extensive experience.
- Baseline Personnel Security Standard (BPSS) required (we will arrange this for the successful candidate).
Desirable:
- Experience with ARTIQ systems, or controlling hardware via a Python API.
- Python, developing & maintaining code, managing packages and dependencies.
- A background/interest in Quantum Physics/Quantum computing.
Benefits
In addition to your base compensation, we offer a generous Total Rewards program which includes:
- Competitive salary
- Unlimited PTO
- Generous company pension contribution
- Cycle to work and Technology schemes
- BUPA medical insurance upon successful completion of probationary period
- Incentive Stock Option Plan
Site Reliability Engineer
Posted 15 days ago
Job Description
As a Site Reliability Engineer (SRE) at Trade Nation, you will be part of a dynamic and collaborative team that ensures the reliability, availability, and performance of our web services and applications. You will work closely with developers, operations, and product teams to design, build, and maintain scalable, secure, and efficient systems. You will also monitor, troubleshoot, and resolve issues that affect the user experience and the business objectives.
Who we are
We are Trade Nation. We help our customers power up their trading through killer insights, transparent costs, and fairer ways to trade. We’re innovators, and proud of it. And we’ve grown a lot in our decade as a market-leading low-cost trading powerhouse. Our reach is global through our teams in the UK, Australia, South Africa, Seychelles and The Bahamas.
Founded on transparency, forged in trust and powered by people, we’re committed to empowering our customers to outperform the markets. How? By minimising expenses and harnessing technology to prioritise the lowest trading costs.
But enough about us. Let’s hear about you.
Who you are
You’re something special. You pride yourself on being unique and bringing your own history to the table – finding solutions to daily challenges in a way that can’t be done by anyone else. Maybe you talk a big game, maybe you don’t. The important thing is that you do what you say and follow through to see every customer thrive.
You don’t play with the bumpers up. That means breaking out of your lane when needed to help others – or forging your own completely. Every problem is our problem and that’s how you see it too. Because Trade Nation’s people have a shared vision, and you want to be part of making it a reality.
You know when to take the right sort of risks, the ones that push you to be better. You’re not afraid to try, fail, and then try harder. But don’t worry, you’ll have all the support you need to thrive with us at Trade Nation, and we can’t wait to enable you to learn and grow.
Ready to roll up your sleeves and get stuck in?
Our commitments to each other
We have each other’s backs
There when we need each other most
We challenge each other
Be more creative, more curious, more bold
We thrive together
Taking our work to the next level
We form strong bonds
Through team building and social events
We don’t judge
Instead, we teach and are open to learning
We step up
Taking ownership and supporting each other to do the same
Responsibilities
- Design, implement, and maintain scalable and reliable systems.
- Monitor system performance and troubleshoot issues to ensure high availability and reliability.
- Develop and maintain automation tools to streamline operations and reduce manual intervention.
- Collaborate with development teams to ensure new features and services are designed with reliability in mind.
- Implement and manage monitoring, alerting, and logging systems.
- Conduct root cause analysis of incidents and implement corrective actions to prevent recurrence.
- Participate in on-call rotations to provide support for critical systems.
- Continuously improve system performance, reliability, and scalability through proactive measures and best practices.
Requirements
- A bachelor's degree in computer science, engineering, or a related field, or equivalent work experience.
- At least three years of experience in Site Reliability Engineering, DevOps, or similar roles.
- Proficiency in JavaScript and ideally React.
- Experience with cloud platforms, such as AWS, GCP, or Azure, and related technologies, such as Kubernetes, Docker, Terraform, or CloudFormation.
- Experience with monitoring and observability tools, such as Prometheus, Grafana, Loki, or Sentry.
- Experience with troubleshooting and debugging tools, such as Wireshark, tcpdump, or gdb.
- Strong knowledge of web protocols, such as HTTP, TCP, UDP, DNS, and TLS.
- Strong communication and collaboration skills.
- Passion for learning new technologies and solving complex problems.
Benefits
- Competitive salary, and discretionary annual bonus.
- Private healthcare.
- Life Insurance, Critical Illness & Income Protection cover.
- Active Lifestyle allowance.
- Annual leave above minimum entitlement.
- Cycle to work scheme.
- Up to 3 weeks allowance to work in any location.