46 Site Reliability Engineer jobs in the United Kingdom

Site Reliability Engineer

Bristol, South West Twinstream Limited

Posted 8 days ago

Job Description

full time

Site Reliability Engineer | £65,000–£95,000 DOE | Hybrid (Bristol-based, occasional site visits)
Clearance: Must be eligible for DV Clearance

Founded in 2019 by engineers solving complex cross-domain problems for government organisations, TwinStream delivers technical excellence and exceptional service to high-profile clients. Our teams work both on-site and remotely, supporting mission-critical systems where performance and reliability are paramount.

The Site Reliability Engineer Role:

We are seeking a Site Reliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and proactively mitigate reliability risks across a growing portfolio of services.

Key Responsibilities of the Site Reliability Engineer:

  • Improve reliability and performance across multiple subsystems.
  • Automate manual tasks and eliminate unnecessary alerts.
  • Enhance monitoring capabilities to identify and resolve issues before they impact users.
  • Support and optimise CI/CD pipelines and cloud infrastructure.
  • Research and evaluate new tools to influence build-vs-buy decisions.
  • Contribute to technical innovation across diverse stacks and environments.

What We’re Looking For:

  • Experience with configuration management tools (e.g., Ansible, Chef).
  • Proficiency with Terraform, Docker, and container orchestration (Kubernetes, OpenShift, or similar).
  • Hands-on experience with CI/CD tools (Jenkins or equivalent).
  • Knowledge of monitoring solutions (Prometheus, Grafana, InfluxDB).
  • Familiarity with MQ messaging (RabbitMQ or similar).
  • Linux administration, scripting, and network security protocols.
  • Experience with cloud services (preferably AWS – EC2, RDS, S3, Lambda).

Desirable: Experience coding in Java, Go, or Python; cross-domain technologies; observability patterns; and service management environments.

Why Join TwinStream?

  • Salary: £65,000–£95,000 (DOE & clearance level)
  • Pension: 8% employer contribution
  • Private Healthcare: Includes dental & optical cover for you & your family
  • Learning & Development: Annual training budget
  • Flexible Working: Hybrid & family-friendly culture
  • Additional Perks: EV leasing scheme, 25 days’ holiday + bank holidays, life assurance, cycle-to-work scheme, and team events

Ready to Shape the Future of Mission-Critical Systems?

Apply now and join a team where innovation, reliability, and technical excellence drive everything we do.

Site Reliability Engineer

Farringdon, London RELX INC

Posted 5 days ago

Job Description

Site Reliability Engineer
Are you enthusiastic about designing and managing cloud platforms? Do you find satisfaction in ensuring the reliability and performance of complex systems?
About the Team:
The LexisNexis Intellectual Property (IP) division provides international patent content and a suite of online and analytic tools that meet the evolving needs of the intellectual property market. We deliver data to support LexisNexis IP search and analytics applications, empowering our customers with actionable insights and metrics for critical business decisions.
Our corporate culture thrives on excellence, innovation, and a strong dedication to our customers, employees, and communities. Working here means joining a vibrant, diverse, and collaborative team where you are free to grow and contribute actively.
About the Role:
We are a high-performing systems engineering team operating in a fast-paced enterprise environment, focused on modernising our infrastructure while upholding strict security and compliance standards. This position provides assistance and input to management, develops and leads large multifunctional development activities, solves complex technical problems, writes complex code for computer systems, and serves as a senior source of expertise.
Requirements:
+ Deep knowledge of cloud services (e.g., EC2, S3, RDS, Lambda, Azure VMs, Azure Functions).
+ Good experience in cloud engineering with a strong focus on Azure and/or AWS.
+ Experience with Infrastructure as Code (Terraform, ARM/Bicep).
+ Proficiency in containerization and orchestration tools (Docker, Kubernetes/EKS).
+ Skilled in scripting languages (Python, Bash, TypeScript, PowerShell).
+ Strong understanding of Linux/UNIX/Windows systems and storage.
+ Experience with monitoring tools (Datadog, Coralogix, CloudWatch, Azure Monitor).
+ Familiarity with SRE and DevOps practices.
+ Knowledge of networking and security best practices.
+ Excellent problem-solving and stakeholder management skills.
+ Databricks knowledge is an advantage.
Responsibilities:
+ Leading Kubernetes deployment and management, including orchestration, architecture, networking, CI/CD, storage, and security.
+ Collaborating with cross-functional teams to design and implement high-quality cloud solutions.
+ Administering and supporting Databricks environments, including permissions, storage, and networking.
+ Troubleshooting complex technical issues using observability tools and root-cause analysis.
+ Implementing infrastructure management best practices and automating repetitive tasks.
+ Supporting program installations, system configurations, and user modifications.
+ Refining system monitoring and reporting in collaboration with support teams.
+ Operating across Agile and Waterfall methodologies to deliver timely solutions.
+ Mentoring junior team members and contributing to a culture of continuous learning.
Why Join Us?
Join our team and contribute to a culture of innovation, collaboration, and excellence. If you are ready to advance your career and make a significant impact, we encourage you to apply.
Work in a way that works for you
We promote a healthy work/life balance across the organisation and offer an appealing working environment for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.
+ Flexible working hours: flex the times when you work during the day so you can fit everything in and work when you are most productive.
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
+ Annual Profit Share Bonus
+ Comprehensive Pension Plan
+ Home, office or commuting allowance
+ Generous vacation entitlement and option for sabbatical leave
+ Maternity, Paternity, Adoption and Family Care leave
+ Internal communities and networks
+ Recruitment introduction reward
About Our Business
The LexisNexis Intellectual Property (IP) division provides international patent content and a suite of online and analytic tools that meet the evolving needs of the intellectual property market. We deliver data to support LexisNexis IP search and analytics applications, empowering our customers with actionable insights and metrics for critical business decisions.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form.
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
USA Job Seekers:
EEO Know Your Rights.
RELX is a global provider of information-based analytics and decision tools for professional and business customers, enabling them to make better decisions, get better results and be more productive.
Our purpose is to benefit society by developing products that help researchers advance scientific knowledge; doctors and nurses improve the lives of patients; lawyers promote the rule of law and achieve justice and fair results for their clients; businesses and governments prevent fraud; consumers access financial services and get fair prices on insurance; and customers learn about markets and complete transactions.
Our purpose guides our actions beyond the products that we develop. It defines us as a company. Every day across RELX our employees are inspired to undertake initiatives that make unique contributions to society and the communities in which we operate.

Site Reliability Engineer

London, London Cisco

Posted 11 days ago

Job Description

Join us on the Splunk TechOps team, empowering our customers to execute our vision of making machine data accessible, usable, and valuable to everyone! The Splunk TechOps organization runs Splunk Cloud, blending SRE, Systems Engineering and Service Engineering disciplines across cross-functional global teams.
Come join a team that is striving for operational awesomeness and trying to automate the world. We have a large presence on the major cloud vendors, so you should have experience with architecture, deployments, and networking on one or more of them. This is an incredible opportunity to use your existing cloud experience and drive the growth of Splunk Cloud.
**What we're looking for**
**NOTE:** **4 x 10h shifts: Wednesday - Saturday/8am-6pm**
We are looking for a TechOps SRE to help maintain, contribute to and improve the next generation of our large scale Cloud offering. You will be working with providers and supporting the infrastructure that powers Splunk's cloud offering.
**You should apply if**
+ **you are comfortable working 4 x 10h shifts: Wednesday - Saturday/8am-6pm**
+ You have operational experience at scale. You have had hands-on roles that deal with operating systems (particularly Linux) and networking. You might also have worked with Cloud technologies. Your previous job titles might be something close to systems admin, network engineer or devops engineer.
+ You're passionate about your work. Our customers are passionate about Splunk and we want the same from our engineers. You should enjoy actively being responsible for your work and be excited about your projects.
+ You love large complex systems. You have experience working on distributed systems or a passion for finding edge cases that appear at scale. You are interested in how to take something from a small one-off task to an implementation across several thousand machines at once.
+ You have some development skills. We have code in several languages, ranging from Python and Shell to Go and C++. We don't expect you to be a software engineer but you should be familiar with basic programming and understand concepts like input sanitisation and unit testing.
+ "How can I automate this process?" is a question you constantly ask yourself.
+ Data drives your decisions. Data excites you and you make decisions based on numbers rather than assumptions. If an issue arises, you strive to be alerted before our customers notice.
+ You care about monitoring. Shipping code often and getting useful feedback excites you and you're not worried about changing direction when a solution isn't working as expected.
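The "input sanitisation and unit testing" concepts mentioned above can be illustrated with a small Python sketch. It assumes a hypothetical helper, `sanitise_hostname`, that validates a hostname before it is handed to other tooling, using RFC 1123-style label rules; neither the helper nor the rule comes from Splunk's codebase.

```python
import re
import unittest

# Assumed rule for this sketch: RFC 1123-style hostnames, where each dot-separated
# label is 1-63 letters, digits or hyphens and does not start or end with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def sanitise_hostname(raw: str) -> str:
    """Normalise an untrusted hostname, raising ValueError if it is not well-formed."""
    host = raw.strip().lower().rstrip(".")
    if not host or len(host) > 253:
        raise ValueError(f"hostname has invalid length: {raw!r}")
    # fullmatch (not match) so a trailing newline or other junk cannot slip through.
    if not all(_LABEL.fullmatch(label) for label in host.split(".")):
        raise ValueError(f"hostname is not well-formed: {raw!r}")
    return host


class SanitiseHostnameTest(unittest.TestCase):
    def test_normalises_case_whitespace_and_trailing_dot(self):
        self.assertEqual(sanitise_hostname(" Web-01.Example.COM. "), "web-01.example.com")

    def test_rejects_shell_metacharacters(self):
        with self.assertRaises(ValueError):
            sanitise_hostname("host; rm -rf /")

    def test_rejects_leading_hyphen_and_empty_label(self):
        for bad in ("-bad.example.com", "a..example.com"):
            with self.assertRaises(ValueError):
                sanitise_hostname(bad)
```

The tests can be run with `python -m unittest` against the module. Rejecting anything that does not match a strict allow-list pattern, rather than trying to strip out dangerous characters, keeps the validation easy to reason about and to test.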
**What we provide**
+ Opportunities to develop and grow as an engineer. We are always expanding into new areas, working with open-source projects and contributing back, and exploring new technologies.
+ A team of incredibly capable and dedicated peers, all the way from engineering to product management and customer support.
+ Breadth and depth. You will work in an area that scales dynamically to meet the needs of Splunk's cloud offering, and you can go deep into optimizing how we automate every manual process and tedious task we encounter.
+ Growth and mentorship. We believe in growing engineers through ownership and leadership opportunities. We also believe that mentors help both sides of the equation.
+ A stable, collaborative, and supportive work environment. Honesty and collaboration are values we see as a core part of our team identity. We understand the value of open communication: working together to get things done and adapting to the changing needs of the team and individuals. This is reflected both in our internal communications and in how we interact with our customers.
+ Balance. We don't expect people to work 12-hour days. We want you to be successful outside of work too. We trust our colleagues to be responsible with their time and commitment, and believe that balance helps cultivate a positive environment.
Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.

Site Reliability Engineer

Halian Technology Limited

Posted 5 days ago

Job Description

permanent
We're Hiring: Mid-Level Site Reliability Engineer (SRE)
This role is fully remote.
Permanent position
Are you passionate about automation, observability, and scaling systems to support millions of users?
Join our client's SRE team within the Platform Engineering organization and help us build resilient, secure, and high-performing infrastructure.
What You'll Do:
Diagnose and resolve complex infrastructure…


Site Reliability Engineer

Nottingham, East Midlands £65000 - £75000 per annum Commify

Posted 2 days ago

Job Description

Permanent

About Us

Commify has transformed over the last decade under Private Equity ownership, operating in 9 countries including the UK, France, Spain, Italy, Romania, Germany, Netherlands, Australia, and the USA. Our mission is to make business communication brilliant through our extensive range of products that include SMS, WhatsApp, Email, and more. Serving over 70,000 businesses worldwide, we send more than 5 billion communications annually.

As we look to expand our impressive product portfolio, we recognize that our greatest asset is our people. Join us and be part of our success story!

Role Summary

In the role of Site Reliability Engineer at Commify, you will be an integral part of our SRE team. Your focus will be on ensuring that our products and platforms perform at their best, understanding how our software interacts with both physical and cloud infrastructure to deliver exceptional messaging solutions.

Key responsibilities include:

  • Maintaining high levels of system performance through monitoring and performance tuning
  • Implementing scalability and fault tolerance
  • Automating processes and improving operational efficiencies
  • Troubleshooting application and middleware challenges
  • Collaborating with engineering teams to support high-throughput production environments
  • Building and maintaining robust deployment pipelines
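One common way to implement the fault tolerance mentioned in these responsibilities is to retry transient failures with capped exponential backoff and jitter. The Python sketch below illustrates that general technique only; it is not Commify's implementation, and the function name `retry_with_backoff`, its defaults, and the injectable `sleep`/`rng` parameters are assumptions chosen to keep it testable.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op(), retrying transient errors with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(attempts - 1):
        try:
            return op()
        except retry_on:
            # The ceiling doubles each attempt (base, 2*base, 4*base, ...) up to max_delay;
            # "full jitter" sleeps for a uniformly random time in [0, ceiling].
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, ceiling))
    # Final attempt: any exception now propagates to the caller.
    return op()
```

Errors outside `retry_on` (for example, a validation error) are raised immediately rather than retried. Full jitter spreads retries out so that many clients recovering from the same outage do not all retry in lockstep.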

Requirements

What essentials are we looking for?

  • Proficiency with Microsoft Azure
  • Strong expertise in Terraform, App Services, and Kubernetes
  • Fluent in both written and spoken English
  • A genuine passion for reliability in systems
  • Experience in creating and modifying Terraform deployments
  • Prior experience in an operations role, ideally as a Site Reliability Engineer
  • Ability to work cross-functionally, take ownership of tasks, and prioritize effectively
  • Excellent communication and collaboration skills
  • Experience with monitoring solutions (e.g., Datadog, Azure Application Insights, Log Analytics)
  • Programming/scripting skills for automation (favoring PowerShell, but also comfortable with Bash, C#, Ruby, or Python)
  • Experience with web-based applications

It's desirable for you to have:

  • Familiarity with Azure DevOps pipelines
  • Experience with Microsoft Server Operating Systems
  • Understanding of service level objectives and operational requirements for cloud-based solutions
  • Comprehensive knowledge of Microsoft Azure Cloud offerings (especially in PaaS)
  • Experience with tools such as Terraform, Ansible, VSTS, ARM, Puppet, Chef, Jenkins, ELK, and Grafana
  • Understanding of DNS, Load Balancer configuration, Active Directory, and network infrastructure in the cloud
  • Experience in agile environments and methodologies including TDD, Scrum, or Kanban
  • Knowledge of monitoring and alerting systems for microservice architectures
  • Applied knowledge of cloud security best practices

Benefits

  • Competitive salary (£65,000–£75,000)
  • Company bonus scheme
  • Comprehensive healthcare cash plan
  • A generous 27 days of annual leave in addition to Bank Holidays
  • 2 Wellbeing leave days and 2 days dedicated to giving back to your community
  • Enjoy your birthday off!
  • Employer pension contribution at 5%
  • Death in service benefit (4 times your salary)
  • Annual award recognition
  • Fun monthly and quarterly social events
  • Opportunities for training and professional development
  • Flexible hybrid working arrangements

Site Reliability Engineer

Oxford, South East Infleqtion

Posted 6 days ago

Job Description

Permanent

ABOUT THE COMPANY

Infleqtion is a global quantum technology company solving the world’s most challenging problems. The company harnesses quantum mechanics to build and integrate quantum computers, sensors, and networks. From fundamental physics to leading-edge commercial products, Infleqtion enables “quantum everywhere” through its ecosystem of devices and platforms. We are recruiting for a Site Reliability Engineer with DevOps skills for our quantum computing platform.

LOCATION

Infleqtion has offices in the USA, United Kingdom and Australia. This is a full-time position split between our Kidlington, Oxford office and the National Quantum Computing Centre, Harwell. Our flexible working policy enables all full-time employees to work up to 2 days a week from home if work permits. Candidates will need to be able to travel independently to these locations as required.

POSITION SUMMARY

As part of our strategy for growth at Infleqtion UK, we are expanding our engineering team and recruiting a Site Reliability Engineer with DevOps skills. 

For this role, you will bring a mix of technical skills, problem-solving ability and effective communication, playing a pivotal role in operational reliability and code velocity for our quantum research and device development. 

This role involves a combination of network management, server administration, and proactive involvement in our continuous integration and deployment processes. The successful candidate will ideally possess a solid technical background in both systems engineering and software development. 

JOB RESPONSIBILITIES

  • Network Management: Design, implement, and manage robust networks, including configuring switches and managing network-connected devices. 
  • Software Deployment: Manage Docker containers and orchestrate CI/CD pipelines for efficient software deployment and updates. 
  • Infrastructure as code: Design, develop, and maintain IaC solutions and implement CI/CD pipelines to automate deployment processes. 
  • Collaboration and Documentation: Collaborate with both software developers and hardware engineers in a lab environment, documenting processes and system configurations for ongoing projects. 
  • Developing Standard Operating Procedures: Working with a variety of users and software engineers to implement and develop standard operating procedures for software deployments and system parameter updates. 
  • Proactive Problem-Solving Skills: Integrating complex systems that include embedded boards, computer systems, and hardware interfaces. Your role will focus on seamlessly coordinating these components to ensure the system is consistently operational and optimised.
  • 'System-down' Planning and Response: Own system management and coordinate the response to maintain high uptime. Plan and implement system backup policies to ensure swift recovery.

Requirements

  • Excellent understanding of networking technologies, connecting computers and embedded devices 
  • Proficiency with: 
    • Deployment and configuration management frameworks, e.g. Ansible
    • Linux systems 
    • Bash scripting 
    • Docker and/or Kubernetes 
    • Git and version control-centric workflows 
  • Good collaboration skills, able to work in a team environment where engagement and participation are an expected part of successful job performance. 

Experience 

  • A minimum of 2 years’ experience working in a related engineering field.
  • Proven experience working with complex systems consisting of a number of networked nodes, including embedded systems (such as Raspberry Pi and IoT devices).
  • Professional-level verbal and written communication skills, able to effectively share information with technical and non-technical staff. 

Qualifications: 

  • Bachelor’s degree in computer science, engineering, or a related field (or equivalent), or extensive experience.
  • Baseline Personnel Security Standard (BPSS) required (we will arrange for successful candidate). 

Desirable: 

  • Experience with ARTIQ systems, or controlling hardware via a Python API. 
  • Experience working with production grade systems. 
  • Python: developing and maintaining code, managing packages and dependencies. 
  • A background in or interest in quantum physics/quantum computing. 

Benefits

In addition to your base compensation, we offer a generous Total Rewards program which includes: 

  • Competitive salary 
  • Unlimited PTO 
  • Generous company pension contribution 
  • Cycle to work and Technology schemes 
  • BUPA medical insurance upon successful completion of the probationary period 
  • Incentive Stock Option Plan 

Site Reliability Engineer

London, London Gizmo

Posted 14 days ago

Job Description

Permanent

Gizmo is an AI startup on a mission to make learning so easy that anyone can learn anything. We're building Duolingo for anything: a platform that uses gamification and social mechanics to make learning fun.

With over 1 million monthly active users and $4M in annual recurring revenue, we’re already one of the fastest-growing startups in the UK. Backed by leading investors, we recently raised $22M in Series A funding to accelerate our vision of helping 1 billion people learn.

Role Overview
Reporting to the CTO, you will own capacity, performance and reliability for Gizmo’s full-stack platform as daily traffic climbs from hundreds of thousands to millions of users. You’ll write code across the stack, but your charter is classic SRE: defend SLOs, eliminate toil, and raise the ceiling on scale before it becomes a hard limit.

Key Responsibilities

  • Define SLIs/SLOs for latency, availability and error rate; codify error budgets and partner with product teams on trade-offs.
  • Perform load testing, capacity modelling and up-front scalability design for PostgreSQL, OpenSearch, Redis, Hasura and Cloudflare Workers; produce data-driven scaling plans.
  • Extend metrics, structured logging and tracing; establish alert rules that page only on user-visible impact; build actionable runbooks.
  • Join the on-call rotation, lead blameless post-mortems, drive remediation work to closure and track MTTR/MTBF improvements.
  • Automate repetitive ops on Kubernetes and CI/CD; keep toil below 50% of your time by pushing fixes into code.
  • Coach full-stack engineers on query optimisation, schema design and back-pressure techniques; document patterns and anti-patterns in an SRE playbook.

Requirements

  • Hands-on scale experience: you have run relational stores at 100k+ TPS or with 1M+ concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL).
  • Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs.
  • Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts.
  • Comfort with Kubernetes, IaC and cloud-native patterns; you can debug from the network layer up to the application layer.
  • Start-up bias for action: you prioritise high-leverage fixes, ship iteratively and own outcomes end-to-end.
  • Collaborative and feedback-driven; you welcome post-mortem culture and continuous improvement.
  • Driven by impact: you prioritise work that moves the needle!

Nice-to-haves: experience with Hasura internals, Cloudflare Workers edge optimisation, or operating OpenSearch at scale.

Benefits

  • Highly competitive salary.
  • You'll own a piece of what you're building: equity included.
  • Hybrid working model with 4 days in our East London office, ideally located between Shoreditch High Street, Old Street, and Liverpool Street stations.
  • The opportunity to become one of the earliest employees in one of the UK’s fastest-growing startups.
  • Private health insurance.