239 Devops Engineers jobs in London
Site Reliability Engineer

Posted 4 days ago
Job Description
Are you enthusiastic about designing and managing cloud platforms? Do you find satisfaction in ensuring the reliability and performance of complex systems?
About the Team:
The LexisNexis Intellectual Property (IP) division provides international patent content and a suite of online and analytic tools that meet the evolving needs of the intellectual property market. We deliver data to support LexisNexis IP search and analytics applications, empowering our customers with actionable insights and metrics for critical business decisions.
Our corporate culture thrives on excellence, innovation, and a strong dedication to our customers, employees, and communities. Working here means joining a vibrant, diverse, and collaborative team where you are free to grow and contribute actively.
About the Role:
We are a high-performing systems engineering team operating in a fast-paced enterprise environment, focused on modernising our infrastructure while upholding strict security and compliance standards. This position provides assistance and input to management, develops and leads large multifunctional development activities, solves complex technical problems, writes complex code for computer systems, and serves as a senior source of expertise.
Requirements:
+ Deep knowledge of cloud services (e.g., EC2, S3, RDS, Lambda, Azure VMs, Azure Functions).
+ Good experience in Cloud Engineering with a strong focus on Azure and/or AWS.
+ Experience with Infrastructure as Code (Terraform, ARM/Bicep).
+ Proficiency in containerization and orchestration tools (Docker, Kubernetes/EKS).
+ Skilled in scripting languages (Python, Bash, TypeScript, PowerShell).
+ Strong understanding of Linux/UNIX/Windows systems and storage.
+ Experience with monitoring tools (Datadog, Coralogix, CloudWatch, Azure Monitor).
+ Familiarity with SRE and DevOps practices.
+ Knowledge of networking and security best practices.
+ Excellent problem-solving and stakeholder management skills.
+ Databricks knowledge is an added advantage.
Responsibilities:
+ Leading Kubernetes deployment and management, including orchestration, architecture, networking, CI/CD, storage, and security.
+ Collaborating with cross-functional teams to design and implement high-quality cloud solutions.
+ Administering and supporting Databricks environments, including permissions, storage, and networking.
+ Troubleshooting complex technical issues using observability tools and root-cause analysis.
+ Implementing infrastructure management best practices and automating repetitive tasks.
+ Supporting program installations, system configurations, and user modifications.
+ Refining system monitoring and reporting in collaboration with support teams.
+ Operating across Agile and Waterfall methodologies to deliver timely solutions.
+ Mentoring junior team members and contributing to a culture of continuous learning.
Why Join Us?
Join our team and contribute to a culture of innovation, collaboration, and excellence. If you are ready to advance your career and make a significant impact, we encourage you to apply.
Work in a way that works for you
We promote a healthy work/life balance across the organisation. We offer an appealing working prospect for our people. With numerous wellbeing initiatives, shared parental leave, study assistance and sabbaticals, we will help you meet your immediate responsibilities and your long-term goals.
+ Working flexible hours - flexing the times when you work in the day to help you fit everything in and work when you are the most productive.
Working for you
We know that your well-being and happiness are key to a long and successful career. These are some of the benefits we are delighted to offer:
+ Annual Profit Share Bonus
+ Comprehensive Pension Plan
+ Home, office or commuting allowance
+ Generous vacation entitlement and option for sabbatical leave
+ Maternity, Paternity, Adoption and Family Care leave
+ Internal communities and networks
+ Recruitment introduction reward
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form.
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
USA Job Seekers: EEO Know Your Rights.
RELX is a global provider of information-based analytics and decision tools for professional and business customers, enabling them to make better decisions, get better results and be more productive.
Our purpose is to benefit society by developing products that help researchers advance scientific knowledge; doctors and nurses improve the lives of patients; lawyers promote the rule of law and achieve justice and fair results for their clients; businesses and governments prevent fraud; consumers access financial services and get fair prices on insurance; and customers learn about markets and complete transactions.
Our purpose guides our actions beyond the products that we develop. It defines us as a company. Every day across RELX our employees are inspired to undertake initiatives that make unique contributions to society and the communities in which we operate.
Site Reliability Engineer
Posted 5 days ago
Job Description
As a Site Reliability Engineer (SRE) at Trade Nation, you will be part of a dynamic and collaborative team that ensures the reliability, availability, and performance of our web services and applications. You will work closely with developers, operations, and product teams to design, build, and maintain scalable, secure, and efficient systems. You will also monitor, troubleshoot, and resolve issues that affect the user experience and the business objectives.
Who we are
We are Trade Nation. We help our customers power up their trading through killer insights, transparent costs, and fairer ways to trade. We’re innovators, and proud of it. And we’ve grown a lot in our decade as a market-leading low-cost trading powerhouse. Our reach is global through our teams in the UK, Australia, South Africa, Seychelles and The Bahamas.
Founded on transparency, forged in trust and powered by people, we’re committed to empowering our customers to outperform the markets. How? By minimising expenses and harnessing technology to prioritise the lowest trading costs.
But enough about us. Let’s hear about you.
Who you are
You’re something special. You pride yourself on being unique and bringing your own history to the table – finding solutions to daily challenges in a way that can’t be done by anyone else. Maybe you talk a big game, maybe you don’t. The important thing is that you do what you say and follow through to see every customer thrive.
You don’t play with the bumpers up. That means breaking out of your lane when needed to help others – or forging your own completely. Every problem is our problem and that’s how you see it too. Because Trade Nation’s people have a shared vision, and you want to be part of making it a reality.
You know when to take the right sort of risks, the ones that push you to be better. You’re not afraid to try, fail, and then try harder. But don’t worry, you’ll have all the support you need to thrive with us at Trade Nation, and we can’t wait to enable you to learn and grow.
Ready to roll up your sleeves and get stuck in?
Our commitments to each other
- We have each other’s backs: there when we need each other most.
- We challenge each other: be more creative, more curious, more bold.
- We thrive together: taking our work to the next level.
- We form strong bonds: through team building and social events.
- We don’t judge: instead, we teach and are open to learning.
- We step up: taking ownership and supporting each other to do the same.
Responsibilities
- Design, implement, and maintain scalable and reliable systems.
- Monitor system performance and troubleshoot issues to ensure high availability and reliability.
- Develop and maintain automation tools to streamline operations and reduce manual intervention.
- Collaborate with development teams to ensure new features and services are designed with reliability in mind.
- Implement and manage monitoring, alerting, and logging systems.
- Conduct root cause analysis of incidents and implement corrective actions to prevent recurrence.
- Participate in on-call rotations to provide support for critical systems.
- Continuously improve system performance, reliability, and scalability through proactive measures and best practices.
Requirements
- A bachelor's degree in computer science, engineering, or a related field, or equivalent work experience.
- At least three years of experience in Site Reliability Engineering, DevOps, or similar roles.
- Proficiency in JavaScript and ideally React.
- Experience with cloud platforms, such as AWS, GCP, or Azure, and related technologies, such as Kubernetes, Docker, Terraform, or CloudFormation.
- Experience with monitoring and observability tools, such as Prometheus, Grafana, Loki, or Sentry.
- Experience with troubleshooting and debugging tools, such as Wireshark, tcpdump, or gdb.
- Strong knowledge of web protocols, such as HTTP, TCP, UDP, DNS, and TLS.
- Strong communication and collaboration skills.
- Passion for learning new technologies and solving complex problems.
Benefits
- Competitive salary, and discretionary annual bonus.
- Private healthcare.
- Life Insurance, Critical Illness & Income Protection cover.
- Active Lifestyle allowance.
- Annual leave above minimum entitlement.
- Cycle to work scheme.
- Up to 3 weeks allowance to work in any location.
Site Reliability Engineer
Posted 11 days ago
Job Description
Gizmo is an AI startup on a mission to make learning so easy that anyone can learn anything. We're building Duolingo for anything - a platform that uses gamification and social mechanics to make learning fun.
With over 1 million monthly active users and $4M in annual recurring revenue, we’re already one of the fastest-growing startups in the UK. Backed by leading investors, we recently raised $22M in Series A funding to accelerate our vision of helping 1 billion people learn.
Role Overview
Reporting to the CTO, you will own capacity, performance and reliability for Gizmo’s full-stack platform as daily traffic climbs from hundreds of thousands to millions of users. You’ll write code across the stack, but your charter is classic SRE: defend SLOs, eliminate toil, and raise the ceiling on scale before it becomes a hard limit.
Key Responsibilities
- Define SLIs/SLOs for latency, availability and error rate; codify error budgets and partner with product teams on trade-offs.
- Perform load-testing, capacity modelling and up-front scalability design for PostgreSQL, OpenSearch, Redis, Hasura and CF Workers; produce data-driven scaling plans.
- Extend metrics, structured logging and tracing; establish alert rules that page only on user-visible impact; build actionable runbooks.
- Join the on-call rotation, lead blameless post-mortems, drive remediation work to closure and track MTTR/MTBF improvements.
- Automate repetitive ops on Kubernetes and CI/CD; keep “toil” below 50% of your time by pushing fixes into code.
- Coach full-stack engineers on query optimisation, schema design and back-pressure techniques; document patterns and anti-patterns in an SRE playbook.
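For readers unfamiliar with the error-budget arithmetic referred to in the first responsibility above, here is a minimal sketch. It is not Gizmo's tooling: the function names, the 99.9% target and the 30-day window are illustrative assumptions.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) an availability SLO permits over the window.

    slo_target is a fraction such as 0.999 (an assumed example target).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLI.

    A negative result means the budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0  # no traffic in the window, so there is no budget to spend
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical figures: 99.9% availability over 30 days.
    print(round(error_budget_minutes(0.999), 1))
    # 1,000,000 requests with 250 failures against the same target.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

A team would typically feed these functions from its own metrics and agree in advance what happens when the remaining budget approaches zero, for example pausing feature launches in favour of reliability work.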
Requirements
- Hands-on scale experience: you have run relational stores at 100k+ TPS or 1M+ concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL).
- Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs.
- Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts.
- Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer.
- Self-starter with a maker mindset. We’re looking for ex-founders or individuals with start-up experience.
- Start-up bias for action: you prioritise high-leverage fixes, ship iteratively and own outcomes end-to-end.
- Collaborative and feedback-driven; you welcome post-mortem culture and continuous improvement.
- Driven by impact - you prioritise work that moves the needle!
Nice-to-haves: experience with Hasura internals, Cloudflare Workers edge optimisation, or operating OpenSearch at scale.
Benefits
- Highly competitive salary.
- You'll own a piece of what you're building - equity included.
- Hybrid working model with 4 days in our East London office, ideally located between Shoreditch High Street, Old Street, and Liverpool Street stations.
- The opportunity to become one of the earliest employees in one of the UK’s fastest-growing startups.
- Private health insurance
Site Reliability Engineer
Posted 614 days ago
Lead Site Reliability Engineer
Posted today
Site Reliability Engineer II

Posted 4 days ago
Job Description
**What You'll Do on a Typical Day:**
+ Design and implement next-generation, highly scalable, and reliable applications using SaaS technology.
+ Translate functional specifications into logical, component-based technical designs.
+ Own delivery of application features end to end by working with internal and external teams.
+ Innovate and implement new ideas to solve complex software problems.
+ Work closely with geographically distributed team members.
**What We're Looking For:**
+ Amex GBT Egencia's Technology organisation is looking for a highly motivated, self-driven individual with strong growth potential to join a growing team of technologists. You are well-versed in SDLC and Agile methodologies.
+ You have 1-3 years of experience in software development and troubleshooting.
+ An independent thinker who works around problems and isn't shy of trying new technologies. You have validated experience working in parallel technologies apart from your core technology area (Java).
+ Prior experience in working harmoniously with a cross-geography team will be an added advantage. You should be equally comfortable in development, test, and debugging roles and be ready to wear many hats. This team values "fail-fast" learners and technology enthusiasts who view learning new technology as a fun experience.
+ Strong knowledge of Object Oriented Programming, Data Structures, and Algorithms
+ Good proficiency in at least one of Java, Go, Python, or Bash
+ Proven ability to develop and support large-sized highly scalable software systems
+ Experience in AWS Services
+ Good knowledge of container orchestration frameworks primarily Kubernetes
+ Basic understanding of logging and monitoring frameworks
+ Knowledge of cloud computing concepts along with an understanding of application communication and routing is a plus
+ Good experience in developing and deploying AWS cloud-based platforms
+ Good understanding of network topologies with experience in hybrid cloud architecture will be a plus
+ Experience with the Agile toolset and programming practices
+ Knowledge of CI/CD principles
+ Knowledge of server-side design patterns is a plus
+ Ability to pick up new technologies and languages quickly
+ A standout colleague who collaborates and incorporates feedback from all partners
+ Excellent written and verbal communication skills
+ BS or MS in Computer Science or equivalent degree
**Location**
London, United Kingdom
**The #TeamGBT Experience**
Work and life: Find your happy medium at Amex GBT.
+ **Flexible benefits** are tailored to each country and start the day you do. These include health and welfare insurance plans, retirement programs, parental leave, adoption assistance, and wellbeing resources to support you and your immediate family.
+ **Travel perks:** get a choice of deals each week from major travel providers on everything from flights to hotels to cruises and car rentals.
+ **Develop the skills you want** when the time is right for you, with access to over 20,000 courses on our learning platform, leadership courses, and new job openings available to internal candidates first.
+ **We strive to champion Inclusion** in every aspect of our business at Amex GBT. You can connect with colleagues through our global INclusion Groups, centered around common identities or initiatives, to discuss challenges, obstacles, achievements, and drive company awareness and action.
+ And much more!
All applicants will receive equal consideration for employment without regard to age, sex, gender (and characteristics related to sex and gender), pregnancy (and related medical conditions), race, color, citizenship, religion, disability, or any other class or characteristic protected by law.
Furthermore, we are committed to providing reasonable accommodation to qualified individuals with disabilities. Please let your recruiter know if you need an accommodation at any point during the hiring process. For details regarding how we protect your data, please consult the Amex GBT Recruitment Privacy Statement.
**What if I don't meet every requirement?** If you're passionate about our mission and believe you'd be a phenomenal addition to our team, don't worry about "checking every box;" please apply anyway. You may be exactly the person we're looking for!
Lead Site Reliability Engineer
Posted 7 days ago
Job Description
Lead SRE to own and elevate our Alerting & Incident Management platform. You’ll be the driving force behind reliability, customer satisfaction, and product excellence, ensuring smooth alert management, fewer engineering interruptions, and a best-in-class incident response experience.
This role blends technical depth, customer impact, and product strategy, and is perfect for someone who thrives at the intersection of engineering, incident response, and product innovation.
What You’ll Do
- Champion customer experience by speeding up alert resolution and reducing interruptions for engineers.
- Build solutions to common pain points, shaping roadmaps, documentation, and technical knowledge.
- Develop benchmarking tools to improve performance, reliability, and scalability.
- Stay ahead of incident management trends to drive new workflows and product improvements.
- Mentor teams and lead with clear, impactful communication.
What We’re Looking For
- 5+ years in software engineering, DevTools, or infrastructure.
- Strong expertise in incident management, alert routing, and large-scale orchestration.
- SaaS or incident management platform experience (PagerDuty, OpsGenie, etc. a plus).
- Solid technical foundation with cloud/distributed systems.
- Excellent communicator, comfortable working across US/IL time zones.
- Bonus: leadership experience, SRE/DevOps background, knowledge of SLO/SLA practices.
Lead Site Reliability Engineer
Posted 7 days ago
Job Description
Lead Site Reliability Engineer role at Venquis.
Location: London UK / Hybrid / Remote
Sector: Media & Streaming Technology
A leading TV streaming platform is expanding its engineering team to deliver high-performance, low-latency streaming to millions of viewers worldwide. We’re looking for a Lead Site Reliability Engineer (SRE) to drive reliability, observability, and scalability across our streaming services while mentoring a team of SREs.
What you’ll do
- Lead end-to-end reliability strategy for video streaming pipelines, playback services, and backend systems.
- Build and maintain observability frameworks (Prometheus, Grafana, Datadog, OpenTelemetry) to monitor streaming quality, latency, and uptime.
- Scale cloud-native infrastructure (AWS/GCP/Azure) and orchestrate containerised applications (Kubernetes, Docker) for global distribution.
- Guide incident management, disaster recovery, and post-mortems across multi-region streaming environments.
- Mentor junior SREs and collaborate with engineering teams to embed reliability by design into all development efforts.
What we’re looking for
- Proven experience in high-scale distributed systems, preferably in streaming, media delivery, or content platforms.
- Deep expertise with observability, monitoring, and incident response at global scale.
- Strong cloud skills (AWS, GCP, Azure) and Infrastructure as Code (Terraform, Ansible, CI/CD pipelines).
- Proficiency in Python, Go, Java, or Bash for automation and tooling.
- Leadership experience managing or mentoring an SRE or reliability engineering team.
This role offers the opportunity to shape the reliability and performance of a platform watched by millions, balancing real-time user experience with operational excellence.
Compensation: Great package (Base + Bonus)
Details
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Venquis is acting as an Employment Agency in relation to this vacancy.
Site Reliability Engineer - Remote
Posted 299 days ago
Job Description
At EFG (ESL FACEIT Group) we create worlds beyond gameplay where players and fans become community. We pride ourselves on our corporate social responsibility motto: “IT’S NOT GG (Good Game), UNTIL IT’S GG FOR ALL”. We are passionate about the culture we foster that ultimately helps to create and shape the world of esports, gaming tournaments, leagues, events and holistic ecosystems staged for our millions of players, fans and heroes.
The Team:
As a Site Reliability Engineer at EFG, you will be designing, analyzing, and troubleshooting large-scale distributed systems. You will demonstrate a systematic problem-solving approach, and the ability to debug and optimize code and to automate routine tasks. You will ensure that EFG’s services and systems are reliable, that they have uptime appropriate to users' needs and they have a fast rate of improvement.
Apart from monitoring our systems' capacity and performance, you will also focus on optimizing existing systems, on building infrastructure and on eliminating work through automation. You will work collaboratively with the software engineering teams to deploy and operate our systems, and you will help to automate and streamline our operations and processes. Within this role, you will be given real responsibilities, and you have the opportunity to drive change and have a big impact on our products and platform.
What you will do:
- Maintaining and improving the monitoring and observability tools (Grafana/Prometheus/Thanos/Jaeger);
- Working closely with your team and with other cross-functional teams to help design, maintain and operate systems at scale;
- Developing and driving adoption of SRE best practices across the company;
- Leading on incident management process and adoption;
- Using your troubleshooting skills to help identify and fix operational issues;
- Working with Cloud Native technologies such as Kubernetes, Envoy, Istio, Prometheus and Helm;
- Working with the “Hashi Stack” (Terraform, Packer, Vault);
- Experimenting with and introducing cutting edge technologies.
Requirements
- Proven experience as a Site Reliability Engineer, DevXP Engineer or Software Engineer, focusing on building and maintaining scalable infrastructures;
- Excellent working knowledge of at least one of the major cloud providers (GCP/AWS/Azure);
- You have experience with cluster management systems (Kubernetes);
- Knowledge of incident management: ability to investigate, troubleshoot, recover and prevent the recurrence of incidents that interfere with the normal delivery of IT services;
- Proficient in Go, with some level of proficiency in at least one other language: Java, Python, Rust…;
- You have knowledge of GitOps practices;
- You have production-scale experience with one of the following: MongoDB, Redis, MySQL;
- Experience contributing to open source technologies would be an added bonus.
Site Reliability Engineer, Region Services

Posted 4 days ago
Job Description
Would you like to help implement innovative cloud computing solutions and solve the most complex technical problems? Are you excited by the prospect of building and running the world's largest cloud computing infrastructure to provide a better world for future generations?
Amazon Web Services (AWS) builds and operates some of the largest internet infrastructure on the planet; providing companies of all sizes with an infrastructure web services platform in the cloud. With AWS, customers provision compute power, storage, database, and other cloud resources as their business demands them. To meet the growing demand for AWS Services around the globe, we need exceptionally motivated people who are driven by learning and innovation.
AWS Utility Computing (UC) provides product innovations - from foundational services such as Amazon's Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
If you join us, you'll be part of a world-class team in a dynamic environment that has the entrepreneurial feel of a start-up. This is an opportunity to operate and engineer systems on a massive scale, and to gain world class experience in cloud computing. You'll be surrounded by people who are passionate about cloud computing, believe that first class service is critical to customer success, and are committed to improvement.
Top reasons to join our team:
- Be a catalyst to deliver truly disruptive products that are growing rapidly
- Define, build, own, and run services in high growth environments
- Solve unique and first-order problems to enable our internal teams to deliver for our customers
- Build and operate distributed systems
- Design and build the tools and utilities that are part of the AWS fleet running our internal services
Key job responsibilities
The Systems Development Engineer will be a key member of a new team pioneering automated build and deployment of Windows-based services. The team is adopting a code-first, hands-off, CI/CD-based approach to drive operational excellence and cross-environment parity. This will involve building Ansible-based Infrastructure as Code as well as custom integrations with existing and new Windows services.
About the team
About AWS
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying.
Why AWS?
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating - that's why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
AWS values curiosity and connection. Our employee-led and company-sponsored affinity groups promote inclusion and empower our people to take pride in what makes us unique. Our inclusion events foster stronger, more collaborative teams. Our continual innovation is fueled by the bold ideas, fresh perspectives, and passionate voices our teams bring to everything we do.
Mentorship & Career Growth
We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there's nothing we can't achieve.
Basic Qualifications
- Knowledge of systems engineering fundamentals (networking, storage, operating systems)
- Experience (non-internship) in professional software development
- Experience designing or architecting (design patterns, reliability and scaling) of new and existing systems
- Experience in networking, storage systems, operating systems and hands-on systems engineering
- Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
Preferred Qualifications
- Experience with Ansible (preferred), PowerShell or JavaScript/TypeScript
Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice to know more about how we collect, use and transfer the personal data of our candidates.
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.