117 Devops Engineers jobs in London

Site Reliability Engineer

London, London £80000 - £90000 Annually Rise Technical Recruitment

Posted today

Job Description

Permanent

Senior Site Reliability Engineer
London - Hybrid
£80,000 - £90,000 + 38 Days Holiday + Private Healthcare + Life Assurance + Flexible Working + Pension


Excellent opportunity for a Site Reliability Engineer to join a forward-thinking, high-growth technology company offering a hybrid work environment, great benefits, and opportunities for further progression!

This company operates at the forefront of digital transformation, delivering a unified platform built for scalability, resilience, and performance. With a strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries.

In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems.

The ideal candidate will be an experienced Site Reliability Engineer with a deep background in AWS, Kubernetes (EKS), Terraform, and monitoring/eventing tools. You'll have a strong grasp of application-level troubleshooting, chaos engineering, and performance tuning.

This is a fantastic opportunity to work in a modern DevOps environment where innovation is encouraged, personal development is supported, and technical impact is real.

The Role:
*Manage and optimise AWS and Kubernetes (EKS) infrastructure
*Implement resilience strategies and conduct chaos engineering experiments
*Monitor and maintain Kafka clusters for performance and reliability
*Respond to and resolve application-level production incidents

The Person:
*5+ years in SRE, DevOps, or infrastructure engineering
*Strong experience with AWS, EKS/Kubernetes, and Terraform
*Familiar with Kafka and observability tools like Datadog or Grafana
*Able to troubleshoot issues across infrastructure and application layers

Reference number: BBBH(phone number removed)

To apply for this role or to be considered for further roles, please click "Apply Now" or contact Tommy Williams at Rise Technical Recruitment.

Rise Technical Recruitment Ltd acts as an employment agency for permanent roles and an employment business for temporary roles.

The salary advertised is the bracket available for this position. The actual salary paid will be dependent on your level of experience, qualifications and skill set. We are an equal opportunities employer and welcome applications from all suitable candidates.

Site Reliability Engineer

London, London Kyndryl

Posted 9 days ago

Job Description

**Who We Are**
At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.
**The Role**
Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. As an SRE at Kyndryl, you'll be at the forefront of driving continuous improvement and delivering exceptional service to our customers.
Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems.
We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth.
As an SRE, you'll have the unique opportunity to work on end-to-end services, spanning customer sites and platforms. Collaboration and proactivity are key as you work alongside a talented team of professionals, eager to make a difference. You'll embrace an entrepreneurial mindset, taking ownership of your responsibilities and constantly seeking innovative solutions.
With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers.
If you're passionate about pushing the boundaries of technology, thrive in a collaborative environment, and are motivated by the opportunity to shape the future of reliability engineering, then we want to hear from you. Join our team and be part of a dynamic and forward-thinking organization that values innovation and excellence in everything we do.
**Your Future at Kyndryl**
Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential - offering a wide range of professional and personal growth opportunities that you won't find anywhere else.
**Who You Are**
You're good at what you do and possess the required experience to prove it. However, equally as important - you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused - someone who prioritizes customer success in their work. And finally, you're open and borderless - naturally inclusive in how you work with others.
Product Engineering
To design, manage and support the introduction of new Products / Services to LBG through new technology, Patterns & Blueprints. This involves liaising with the customer to draw out and define the actual requirements, designing a solution that meets those requirements, and calling out any technical issues. The Engineers are expected to keep aligned with industry best practice for the use of their appropriate product stacks.
The role includes engaging stakeholders from all levels of both Managed and Project Services teams to ensure that they are aware of the way we consume various Products and what new skills are needed to be rolled out because of any new services implemented.
The Engineer must be able to represent the details of the solutions through any appropriate governance and discuss and agree the baselines for security compliance of the Product Offerings.
Lastly, the Engineer is expected to provide L4 support to the Managed Service teams as and when needed. Whilst they are not expected to be on call, they are asked, where possible, to assist outside normal hours in the event of a major customer issue.
These are the base skills requested.
Skills
- Experience of Supporting / Management of MQ covering both current and past versions.
- Experience of designing solutions for MQ / RDQM including demonstrating working around technical issues to deliver the solution.
- Knowledge of managing, supporting and designing solutions for any associated clustering that their product offers.
- Knowledge of Automation tools such as VRA / VRO, Ansible.
- Knowledge of Backup / Restore processes.
- Knowledge of both Virtualisation and Appliance deployments.
- Defect Management.
- Storage / SAN.
- Technical Leadership.
- Technical Fault Fixing level 3/4.
- Networking (DHCP / WINS / DNS).
- Chef.
- VCenter / vSphere.
- Active Directory.
- Creation / Management of Roadmaps.
- Product Management (Documentation, Governance, Stake Holder Management, Customer Management).
**Being You**
Diversity is a whole lot more than what we look like or where we come from; it's how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we're not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you - and everyone next to you - the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That's the Kyndryl Way.
**What You Can Expect**
With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter - wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you, we want you to succeed so that together, we will all succeed.
**Get Referred!**
If you know someone that works at Kyndryl, when asked 'How Did You Hear About Us' during the application process, select 'Employee Referral' and enter your contact's Kyndryl email address.
Kyndryl is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, age, veteran status, or other characteristics. Kyndryl is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

Site Reliability Engineer

London, London Cisco

Posted 9 days ago

Job Description

Join us on the Splunk TechOps team, empowering our customers to execute our vision of making machine data accessible, usable, and valuable to everyone! The Splunk TechOps organization runs Splunk Cloud, blending SRE, Systems Engineering and Service Engineering disciplines, across functional global teams.
Come join a team that is striving for operational awesomeness and trying to automate the world. We have a large presence with large cloud vendors. You should have experience with architecture, deployments, and networking in one or more of the major industry vendors. This is an incredible opportunity to use your existing cloud experience and drive the growth of Splunk Cloud.
**What we're looking for**
**NOTE:** **4 x 10h shifts: Wednesday - Saturday/8am-6pm**
We are looking for a TechOps SRE to help maintain, contribute to and improve the next generation of our large scale Cloud offering. You will be working with providers and supporting the infrastructure that powers Splunk's cloud offering.
**You should apply if**
+ **you are comfortable working 4 x 10h shifts: Wednesday - Saturday/8am-6pm**
+ You have operational experience at scale. You have had hands-on roles that deal with operating systems (particularly Linux) and networking. You might also have worked with Cloud technologies. Your previous job titles might be something close to systems admin, network engineer or devops engineer.
+ You're passionate about your work. Our customers are passionate about Splunk and we want the same from our engineers. You should enjoy actively being responsible for your work and be excited about your projects.
+ You love large complex systems. Experience in working on distributed systems or a passion for finding edge cases that appear at scale. You are interested in how to bring something from a small one off task to how to implement it across several thousand machines at once.
+ You have some development skills. We have code in several languages, ranging from Python and Shell to Go and C++. We don't expect you to be a software engineer but you should be familiar with basic programming and understand concepts like input sanitisation and unit testing.
+ "How can I automate this process?" is a question you constantly ask yourself.
+ Data drives your decisions. Data excites you and you make decisions based on numbers rather than assumptions. If an issue arises, you strive to be alerted before our customers notice.
+ You care about monitoring. Shipping code often and getting useful feedback excites you and you're not worried about changing direction when a solution isn't working as expected.
**What we provide**
+ Opportunities to develop and grow as an engineer. We are always expanding into new areas, working with open-source projects and contributing back, and exploring new technologies.
+ A team of incredibly capable and dedicated peers, all the way from engineering to product management and customer support.
+ Breadth and depth. You are interested in working in an area that dynamically scales to meet the needs of Splunk's cloud offering. You want to go deep into optimizing how we automate every manual process and tedious task we encounter.
+ Growth and mentorship. We believe in growing engineers through ownership and leadership opportunities. We also believe that mentors help both sides of the equation.
+ A stable, collaborative, and supportive work environment. Honesty and collaboration are values we see as a core part of our team identity. We understand the value in open communication: working together to get things done, and adapting to the changing needs of the team and individuals. This is reflected in both our internal communications and also in how we interact with our customers.
+ Balance. We don't expect people to work 12 hour days. We want you to be successful outside of work too. We trust our colleagues to be responsible with their time and commitment, and believe that balance helps cultivate a positive environment.
Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.

Site Reliability Engineer

£75000 - £90000 annum orbit

Posted 571 days ago

Job Description

Permanent

Site Reliability Engineer II

London, London American Express Global Business Travel

Posted 9 days ago

Job Description

Amex GBT is a place where colleagues find inspiration in travel as a force for good and - through their work - can make an impact on our industry. We're here to help our colleagues achieve success and offer an inclusive and collaborative culture where your voice is valued.
**What You'll Do on a Typical Day:**
+ Design and implement next-generation, highly scalable, and reliable applications using SaaS technology.
+ Translate functional specifications into logical, component-based technical designs.
+ Own delivery of application features end to end by working with internal and external teams.
+ Innovate and implement new ideas to solve complex software problems.
+ Work closely with geographically distributed team members
**What We're Looking For:**
+ Amex GBT Egencia's Technology organisation is looking for a highly motivated, self-driven self-starter with high growth potential to be part of a growing team of technologists. You are well-versed in SDLC and Agile methodologies.
+ You have at least 1-3 years of experience in software development and troubleshooting.
+ An independent thinker, who works around problems and who isn't shy of trying new technologies. You have validated experience working in parallel technologies apart from your core technology area (Java).
+ Prior experience in working harmoniously with a cross-geography team will be an added advantage. You should be equally appropriate in development, test, and debugging roles and be ready to wear many hats. This team values "fail-fast" learners and technology enthusiasts who view learning new technology as a fun experience.
+ Strong knowledge of Object Oriented Programming, Data Structures, and Algorithms
+ Good proficiency in any of the programming languages from Java, Golang, Python, or Bash
+ Proven ability to develop and support large-sized highly scalable software systems
+ Experience in AWS Services
+ Good knowledge of container orchestration frameworks primarily Kubernetes
+ Basic understanding of logging and monitoring frameworks
+ Knowledge of cloud computing concepts along with an understanding of application communication and routing is a plus
+ Good experience in developing and deploying AWS cloud-based platforms
+ Good understanding of network topologies with experience in hybrid cloud architecture will be a plus
+ Experience with the Agile Tool set and Programming Practices
+ Knowledge of CI-CD principles
+ Knowledge of server-side design patterns is a plus
+ Ability to quickly pick up new technologies, and languages with ease
+ A standout colleague who collaborates and incorporates feedback from all partners
+ Excellent written and verbal communication skills
+ BS or MS in Computer Science or equivalent degree
#GBTJobs
**Location**
London, United Kingdom
**The #TeamGBT Experience**
Work and life: Find your happy medium at Amex GBT.
+ **Flexible benefits** are tailored to each country and start the day you do. These include health and welfare insurance plans, retirement programs, parental leave, adoption assistance, and wellbeing resources to support you and your immediate family.
+ **Travel perks:** get a choice of deals each week from major travel providers on everything from flights to hotels to cruises and car rentals.
+ **Develop the skills you want** when the time is right for you, with access to over 20,000 courses on our learning platform, leadership courses, and new job openings available to internal candidates first.
+ **We strive to champion Inclusion** in every aspect of our business at Amex GBT. You can connect with colleagues through our global INclusion Groups, centered around common identities or initiatives, to discuss challenges, obstacles, achievements, and drive company awareness and action.
+ And much more!
All applicants will receive equal consideration for employment without regard to age, sex, gender (and characteristics related to sex and gender), pregnancy (and related medical conditions), race, color, citizenship, religion, disability, or any other class or characteristic protected by law.
Furthermore, we are committed to providing reasonable accommodation to qualified individuals with disabilities. Please let your recruiter know if you need an accommodation at any point during the hiring process. For details regarding how we protect your data, please consult the Amex GBT Recruitment Privacy Statement.
**What if I don't meet every requirement?** If you're passionate about our mission and believe you'd be a phenomenal addition to our team, don't worry about "checking every box;" please apply anyway. You may be exactly the person we're looking for!

Principal Site Reliability Engineer

London, London Orgvue

Posted 6 days ago

Job Description

Permanent

Orgvue is a leading organizational design and planning software platform that captures the power of data visualization and modelling to build more adaptable and better-performing organizations. HR, finance and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world.

Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure.

Role

In this role you will work across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale.

This role combines hands-on technical capability with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We're looking for someone who has technical expertise, is a great communicator and enjoys collaborating across multiple teams.

Responsibilities

  • Define and enforce SLOs, SLIs, and error budgets across critical services
  • Craft and implement a cloud infrastructure and tooling strategy
  • Work across our Org to level up SRE practices
  • Help implement robust observability (metrics, logs and traces) using our observability tool
  • Guide the team in building automated, self-healing systems
  • Own and evolve our incident response processes, including on-call practices and post-mortem culture
  • Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
  • Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
  • Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
  • Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform

Requirements

  • Demonstrable experience leading SRE transformations
  • Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
  • Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  • Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
  • Strong background in observability: metrics, visualization, logging, and tracing
  • Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
  • Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews

Benefits

  • Hybrid working - 1+ days a week in the London office
  • Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day
  • Subsidised Gym Membership
  • Private Medical Insurance (including Dental and Vision) and Life Assurance
  • 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
  • Summer Fridays (half-day Fridays for the months of July and August)
  • Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
  • Season ticket Loan
  • Cycle to Work Scheme
  • Annual Discretionary Bonus

'Here at Orgvue we promote individualism and a diverse workforce to build on our future success'

Senior Site Reliability Engineer

London, London Board Intelligence

Posted 11 days ago

Job Description

Permanent

Board Intelligence is a technology and advisory firm that supercharges boards with the science of board effectiveness. We build better businesses and benefit society.  

Through a suite of AI-powered software tools, evaluation frameworks, and advisory services that distil twenty years of boardroom experience, we improve the efficiency of board processes and the effectiveness of boards.  

We work with over 70,000 leaders and 3,000 organisations across the world, with clients across the Fortune 500, FTSE 100, and OMX 30. In 2024 we received substantial backing from K1 Investment Management – the leading B2B Enterprise SaaS investors. We are at the beginning of significant growth, and we’re looking for superb talent to join us on this journey.  

As we grow, we’re fiercely protective of our culture and values. Many of us, including our founders, have families and other priorities, so we know the value of a supportive company.  

The team is diverse and friendly. We value fun: most days you’ll find a social event or learning opportunity to get involved with, including company socials, away days, philanthropic activities and lunch & learns.

Our Mission

We unleash the potential of organisations through the science of board effectiveness, building better businesses and benefiting society. 

The Opportunity 

As a Senior Site Reliability Engineer (SRE), you'll be joining a team whose mission is to ensure the availability, performance, security and reliability of our platform and core services, ensuring that they meet the needs of our internal and external users. You will take the lead on projects across the entire breadth of our tech stack, from planning all the way through to delivery and maintenance. You will bring others on the team with you on the journey too, and not just go it alone. You will be responsible for visibility and monitoring of those systems, for building tooling and automation to reduce toil, and for responding to incidents as part of our 24/7 SRE on-call team.

The SRE team:

  • Strives to provide the highest standards of Availability, Scalability, Performance and Security for our Software as a Service environments across multiple cloud vendors and our own private cloud physical infrastructure hosted at datacentres in the UK.
  • Provides enabling infrastructure, pipelines and tooling to support product development.
  • Works closely with security, product development and commercial teams to ensure the future suitability of our infrastructure
  • Agrees and sets standards and methodologies for engineering work
  • Proactively monitors our platform and responds to incidents as part of a 24 / 7 rota

Key responsibilities of the role

We're looking for a great Senior SRE to be a hands on individual contributor to key technical projects and to help us build a first-class SRE function. This role will involve:

  • Hands on work with technical projects, taking direction from the team Principals
  • Implement and maintain monitoring solutions / metric-driven alerting, logging and tracing
  • Troubleshoot in complex environments
  • Establish and measure SLIs and SLOs with engineering teams and continuously improve relationships and ways of working with other engineering teams
  • Participate in periodic 24x7 paid on-call duties
  • Hold, or be eligible to obtain, HMG Security Clearance at the SC level
  • Build and manage systems, infrastructure and applications using infrastructure as code and automation (Terraform, Ansible, K8s, Helm, Go)
  • Pair programming, knowledge sharing and running appropriate training sessions for the team
  • Writing well-defined tickets (and supporting documentation when required) as well as keeping them up-to-date

Requirements

What experience and skills you have

We prefer to work with the best talent regardless of whether you are familiar with all of the tools that we use. We don’t need you to be familiar with everything on this list but experience in some or all of these areas will be useful and a willingness to dive in and learn the others, essential.

  • Security Clearance (SC) in the UK
  • A strong background in SRE/DevOps or Linux System Administration
  • A strong background in system automation using configuration management systems such as Ansible, Chef or Puppet.
  • A solid understanding of containerisation and container orchestration using tools such as Kubernetes
  • Experience with creation of automation using APIs
  • Experience of automation testing in an Agile Software environment
  • Close familiarity with some or all of:
    • Network management and optimisation
    • PostgreSQL database management and optimisation
    • Common security frameworks (CIS, NIST, OWASP)
  • Familiarity with Public Cloud Services like AWS | GCP | Azure
  • Familiarity with co-located physical infrastructure (we’re currently hybrid)
  • Solid understanding of Continuous Integration (CI) and Continuous Deployment (CD)
  • Close familiarity with or direct experience of the trade-offs and design decisions Software Engineers need to make when developing applications that must perform and scale well in the real world
  • Experience with technical writing and/or reviewing technical designs
  • Strong experience and understanding of Agile practices including Scrum, Kanban etc
  • An understanding of one or more of the following languages: Ruby, Java, Go, Bash/Shell
  • Strong experience with issue tracking software like Jira and story management lifecycle in general

Traits

  • Strong communication skills with the ability and openness to work across a range of varied stakeholders and confidence to check and challenge when required.
  • Cares about evolving SRE best practices (through a security lens) and is driven to find the right ways of working with the team
  • Appreciation of architecture decisions and trade offs
  • Is self-driven and constantly striving to improve everything with automation and monitoring
  • Is able and willing to travel to our physical datacentres in the UK should the need arise
  • Demonstrates and promotes positive attitudes and behaviours: collaboration, learning, sharing, respect and kindness

Tech Stack

Our applications are written in Ruby (with Rails) or Java. Client-side web apps are written in React, and some services in Clojure, Java and Go.

Our platform consists of:

  • Multiple Kubernetes clusters for container orchestration
  • Apache Kafka and Redis (shortly Postgres) for event messaging
  • Postgres for data storage
  • OpenStack Swift for Object storage
  • Juniper & Cisco networking devices
  • A number of internally written tools for managing the platform written in Go

We run our own physical infrastructure co-located in three datacentres across the UK. We also run a public cloud Production Environment on GCP for one of our products and we’re moving in the direction of more public cloud for production and pre-production environments and pipelines.

You do not need experience with all of that, but you do need a willingness to embrace and learn the bits that are new to you, using the knowledge and training tools available to you, such as Secureflag.

Benefits

  • Competitive salary & pension scheme
  • Personal performance bonus
  • 26 days holiday each calendar year
  • Bupa health & dental cover
  • Group life insurance
  • EAP; AIG Smart Health and Bereavement Counselling & Probate Helpline
  • Regular training & development, mini MBA series, lunch & learns
  • Cycle to work scheme
  • Competitive parental policies
  • Gym membership discounts
  • Monthly company socials

Site Reliability Engineer - Remote

London, London ESL FACEIT Group

Posted 257 days ago

Job Description

Permanent

At EFG (ESL FACEIT Group) we create worlds beyond gameplay where players and fans become community. We pride ourselves in having a corporate social responsibility which is that “IT’S NOT GG (Good Game), UNTIL IT’S GG FOR ALL”. We are passionate about the culture we foster that ultimately helps to create and shape the world of esports, gaming tournaments, leagues, events and holistic ecosystems staged for our millions of players, fans and heroes.

The Team:

As a Site Reliability Engineer at EFG, you will be designing, analyzing, and troubleshooting large-scale distributed systems. You will demonstrate a systematic problem-solving approach, and the ability to debug and optimize code and to automate routine tasks. You will ensure that EFG’s services and systems are reliable, that they have uptime appropriate to users' needs and they have a fast rate of improvement. 

Apart from monitoring our systems' capacity and performance, you will also focus on optimizing existing systems, on building infrastructure and on eliminating work through automation.  You will work collaboratively with the software engineering teams to deploy and operate our systems, and you will help to automate and streamline our operations and processes. Within this role, you will be given real responsibilities, and you have the opportunity to drive change and have a big impact on our products and platform.

What you will do:

  • Maintaining and improving the monitoring and observability tools (Grafana/Prometheus/Thanos/Jaeger);
  • Working closely with your team and with other cross-functional teams to help design, maintain and operate systems at scale;
  • Developing and driving adoption of SRE best practices across the company;
  • Leading on incident management process and adoption;
  • Using your troubleshooting skills to help identify and fix operational issues;
  • Working with Cloud Native technologies such as Kubernetes, Envoy, Istio, Prometheus and Helm;
  • Working with the “Hashi Stack” (Terraform, Packer, Vault);
  • Experimenting with and introducing cutting edge technologies.

Requirements

  • Proven experience as a Site Reliability Engineer, DevXP Engineer or Software Engineer, focusing on building and maintaining scalable infrastructures;
  • Excellent working knowledge of at least one of the major cloud providers (GCP/AWS/Azure);
  • You have experience with cluster management systems (Kubernetes);
  • Knowledge of incident management: ability to investigate, troubleshoot, recover and prevent the recurrence of incidents that interfere with the normal delivery of IT services;
  • Proficiency in Go and some level of proficiency in at least one other language: Java, Python, Rust…;
  • You have knowledge of GitOps practices;
  • You have production-scale experience with one of the following: MongoDB, Redis, MySQL;
  • Experience contributing to open source technologies would be an added bonus.

DevOps Engineer / Site Reliability Engineer

London, London £50000 - £90000 per annum Freelancer.com

Posted 13 days ago

Job Description

Permanent

As a critical and trusted member of the Systems Engineering team, you'll be working side-by-side with software engineers to design and deliver mission-critical services and systems. You'll be working with infrastructure and services at scale, utilizing a wide variety of cutting-edge technologies used to support our high-traffic, real-time Freelancer.com marketplace and a range of other business products, all fully deployed in the Amazon Web Services cloud and powered by Nginx, MySQL, Redis, ElasticSearch, RabbitMQ, Consul, Docker, and Kubernetes. It is our mission to build highly resilient, dynamically scaling, self-healing systems by automating and monitoring everything using Terraform, Puppet, Prometheus, Grafana, Kibana, and Jenkins.

Requirements:

  • Strong understanding of operating systems, networking and systems architecture;
  • Strong experience working with Linux, as well as database, web, and file servers at scale in production environments;
  • Strong experience with any cloud, virtualization, and/or container services (AWS, GCP, Azure, VMware, OpenStack, Docker, Kubernetes, Docker Swarm, AWS ECS);
  • Experience working with any configuration management and infrastructure orchestration tools such as Puppet, Chef, Ansible, CloudFormation or Terraform;
  • Programming experience with any of Python, Go, PHP, Ruby, Node.js;
  • Experience with incident response and a security-focused mindset;
  • University Degree in Computer Science/Engineering (or related field) is preferred.

Benefits:

  • Fast-track your career growth – our meritocratic culture is known for promoting from within and producing industry leaders in tech.
  • Delicious Friday lunches from a rotating selection of local restaurants.
  • Engaging Weekly Town Halls with global presentations and open Q&A sessions with our CEO (feel free to ask him anything!).
  • Hack-a-thons - Get hacking and programming in this quarterly company-wide event where teams create solutions to existing problems and win prizes. The 2-day event is filled with games, events, shows, food and more.
  • Our London office is located close to Old Street Underground station.

Just when you thought it couldn’t get any better:

  • Change lives every day – Everything we do as part of our jobs contributes to improving the lives of our users on a global scale. Our mission is to provide one billion jobs. Not many companies actually make a difference like Freelancer does in providing opportunity and income to people all around the world.
  • Fast-track your career - We boast a meritocratic culture, renowned for hiring into senior roles from within and producing many business and product leaders in the technology industry.

Who is Freelancer.com?

Freelancer owns Escrow.com, the world’s largest online escrow company with over US$7 billion in transactions secured, powering the sale of jet parts to oil wells. Freelancer also owns Loadshift, a marketplace with more freight on a typical day than the distance from the earth to the moon, with over 650 million kilometres posted since inception.

This won't be your typical cog-in-the-machine type of job. If you're a high achiever with talent, looking for something more than a boring job in corporate, want to work with the best and brightest and don't need to be handheld, this is the job for you.

If you join a mega-cap technology company as the 10,000th hire you might struggle to figure out the impact you are making. If you join a startup, you might get to work on the latest fad, but likely have few mentors to learn from, work on toy problems and never change the world.

At Freelancer you’ll get to work on a highly diverse, global set of internet-scale challenges where you will make a meaningful difference with real responsibility, while rapidly building your skills. We run a meritocracy - we actively promote from within.

You’ll also change lives: our mission is to provide one billion jobs. Not many companies actually make a difference like Freelancer does in providing opportunity and income to people all around the world.

DevOps/Site Reliability Engineer - London

London, London Capgemini

Posted 3 days ago

Job Description

Reference Code: 280882-en_GB

Contract Type: Permanent

Professional Communities: Products & Systems Engineering

At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities. Where you can make a difference. Where no two days are the same.

Your role 

As a Site Reliability Engineer on the Engineering Operations team, you will contribute to every phase of the software development lifecycle. In our fast-paced, cloud-native environment, you'll be responsible for scaling and managing AWS-based infrastructure, designing for high availability, and maintaining reliable CI/CD pipelines. You'll work closely with development, security, privacy, and quality teams to deliver secure, scalable, and resilient systems.
 
Our tech stack is centered around AWS, with services built in Kotlin and deployed using modern orchestration and telemetry tools. A strong foundation in distributed systems, observability, and cloud-native design is essential. Experience with AI/ML to enhance observability, automate incident response, and enable self-healing capabilities is a valuable plus.

Your profile

Essential Qualifications

  • Proven experience in operationalizing large-scale, distributed, fault-tolerant, multi-tenant systems in production environments.
  • 5+ years of experience working with AWS services, including but not limited to EC2, S3, EKS, DynamoDB, EBS, CloudFormation, Lambda, VPC, Route 53.
  • At least 5 years' experience operating core SDLC and CI/CD processes, along with SRE concepts: monitoring, alerting, incident management.
  • Experience working within a DevOps operating model, with exposure to data analytics and AI/ML use cases in infrastructure and operations.
  • BS degree in Computer Science or equivalent field.

Preferred Qualifications

  • AWS certifications (e.g., Certified Solutions Architect, Certified Developer).
  • Launched and operated commercial products and services based on AWS (references required), specific examples in FinTech are a strong plus.
  • Published papers or led talks featuring leading multi-functional projects.
  • Familiarity with managing interdependent systems at scale, including both interactive and batch-oriented.
  • Working knowledge of ML and GenAI concepts, including LLM architectures, attention mechanisms, function calling, agentic workflows.
  • Experience with AI/ML in observability and automation, such as proactive monitoring, anomaly detection, and root cause analysis using tools like Datadog, Splunk, or New Relic. Expertise in using frameworks such as LangChain or Spring AI to build AI-powered applications integrated with APIs and vector databases is a big plus.

If you're excited about this role but don’t meet every requirement, we still encourage you to apply: your unique experience could be just what we need.

What you’ll love about working here

  • Open access to digital learning platforms

  • Active employee networks promoting diversity, equity and inclusion like OutFront, CapAbility or 

  • A work environment recognized by Ethisphere as one of the World’s most Ethical companies

Need to know

  • All roles will require a level of security clearance; BPSS OR Security Clearance OR Developed Vetting. 

  • You can bring your whole self to work. At Capgemini, building an inclusive future is part of everyday life and will be part of your working reality. We have built a representative and welcoming environment for everyone.

Capgemini is a global business and technology transformation partner, helping organizations to accelerate their dual transition to a digital and sustainable world, while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With its strong over 55-year heritage, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions leveraging strengths from strategy and design to engineering, all fueled by its market leading capabilities in AI, generative AI, cloud and data, combined with its deep industry expertise and partner ecosystem.
