56 Site Reliability Engineer jobs in the United Kingdom
Site Reliability Engineer
Posted today
Job Description
Senior Site Reliability Engineer
London - Hybrid
80,000 - 90,000 + 38 Days Holiday + Private Healthcare + Life Assurance + Flexible Working + Pension
Excellent opportunity for a Site Reliability Engineer to join a forward-thinking and high-growth technology company offering a hybrid work environment, great benefits, and opportunities for further progression!
This company operates at the forefront of digital transformation, delivering a unified platform built for scalability, resilience, and performance. With a strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries.
In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems.
The ideal candidate will be an experienced Site Reliability Engineer with a deep background in AWS, Kubernetes (EKS), Terraform, and monitoring/eventing tools. You'll have a strong grasp of application-level troubleshooting, chaos engineering, and performance tuning.
This is a fantastic opportunity to work in a modern DevOps environment where innovation is encouraged, personal development is supported, and technical impact is real.
The Role:
* Manage and optimise AWS and Kubernetes (EKS) infrastructure
* Implement resilience strategies and conduct chaos engineering experiments
* Monitor and maintain Kafka clusters for performance and reliability
* Respond to and resolve application-level production incidents
The Person:
* 5+ years in SRE, DevOps, or infrastructure engineering
* Strong experience with AWS, EKS/Kubernetes, and Terraform
* Familiar with Kafka and observability tools like Datadog or Grafana
* Able to troubleshoot issues across infrastructure and application layers
Reference number: BBBH(phone number removed)
To apply for this role or to be considered for further roles, please click "Apply Now" or contact Tommy Williams at Rise Technical Recruitment.
Rise Technical Recruitment Ltd acts as an employment agency for permanent roles and an employment business for temporary roles.
The salary advertised is the bracket available for this position. The actual salary paid will be dependent on your level of experience, qualifications and skill set. We are an equal opportunities employer and welcome applications from all suitable candidates.
Site Reliability Engineer
Posted today
Job Description
Your new company and role
A leading public sector organisation is undergoing a major shift from on-premise systems to cloud-based services. As a Site Reliability Engineer (SRE), you'll join a collaborative, agile team focused on enhancing platform resilience, automation, and observability.
You'll work across a modern tech stack, including RHEL, Ansible, Oracle, AWS, and container platforms like OpenShift and Kubernetes, playing a key role in ensuring service continuity and disaster recovery readiness.
What you'll need to succeed
To thrive in this role, you'll bring:
- Strong Unix/Linux expertise, particularly with RHEL 7/8/9 and Red Hat Satellite.
- Automation skills, including Ansible, shell scripting (Bash/Perl), and infrastructure-as-code principles.
- Containerisation experience, with Docker and Kubernetes/OpenShift.
- CI/CD knowledge, including pipeline configuration and Git-based workflows.
- Monitoring and observability tools, such as Prometheus, Grafana, InfluxDB, and Nagios.
- Cloud proficiency, especially with AWS services (EC2, S3, VPC, NLB) and automation tools like Terraform or CDK.
- Desirable skills include experience with MongoDB, Python, CommVault, Oracle virtualisation (KVM/LVM), and AWS EKS.
What you need to do now
If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.
If this job isn't quite right for you, but you are looking for a new position, please contact us for a confidential discussion about your career.
Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept the T&C's, Privacy Policy and Disclaimers which can be found at (url removed)
Site Reliability Engineer

Posted 1 day ago
Job Description
At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.
**The Role**
Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. As an SRE at Kyndryl, you'll be at the forefront of driving continuous improvement and delivering exceptional service to our customers.
Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems.
We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth.
As an SRE, you'll have the unique opportunity to work on end-to-end services, spanning customer sites and platforms. Collaboration and proactivity are key as you work alongside a talented team of professionals, eager to make a difference. You'll embrace an entrepreneurial mindset, taking ownership of your responsibilities and constantly seeking innovative solutions.
With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers.
If you're passionate about pushing the boundaries of technology, thrive in a collaborative environment, and are motivated by the opportunity to shape the future of reliability engineering, then we want to hear from you. Join our team and be part of a dynamic and forward-thinking organization that values innovation and excellence in everything we do.
Your Future at Kyndryl
Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential - offering a wide range of professional and personal growth opportunities that you won't find anywhere else.
**Who You Are**
You're good at what you do and possess the required experience to prove it. However, equally as important - you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused - someone who prioritizes customer success in their work. And finally, you're open and borderless - naturally inclusive in how you work with others.
Product Engineering
To design, manage and support the introduction of new Products / Services to LBG through new technology, Patterns & Blueprints. This involves liaising with the customer to draw out and define the actual requirements, designing a solution that meets those requirements, and calling out any technical issues. The Engineers are expected to keep aligned with industry best practice for the use of their respective product stacks.
The role includes engaging stakeholders from all levels of both Managed and Project Services teams to ensure that they are aware of the way we consume various Products and what new skills are needed to be rolled out because of any new services implemented.
The Engineer must be able to represent the details of the solutions through any appropriate governance and discuss and agree the baselines for security compliance of the Product Offerings.
Lastly the Engineer is expected to provide L4 support as and when needed to the Managed Service teams and whilst they are not expected to be on call they are requested if possible to assist out of normal hours in the event of a major customer issue.
These are the base skills requested.
Skills
- Experience of Supporting / Management of MQ covering both current and past versions.
- Experience of designing solutions for MQ / RDQM including demonstrating working around technical issues to deliver the solution.
- Knowledge of managing, supporting and designing solutions for any associated clustering that their product offers.
- Knowledge of Automation tools such as VRA / VRO, Ansible.
- Knowledge of Backup / Restore processes.
- Knowledge of both Virtualisation and Appliance deployments.
- Defect Management.
- Storage / SAN.
- Technical Leadership.
- Technical Fault Fixing level 3/4.
- Networking (DHCP / WINS / DNS).
- Chef.
- VCenter / vSphere.
- Active Directory.
- Creation / Management of Roadmaps.
- Product Management (Documentation, Governance, Stakeholder Management, Customer Management).
**Being You**
Diversity is a whole lot more than what we look like or where we come from, it's how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we're not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you - and everyone next to you - the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That's the Kyndryl Way.
**What You Can Expect**
With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter - wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you, we want you to succeed so that together, we will all succeed.
**Get Referred!**
If you know someone that works at Kyndryl, when asked 'How Did You Hear About Us' during the application process, select 'Employee Referral' and enter your contact's Kyndryl email address.
Kyndryl is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, age, veteran status, or other characteristics. Kyndryl is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
Site Reliability Engineer

Posted 1 day ago
Job Description
Come join a team that is striving for operational awesomeness and trying to automate the world. We have a large presence with large cloud vendors. You should have experience with architecture, deployments, and networking in one or more of the major industry vendors. This is an incredible opportunity to use your existing cloud experience and drive the growth of Splunk Cloud.
**What we're looking for**
**NOTE:** **4 x 10h shifts: Wednesday - Saturday/8am-6pm**
We are looking for a TechOps SRE to help maintain, contribute to and improve the next generation of our large scale Cloud offering. You will be working with providers and supporting the infrastructure that powers Splunk's cloud offering.
**You should apply if**
+ **you are comfortable working 4 x 10h shifts: Wednesday - Saturday/8am-6pm**
+ You have operational experience at scale. You have had hands-on roles that deal with operating systems (particularly Linux) and networking. You might also have worked with Cloud technologies. Your previous job titles might be something close to systems admin, network engineer or devops engineer.
+ You're passionate about your work. Our customers are passionate about Splunk and we want the same from our engineers. You should enjoy actively being responsible for your work and be excited about your projects.
+ You love large complex systems. You have experience working on distributed systems or a passion for finding edge cases that appear at scale. You are interested in how to take a small one-off task and implement it across several thousand machines at once.
+ You have some development skills. We have code in several languages, ranging from Python and Shell to Go and C++. We don't expect you to be a software engineer but you should be familiar with basic programming and understand concepts like input sanitisation and unit testing.
+ "How can I automate this process?" is a question you constantly ask yourself.
+ Data drives your decisions. Data excites you and you make decisions based on numbers rather than assumptions. If an issue arises, you strive to be alerted before our customers notice.
+ You care about monitoring. Shipping code often and getting useful feedback excites you and you're not worried about changing direction when a solution isn't working as expected.
**What we provide**
+ Opportunities to develop and grow as an engineer. We are always expanding into new areas, working with open-source projects and contributing back, and exploring new technologies.
+ A team of incredibly capable and dedicated peers, all the way from engineering to product management and customer support.
+ Breadth and depth. You are interested in working in an area that dynamically scales to meet the needs of Splunk's cloud offering. You want to go deep into optimizing how we automate every manual process and tedious task we encounter.
+ Growth and mentorship. We believe in growing engineers through ownership and leadership opportunities. We also believe that mentors help both sides of the equation.
+ A stable, collaborative, and supportive work environment. Honesty and collaboration are values we see as a core part of our team identity. We understand the value in open communication: working together to get things done, and adapting to the changing needs of the team and individuals. This is reflected in both our internal communications and also in how we interact with our customers.
+ Balance. We don't expect people to work 12 hour days. We want you to be successful outside of work too. We trust our colleagues to be responsible with their time and commitment, and believe that balance helps cultivate a positive environment.
Splunk, a Cisco company, is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
Site Reliability Engineer
Posted 27 days ago
Job Description
Location: Remote
Salary: £55,000 - £5,000
About us
At Arbor, we’re on a mission to transform the way schools work for the better.
We believe in a future of work in schools where being challenged doesn’t mean being burnt out and overworked. Where data guides progress without overwhelming staff. And where everyone working in a school is reminded why they got into education every day.
Our MIS and school management tools are already making a difference in over 7,000 schools and trusts. Giving time and power back to staff, turning data into clear, actionable insights, and supporting happier working days.
At the heart of our brand is a recognition that the challenges schools face today aren’t just about efficiency, outputs and productivity - but about creating happier working lives for the people who drive education everyday: the staff. We want to make schools more joyful places to work, as well as learn.
About the role
We are looking for an enthusiastic and proactive Site Reliability Engineer to join our SRE team and help us ensure we provide world-class resilience and performance across the platform. The remit and focus of the role is to advise on all aspects of site reliability including availability, scalability, observability and capacity planning. It’s a broad and exciting role, so we’re looking for someone up for a challenge - if you’re an energetic and collaborative Site Reliability Engineer, this is the role for you.
- Proactively monitor and analyse platform performance.
- Collaborate with engineering teams to address performance bottlenecks and ensure scalability.
- Assist engineering teams with implementing and reviewing SLOs
- Continually improve observability through monitoring, alerting and dashboards, using tools such as Datadog or Prometheus.
- Work with other teams to ensure observability is effective and provides full coverage.
- Ensure the service is highly available and resilient
- Champion best practices in design for high availability
- Devise runbooks and run game sessions to test our DR plan, H/A and backups
- Conduct assessments of capacity and plan for scaling to meet current and future business needs.
- Work closely with the Head of Platform Engineering and Head of SRE to strategize and implement scalable solutions.
- Work closely with the Platform team, feature teams, 2nd line support and other stakeholders to ensure a good level of service is provided for our customers and to embed SRE practices.
- Key player in the response and troubleshooting of incidents, ensuring rapid resolution and minimising downtime.
- Participate in blameless postmortems to identify root cause and corrective actions
- Develop and maintain playbooks and documentation
Requirements
About you
- Experience in performance monitoring and analysis
- Capacity planning experience
- Scripting and automation skills, with experience in relevant technologies.
- Experience with Infrastructure as Code, in particular, Terraform
- Understanding of relational database technologies and their cloud versions (e.g. AWS Aurora)
- Experience with messaging and distributed asynchronous workloads
- Experience with nginx or similar technologies
- Familiarity with SRE processes.
- Aware of DevOps principles like the 3 ways and 5 ideals.
Bonus Skills
- Experience with other database technologies and cloud platforms.
- Past experience with enterprise solutions running at scale
- Familiarity with kanban and agile development processes
- Experience with containerisation, for example Docker
- Familiarity with software best practices such as Refactoring, Clean Code, Domain-Driven Design and Test-Driven Development.
Benefits
What we offer
The chance to work alongside a team of hard-working, passionate people in a role where you’ll see the impact of your work everyday. We also offer:
- A dedicated wellbeing team who champion initiatives such as mindfulness, lunch n learns, manager training, mental health first aid training and much more!
- 32 days holiday (plus Bank Holidays). This is made up of 25 days annual leave plus 7 extra company wide days given over Easter, Summer & Christmas
- Life Assurance paid out at 3x annual salary
- Comprehensive wellness benefit provided by AIG Smart Health, which provides a 24/7 virtual GP service, Mental health support, Counselling, and personalised Health Checks
- Private Dental Insurance with Bupa
- Salary sacrifice Pension provided by Scottish Widows
- Enhanced maternity and adoption pay (20 weeks full pay) and paternity pay (6 weeks full pay)
- 5 free return to work maternity coaching sessions, helping you adapt to this new exciting time of life!
- Access to services such as Calm and Bippit (financial wellbeing coaching)
- All of our roles champion flexible working and we are happy to discuss what this means to you
- Social committees that plan team, office and company wide events to bring people together and celebrate success
- Dedicated professional development training budget (CPD courses, upskilling resources, professional memberships etc)
- Volunteer with a charity of your choice for a day each year
- Dog friendly offices!
Interview process
- Phone screen
- 1st stage
- 2nd stage
We are committed to a fair and comfortable recruitment process, so if you require any reasonable adjustments during your application or interview process, please reach out to a member of the team at .
Our commitment is also backed by our partnership with Neurodiversity Consultancy, Lexxic who provide us with training, support and advice.
Arbor Education is an equal opportunities organisation
Our goal is for Arbor to be a workplace which represents, celebrates and supports people from all backgrounds, and which gives them the tools they need to thrive - whatever their ambitions may be so we support and promote diversity and equality, and actively encourage applications from people of all backgrounds.
Refer a friend
Know someone else who would be good for this role? You can refer a friend, family member or colleague, if they are offered a role with Arbor, we will say thank you with a voucher valued up to £200! Simply email:
Please note: We are unable to provide visa sponsorship at this time.
Site Reliability Engineer
Posted 531 days ago
Job Description
Opportunity
This is an opportunity for a software professional to take a key role in ensuring seamless and reliable access to our Quantum Computers for a global audience. This role would be an ideal opportunity for someone with a strong Linux system administration or DevOps background to make the move into their first Site Reliability Engineer role.
The successful candidate will join a dynamic and supportive team and be empowered to design & develop new tools and automation processes that streamline operations and enhance system reliability in one of the most cutting edge areas of the technology sector.
This is not a 9-5 role and will require the post holder to be part of an on-call rota to ensure we provide 24hr support.
Remuneration + Benefits
- £60-70k per annum
- Private medical insurance
- Group life and group income protection
- Gym and wellness benefits
- EAP cash plan
- Cycle to work scheme
- 25 days holiday
- Pension
- Employee Stock Ownership Plan (ESOP)
- Hybrid working
The Role
As a Site Reliability Engineer, you will be focusing on building and operating reliable systems and services that ensure high availability, resilience, and scalability for our global network for Quantum Computers. You will also be responsible for monitoring, diagnosing and controlling distributed production environments to avoid manual intervention, while ensuring compliance with high security standards and regulations.
In this role you will solve complex problems related to infrastructure, cloud services & quantum compilation processes, and build automation to avoid manual intervention and prevent problem recurrences. You will actively improve our operational capabilities, hot-fix issues autonomously wherever possible, and be part of a support system that enables round-the-clock, highly responsive support. You will build an outstanding code-level knowledge of all our production products and will be able to work in and out of these teams, steering these products for observability and reliable operations.
This role would be suited either to someone with SRE experience or an experienced Linux/ DevOps / Cloud engineer interested in moving into an SRE role.
- Ensuring high security standards
- Maintain scalability and availability of our QCaaS system
- Improving technical readiness levels
- Be active on development teams
- Building and operating operation systems
- Monitoring and diagnosing issues
- Testing software in development (including destructive testing)
- Performing roll-backs of software when issues arise.
- Solve complex problems related to infrastructure, cloud services & quantum compilation processes
Skills + experience
Required Skills and experience
- DevOps Engineering experience
- Software Developer/ Engineering experience
- Maintenance of cloud infrastructure
- Relevant commercial experience
- Experience in Docker/Kubernetes and Azure/AWS
- Strong knowledge of GitHub CI/CD process for Python applications
- Strong knowledge of databases such as Postgres and PL/SQL
- Proficient with commonly used networking protocols such as TCP/IP, HTTP
- Technical communicator
- Strong troubleshooting and performance tuning skills.
- Autonomous worker, with the confidence to take on tasks when the rest of the team is unavailable e.g. outside office hours
- Experience in Automation of testing / monitoring tools
- Team player
Desired Skills
- Experience managing on-premises environments would be highly beneficial
- Experience developing new tools from the ground up
- Experience of on-call positions
- Degree in related field (Computing, numerical, scientific)
Research has shown that women are less likely than men to apply for this role if they do not have solid experience in 100% of these areas. Please know that this list is indicative and that we would still love to hear from you even if you feel you only are a 75% match. Skills can be learnt, diversity cannot.
Our Company
At OQC, we see a brighter future for all, enabled by quantum.
Together we are pioneering cutting-edge quantum computers that unlock transformative discoveries, from advancing drug modelling to revolutionising battery technology. Our mission is to put quantum in the hands of humanity, empowering customers to discover new commercial and scientific frontiers.
When you join OQC, you become part of a diverse team of innovators, creators, and problem solvers. We bring together some of the brightest minds in quantum physics, nanotechnologies, hardware, software and commercial operations. Each team member brings a unique skill set, and all are united by our values, which guide us in everything we do - how we work, how we collaborate and how we shape the future of our industry.
Are you ready to help us build this future?
APPLY NOW!
Please use the link provided to apply for the role of Site Reliability Engineer. To aid your application, it will be beneficial to provide us with a cover letter outlining why you think you would be a good fit for the role and what attracts you to OQC. We look forward to hearing from you!
At OQC we are not just hoping you’ll fit in our culture. We aspire to thrive, as a company and as people, thanks to your diversity of thought and background. We are proud to be an equal opportunity employer and we are committed to providing our team members with a work environment free from discrimination, where everyone is treated with respect. Our employment decisions are based on business needs, talent and merit and all our colleagues share in the responsibility for fulfilling our commitment to diversity. We look forward to meeting you!
Site Reliability Engineer
Posted 560 days ago
Job Description
Lead Site Reliability Engineer
Posted today
Job Description
Lead Site Reliability Engineer
Location: Hybrid – Morley office with homeworking
Package: £70,000 - £0,000 - k car allowance, up to 30% bonus, 26 days holiday + flexible benefits
A large-scale tech-driven organisation is looking for a Lead Site Reliability Engineer with a strong focus on leadership and team management. Around 70% of this role is about building, mentoring and directing a high-performing SRE team, setting strategy and driving operational excellence. The remaining 30% will be hands-on involvement in AWS-based platforms, automation and performance tuning.
Key Responsibilities
- Lead and develop a team of SRE engineers, setting priorities, providing coaching and creating a culture of reliability and continuous improvement
- Define and own SRE strategy, standards and ways of working across the organisation
- Collaborate with engineering, operations and product teams to ensure seamless delivery and robust systems
- Oversee system reliability, availability and performance across large, business-critical platforms
- Provide technical guidance on automation, monitoring, scalability and incident management
- Support shared CI/CD services (Jenkins, GitLab, Concourse) and ensure AWS platforms meet operational best practice
- Produce regular reporting and communicate clearly with senior stakeholders
Key Requirements
- Strong experience managing or leading engineering/SRE/DevOps teams in a complex environment
- Track record of mentoring, coaching and growing technical teams
- Excellent stakeholder engagement skills with the ability to influence at all levels
- Broad technical knowledge: AWS (certified), CI/CD tools, infrastructure as code (CloudFormation/Ansible), containers (ECS or Kubernetes), scripting (Bash/PowerShell) and at least one higher-level language such as Python or TypeScript
- Background in Windows and/or Linux administration; Java exposure a bonus
Apply now to speak with VIQU IT in confidence. Or reach out to Aaron Chiverton via the VIQU IT website or at (url removed)
Do you know someone great? We’ll thank you with up to £1,0 if your referral is successful (terms apply).
For more exciting roles and opportunities like this, please follow us on LinkedIn @VIQU IT Recruitment
Senior Site Reliability Engineer
Posted today
Job Description
Senior Site Reliability Engineer
Central London (Hybrid)
Up to 100k + Car Allowance & Bonus
TRIA are working with a leading hospitality client to hire a Senior SRE. The client is investing heavily in the performance, stability, and reliability of its digital platforms.
This is a hands-on leadership role - you won't just guide others, you'll be the go-to expert when systems are under pressure. You'll lead incident response, own root cause analysis, and solve performance issues like memory leaks, outages, and flaky services.
You will take ownership of the site reliability and drive that as a discipline.
Your focus will include:
- Leading incident management, post-mortems, and blameless RCAs
- Building scalable, resilient microservices with the dev teams
- Uplifting observability
- Improving alerting, monitoring, and system-level metrics
- Driving better SLOs, SLIs, and overall uptime
What you'll bring:
- Experience in high-traffic digital or eCommerce platforms
- 5+ years in SRE/DevOps roles; strong background in incident response
- Observability, automation, and infrastructure as code expertise
- Leadership skills - mentoring others or leading from the front
The stack includes Kubernetes, Terraform, AWS, Python, and modern CI/CD tools, and it's evolving.
If you understand what a good SRE practice looks like, and want to leave systems in a better place than you found them, please apply to be considered and learn more!
Lead Site Reliability Engineer
Posted today
Job Description
Lead Site Reliability Engineer
Hybrid/ Remote – Once a month requirement in Leeds.
Up to £80,000 per annum plus car allowance plus bonus.
VIQU have partnered with a leading company within the supply chain industry who are seeking a Lead Site Reliability Engineer (AWS) to join and mentor their growing team. This position will lead a team that is responsible for ensuring the reliability of cloud systems and enhancing the organisation's cloud infrastructure.
This role is mostly remote, with monthly travel required to Leeds.
Responsibilities of the Lead Site Reliability Engineer:
- Lead a team of four SREs, helping to maintain the stability of cloud platforms.
- Take on hands-on technical responsibilities within AWS, utilising a range of cloud technologies (CI/CD, Container Orchestration, IaaS, Scripting etc.).
- Design and implement scalable and reliable systems.
- Support services in development, testing and production environments (Gitlab, Concourse, Jenkins etc.)
- Sit on the Centre of Excellence (CoE) team, providing suggestions for best practices.
Requirements of the Lead Site Reliability Engineer:
- Must have at least a year's experience in managing technical teams, and over five years of experience in a hands-on, technical SRE/DevOps Engineer role.
- Experience with CI/CD tools (Jenkins and Concourse CI ideally).
- Must hold experience within AWS and hold relevant AWS certifications (SA1, DOP-C02 for example).
- Experience with ECS/Kubernetes.
- Experience with infrastructure as code and config management tools.
- Scripting experience (PowerShell, Bash etc.).
- Experience with either Python or Typescript as well as knowledge of Java.
- Ideally have set up a Centre of Excellence (CoE) team before.
To discuss this exciting opportunity in more detail, please APPLY NOW for a no obligation chat with your VIQU Consultant. Additionally, you can contact Jack Mcmanus by exploring the VIQU IT Recruitment website.
If you know someone who would be ideal for this role, by way of showing our appreciation, VIQU is offering an introduction fee up to £1,000 once your referral has successfully started work with our client (terms apply).