IT Technical Lead - Site Reliability Engineering (SRE)

Req ID: J2307947

  • Location
    Livingston, West Lothian, United Kingdom
    Remote - Austria, Vienna, Austria
    Remote - Belgium, Brussels-Capital Region, Belgium
    Remote - Germany, Berlin, Germany
    Remote - Hungary, Budapest, Hungary
    Remote - Ireland, Ireland
    Remote - Italy, Rome, Italy
    Remote - Poland, Lower Silesia, Poland
    Remote - Spain, Madrid, Spain
    Remote - United Kingdom, Ayrshire, United Kingdom
  • Category: Design Services Group
  • Posted: Monday, April 11, 2022
  • Type: Full-time
  • Employment Type: Permanent Employee

Job Description

As a Site Reliability Engineering (SRE) Lead, you will be responsible for both uplifting and maintaining our evolving technology platforms, infrastructure, and technology controls. This is a hands-on technical leadership role that includes both oversight of production operations for our systems and the development and engineering of solutions to maximize system reliability and automation. Your role will include root cause analysis of incidents and proactive prevention of recurrence through the creative design and development of technical solutions and process improvements. You will work with our analytics delivery teams and data platform teams to define production operations standards and KPIs. You will also partner with Infrastructure, Operations, and Cloud teams to identify and implement automation opportunities that drive down toil, reduce technical debt, and improve system reliability.

Production operations include all the activities required to reliably run our platforms and solutions in production (i.e., DevOps, DataOps, MLOps).

YOUR RESPONSIBILITIES WILL INCLUDE

  • Continuously work to improve the reliability, stability, and performance of the digital platforms by overseeing the implementation of fully automated telemetry, observation, and applied intelligence systems
  • Continuously work to improve problem identification and service restoration of digital platforms by leading and overseeing efforts to define, enhance, and deliver automated alerting and response systems with intelligent, self-healing capabilities
  • Partner closely with development teams to design, build, deploy, support, and monitor new and existing solutions, and set standards for accepting new solutions for deployment
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Participate in system design consulting, platform management, and capacity planning
  • Create sustainable systems and services through automation and uplifts
  • Balance feature development speed and reliability with well-defined service level objectives
  • Explore and evaluate new technologies and solutions to push our capabilities forward
  • Collaborate with admins and platform engineers on implementation decisions to achieve highly reliable infrastructure, systems, and integrations
  • Document all changes following controls, procedures, and documentation standards, and raise issues and concerns with recommendations for follow-up action

Minimum Qualifications

  • Deep understanding of AWS cloud services and how to leverage them for compute, storage, and managed services, including, but not limited to, databases, managed Kubernetes, and Python application services
  • Experienced with modern DevOps engineering practices and comfortable with diverse technical problem sets, across the entire technology stack, including the virtualized hardware
  • Understanding of both the Windows and Linux operating systems, and at home on the command line / terminal at your workstation
  • Familiar with configuration automation tools
  • Proficient in scripting and developing automation in Python and Bash, or similar programming languages
  • Used to keeping everything you do in source control (git) and automating (scripting) any task you have to do more than once
  • Able to effectively troubleshoot issues across the entire stack, from UI -> API -> Application -> Database, including the operating system and the underlying (virtual) hardware
  • Enthusiastic about cutting-edge technologies and fresh challenges that come with them
  • Possesses a service- and customer-oriented mindset and a willingness to dig into the application rather than throw the problem over the wall

Ideal Qualifications

  • Experienced using Kubernetes and related technologies (such as AWS EKS) for application orchestration
  • Experience with a wide array of cloud services is required (preferably AWS but Azure is applicable too)
  • Excited about monitoring technologies, the metrics they provide, and using the data to extract information about the performance characteristics and error modes of a cloud-based software stack
  • Familiarity with maintaining and supporting feature-rich applications using modern languages and frameworks (e.g., C#, AngularJS, Python)
  • Understanding of computer networking and how it applies in cloud environments
  • Related technical experience in cybersecurity, preferably in a cloud environment
  • Experience securing corporate networks, cloud networks, and VPNs.

Work location: Work can be performed remotely from any of our European sites (Hungary, Austria, Belgium, Germany, UK, Spain, Ireland, Italy, etc.).

Not ready to apply? Join the Jabil professional network!

Learn more about upcoming career opportunities and Jabil events.