As a Site Reliability Engineering (SRE) Lead, you will be responsible for both uplifting and maintaining our evolving technology platforms, infrastructure, and technology controls. This is a hands-on technical leadership role that includes oversight of the production operations of our systems as well as development and engineering of solutions that maximize system reliability and automation. Your role will include root cause analysis of incidents and proactive prevention of recurrence through the creative design and development of technical solutions and process improvements. You will work with our analytics delivery teams and data platform teams to define production operations standards and KPIs. You will also partner with the Infrastructure, Operations, and Cloud teams to identify and implement automation opportunities that drive down toil, reduce technical debt, and improve system reliability.
Production operations include all the activities needed to reliably run our platforms and solutions in production (e.g., DevOps, DataOps, MLOps).
YOUR RESPONSIBILITIES WILL INCLUDE
- Continuously work to improve the reliability, stability, and performance of the digital platforms by overseeing the implementation of fully automated telemetry, observability, and applied intelligence systems.
- Continuously work to improve problem identification and service restoration of digital platforms by leading and overseeing efforts to define, enhance, and deliver automated alerting and response systems with intelligent, self-healing capabilities.
- Partner closely with development teams to design, build, deploy, support, and monitor new and existing solutions, and set the standards new solutions must meet before they are accepted for deployment
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
- Explore and evaluate new technologies and solutions to push our capabilities forward
- Collaborate with admins and platform engineers through implementation decisions to achieve highly reliable infrastructure, systems, and integrations.
- Document all changes following controls, procedures, and documentation standards, and raise issues and concerns with recommendations for follow-up action.
WHAT YOU BRING
- Deep understanding of AWS cloud services and how to leverage them for compute, storage, and managed services, including, but not limited to, databases, managed Kubernetes, and Python application services.
- Experienced with modern DevOps engineering practices and comfortable with diverse technical problem sets, across the entire technology stack, including the virtualized hardware
- Understanding of both the Windows and Linux operating systems, and at home on the command line / terminal at your workstation
- Familiar with configuration automation tools
- Proficient in scripting and developing automation in Python and Bash, or similar programming languages
- Used to keeping everything you do in source control (git) and automating (scripting) any task you have to do more than once
- Able to troubleshoot issues effectively across the entire stack, from UI → API → application → database, including the operating system and the underlying (virtual) hardware
- Enthusiastic about cutting-edge technologies and the fresh challenges that come with them
- A service- and customer-oriented mindset and a willingness to dig into the application rather than throw the problem over the wall
- Experienced using Kubernetes and related technologies (such as AWS EKS) for application orchestration
- Experience with a wide array of cloud services is required (preferably AWS, but Azure is applicable too)
- Excited about monitoring technologies, the metrics they provide, and using that data to understand the performance characteristics and error modes of a cloud-based software stack
- Familiarity with maintaining and supporting feature-rich applications built with modern languages and frameworks (e.g., C#, AngularJS, Python).
- Understanding of computer networking and how it applies in cloud environments
- Related technical experience in cybersecurity, preferably in a cloud environment
- Experience securing corporate networks, cloud networks, and VPNs.
Work location: Work can be performed remotely from any of our European sites (Hungary, Austria, Belgium, Germany, UK, Spain, Ireland, Italy, etc.)
If you are a qualified individual with a disability, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access the Jabil.com/Careers site as a result of your disability. You can request a reasonable accommodation by sending an e-mail to Always_Accessible@Jabil.com or by calling 1.727.803.7515 with the nature of your request and your contact information. Please do not direct any other general employment-related questions to this e-mail address or phone number. Please note that only inquiries concerning a request for reasonable accommodation will receive a response from this e-mail address and/or phone number.