Job Description
Manage a team of engineers who build, operate, and continuously improve the solutions in our ecosystem. The role assists in managing team productivity and works to ensure the optimal health of the solutions in our ecosystem by overseeing platform and application performance, resilience, and stability. This role is also an active participant in all aspects of Site Reliability Engineering, including technical vision, telemetry and observation decisions, automation strategy, solution delivery, and platform incident and problem management. This is a leadership role with both technical and people leadership responsibilities. As such, this role participates in short- and long-term systems planning as well as team and organizational planning.
Production operations includes all the activities needed to reliably run our platforms and solutions in production (i.e., DevOps, DataOps, MLOps).
Duties and Responsibilities
- Continuously work to improve the reliability, stability, and performance of the digital platforms by overseeing the implementation of fully automated telemetry, observation, and applied intelligence systems.
- Continuously work to improve problem identification and service restoration of digital platforms by leading and overseeing efforts to define, enhance, and deliver automated alerting and response systems with intelligent, self-healing capabilities.
- Partner closely with development teams to design, build, deploy, support, and monitor new and existing solutions, and set standards for accepting new solutions for deployment.
- Measure and optimize service performance and availability.
- Perform capacity planning and management.
- Oversee and report on project status, assemble project teams, and help to define assignments against defined schedules and milestones.
- Explore and evaluate new technologies and solutions to push our capabilities forward.
- Provide periodic on-call escalation support based on established 24/7/365 support schedules.
- Fulfill the role of Escalation Manager/Critical Incident Manager on major incidents, facilitating incident resolution by leading teams through effective service restoration.
- Collaborate with admins and platform engineers through implementation decisions to achieve highly reliable infrastructure, systems, and integrations.
- Provide advanced Incident Management and Problem Management support to teams to effectively identify, remediate, and resolve issues related to platform reliability, stability, and performance through careful analysis of telemetry data and system logs.
- Document all changes following controls, procedures, and documentation standards, and raise issues and concerns with recommendations for follow-up action.
- Drive metrics management and performance of activities with a focus on a data-driven continuous improvement approach.
- Participate in the exchange of ideas and information across the global Information Technology organization to help drive standardization and sharing of best practice initiatives.
- Maintain a broad understanding of business and operations requirements in order to create suitable Information Technology solutions.
- Ensure given priorities and deadlines are kept (demand management).
- Handle conflicting priorities with the available internal and external resources.
- Select, develop, and evaluate personnel to ensure the efficient operation of the application development function.
- Provide resources for project activities to support successful project execution.
- Coordinate work schedules, priorities, and performance management of Information Technology personnel.
- Demonstrate a commitment to customer service; anticipate, meet, and exceed expectations by solving problems quickly and effectively and by making customer issues a priority.
- Work effectively under pressure with constantly changing priorities and deadlines.
- Understand and embrace the global Information Technology strategic direction.
- Adhere to all safety and health rules and regulations associated with this position and as directed by a supervisor.
- Enforce and follow all procedures and policies within Information Technology and the company.
- Participate in internal audit programs and drive non-conformance issues to conclusion.
- Perform project management as required.
- May perform other duties and responsibilities as assigned.
Skills and Qualifications
- Demonstrated leadership and resource management experience
- Experience working with Information Technology Infrastructure Library (ITIL) processes
- Strong interpersonal skills for working with customers, management, and technical teams
- Excellent written and verbal communication skills
- Effective presentation skills
- Strong attention to detail
- Experience delivering development services, preferably in a manufacturing environment
- Advanced-level English skills
Education and Experience
- Bachelor's degree in an Information Technology-related area preferred.
- Minimum of 5 years of work-related experience required in Information Technology or a related discipline.
- Minimum of 2 years of management experience required.
- Lean Six Sigma qualification preferred.