Job Description
- You are a coder first and foremost (scripting and compiled languages), experienced in DevOps, whether using tools or implementing things manually.
- You have enough Machine Learning knowledge to run and deploy ML code in a variety of environments (but you're not a data scientist!).
- You enjoy creating services and systems such as applications and data pipelines from scratch, and troubleshooting and maintaining them.
- You understand the data transformation needs of a variety of solutions and can leverage existing services and platforms, or, even better, build a greenfield solution.
- You speak Docker, have build/deploy experience, and are familiar with infrastructure as code.
- You pride yourself on being a jack of all trades, bring ingenuity to the table, step in to keep a project moving forward, and have no fear of picking up new skills and knowledge along the way.
In this role, you will assist Data Engineers and Data Scientists by provisioning environment resources; create services and workflow solutions for Solution Architects and PMs; and step in as a developer when there are gaps in coding skillsets on a project.
You will work closely with the architects to implement their vision and help champion it.
Relevant experience and skills include:
Open-Source SDKs/Libraries: Commonly used Machine Learning libraries: AzureML, AWS (Amazon Web Services) boto3, scikit-learn, pandas, NumPy, TensorFlow, PyTorch. Graphing libraries such as Matplotlib, Dash, R Shiny, Plotly. ML APIs/estimators for AzureML and AWS SageMaker.
DevOps: Azure DevOps deployments, AWS CodeDeploy, AWS CodePipeline, manual code deploys (scp/secure copy protocol, PuTTY, Bitvise, WinSCP). Ability to debug software remotely. Some system administration skills using the command line: ssh, vim, Unix, bash scripting, AWS CLI, Microsoft PowerShell, etc.
Docker: Experience creating/modifying Dockerfiles; familiarity with docker-compose and YAML configuration files. Experience using container orchestrators for service deployment: Kubernetes, DC/OS, Mesos, ECS, Fargate, Swarm. Familiarity with Docker registries. Some knowledge of GPU vs. CPU options. Nice to have: Helm charts (package manager) and Docker monitoring with Grafana, Prometheus, cAdvisor, etc.
Data Storage/Sinks: Familiarity with various storage solutions: blob/object storage such as S3 and Azure Blob Storage, data lakes, data warehouses, HDFS, relational databases (SQL Server, Oracle, MySQL, PostgreSQL), and NoSQL stores (such as Solr, Elasticsearch, DynamoDB, Cosmos DB); use of JSON as a data format. Familiarity with other data formats: Parquet, CSV, Avro, ORC, etc.
Resource Security: Able to utilize AWS IAM roles/Azure RBAC users, bucket policies, resource policies, VPC security groups (inbound/outbound rules), and NACLs, along with their Azure equivalents. Identity and authorization services: OAuth, Okta, AWS Cognito.
Serverless Technologies: Development of serverless functions such as AWS Lambda or Azure Functions, API Gateway, DynamoDB, CloudFormation templates, the Serverless Framework (a free and open-source framework written in Node.js), or AWS SAM (AWS Serverless Application Model) for AWS-specific resources.
Version Control Software: Git, CVS, SVN, Mercurial, or other tools.
Capabilities: Data transformation/ETL processing, data streaming, web application creation, microservices, solid SQL query/DDL scripting, and Python coding in Jupyter notebooks and scripts. Quick PoC skills. Broader programming capability with compiled languages, including using Swagger/RAML endpoint definitions for REST service creation. Rapid application development in scripting languages such as Node.js, Python (Flask/Django), Go, Ruby, or PHP, with a front end in Angular/React, Bootstrap JS/CSS, etc.
Other nice-to-have technologies: Graph databases such as Neo4j; streaming tools and frameworks: Kafka/Confluent, Kinesis, Event Hubs, Spark, IoT streaming; pub/sub and message brokers: RabbitMQ, Redis, AWS SQS. Infrastructure as code such as Ansible and CloudFormation. Middleware services like GraphQL and custom APIs; use of RAML/Swagger/OpenAPI. MLOps frameworks such as Apache Airflow, Apache Beam, and Kubeflow.
Education and Experience:
Bachelor’s degree in Computer Science or an equivalent combination of education, training, and experience.
Minimum 4 years’ experience programming front-end and back-end services.
Minimum 2 years’ experience with Docker development and orchestration frameworks or services.
Minimum 2 years’ DevOps experience, both deploying manually and using tools/code.
Minimum 2 years’ Cloud Engineering experience (AWS, Azure, or Google Cloud) utilizing infrastructure as code, SDKs, and APIs.
Minimum 3 years’ Data Engineering experience, including relational and NoSQL database CRUD operations, querying, and data transformation. Data streaming experience is a plus.
Work location: REMOTE from Mexico, Hungary, Poland, or Ukraine.