Build a team of DevOps and system engineers
• Manage the data production environment, where we store all our data and train our algorithms: a fully containerized environment with PB+ storage, a CephFS cluster, tens of GPUs, thousands of cores, many DBs and more…
• Make our environment scalable, our apps and systems available, our provisioning self-service, and data read/write blazing fast.
• Shorten the time from Dev to Production - turn every commit into a fully packaged app that can be deployed to production with one click.
• Scale our machine learning research (“ResearchOps”) - maximize our capacity to run experiments and train on huge amounts of data, while keeping all algorithm results traceable in dashboards.
• Create robust production environments - deployment, monitoring, alerting, logging and other tooling - both in the cloud and on premises.
• 4+ years as a DevOps engineer in a modern software company
• 2+ years as a team leader
• Fluent in Linux
• 1+ years with Docker in production
• Experience building/supporting a large distributed production environment
• Deep understanding of hardware - from network equipment to storage devices
• Enjoys combining reactive work (supporting users) with taking initiative: proactively defining and executing solutions to bottlenecks and gaps.
Anything from our technology list is a big plus:
CephFS, MaaS, Docker, Rancher, Kubernetes, GPU, GitLab, Prometheus, Node.js, MongoDB, PostgreSQL, Elasticsearch, Kibana, Redash, TensorFlow, Python and more…