Deimos is a Cloud-native Developer and Security Operations technology services company. We help companies of all sizes adopt the Cloud for improved service delivery to their clients. We're a fully remote African-based team of engineers who are passionate about implementing engineering best practices. We leverage the latest technologies while building globally competitive solutions for our clients. With Deimos being one of the two moons of Mars, we refer to ourselves as "Martians" who are on a mission to Mars, together.
Role Overview
- We are looking for an experienced Principal Site Reliability Engineer to join our Professional Services team and deliver Software and DevSecOps projects. You will report to a Site Reliability Engineering Manager. As a Principal Site Reliability Engineer, you will be expected to fill the role of a technical lead on multiple projects simultaneously, representing the senior technical leadership within our organisation.
- SRE / DevOps is one of our core competencies. You will be part of a highly skilled team that continuously innovates and delivers high-value solutions to clients across various industries on all public clouds (AWS, Azure, GCP, etc.). Technologies we work with daily include Kubernetes, Helm, Terraform, GitOps, OPA, Calico, and Linkerd, to name a few.
What you will be doing
- Design and build advanced cloud-native infrastructure
- Guide technical discussions with clients and build technical roadmaps
- Collaborate with the Engineering Director(s) to (re)design architecture
- Assist the Site Reliability Engineering Manager with resource planning
- Assist engineering managers with building career paths for individuals wishing to be promoted to Principal Engineers
- Teach, mentor, and advise other domain experts and individual contributors across several teams.
- Document processes and monitor performance metrics
- Guide conversations to remove blockers and encourage collaboration across teams.
- Constantly improve the stability, scalability, security, cost-effectiveness, and operational excellence of our clients' systems.
- Continuously discover, evaluate, and implement new technologies to maximize development efficiency and security.
- Conduct infrastructure planning, testing, and development
- Provide technical leadership on multiple projects.
What you must have
- At least 7 years' experience working in a DevOps/SRE team
- Extensive experience in DevOps/SRE, team management and collaboration
- Advanced knowledge of best practices related to data encryption and cybersecurity
- Advanced knowledge of the general DevOps/SRE landscape, architectures, and emerging technologies
- Cloud experience, preferably GCP, Azure and AWS
- Experience in Observability Practices and Incident Management
- Extensive experience with Prometheus, Grafana, the Elastic Stack and all versions of Beats, especially within Kubernetes
- Experience with Infrastructure as Code, preferably Terraform
- Experience with general automation and config management, preferably Ansible
- Extensive experience building and maintaining Kubernetes clusters and workloads
- Strong foundation of basic network and security concepts
- Ability to build robust CI/CD pipelines
- Familiarity with relational and non-relational databases
- Solid understanding of Linux operating systems
Qualities & Behaviours
- Exceptional interpersonal and communication skills
- A zest for automation
- Comfortable working as a remote team member and leader
- Ability to keep up to date with DevOps/SRE best practices, trends and innovation
- Passionate about mentoring and growing technical skills within the team