Principal Site Reliability Engineer at Deimos

Deimos | Lagos, Nigeria | Networking and Tech Support

Full Time

Deimos is a Cloud-native Developer and Security Operations technology services company. We help companies of all sizes adopt the Cloud for improved service delivery to their clients. We're a fully remote African-based team of engineers who are passionate about implementing engineering best practices. We leverage the latest technologies while building globally competitive solutions for our clients. With Deimos being one of the two moons of Mars, we refer to ourselves as "Martians" who are on a mission to Mars, together.

Role Overview

We are looking for an experienced Principal Site Reliability Engineer to join our Professional Services team and deliver Software and DevSecOps projects. You will report to a Site Reliability Engineering Manager. As a Principal Site Reliability Engineer, you will be expected to act as technical lead on multiple projects simultaneously, representing the senior technical leadership within our organisation.
SRE / DevOps is one of our core competencies. You will be part of a highly skilled team that continuously innovates and delivers high-value solutions to clients across various industries on all public clouds (AWS, Azure, GCP, etc.). Technologies we work with daily include Kubernetes, Helm, Terraform, GitOps, OPA, Calico, and Linkerd, to name a few.

What you will be doing

Design and build advanced cloud-native infrastructure
Guide technical discussions with clients and build technical roadmaps
Collaborate with the Engineering Director(s) to (re)design architecture
Assist the Site Reliability Engineering Manager with resource planning
Assist engineering managers with building career paths for individuals wishing to be promoted to Principal Engineers
Teach, mentor, and advise other domain experts and individual contributors across several teams
Document processes and monitor performance metrics
Guide conversations to remove blockers and encourage collaboration across teams
Constantly improve the stability, scalability, security, cost-effectiveness, and operational excellence of our clients' systems
Continuously discover, evaluate, and implement new technologies to maximize development efficiency and security
Conduct infrastructure planning, testing, and development
Provide technical leadership on multiple projects

What you must have

At least 7 years' experience working in a DevOps/SRE team
Extensive experience in DevOps/SRE, team management and collaboration
Advanced knowledge of best practices related to data encryption and cybersecurity
Advanced knowledge of the general DevOps/SRE landscape, architectures, and emerging technologies
Cloud experience, preferably GCP, Azure and AWS
Experience in Observability Practices and Incident Management
Extensive experience with Prometheus, Grafana, the Elastic Stack and all versions of Beats, especially within Kubernetes
Experience with Infrastructure as Code, preferably Terraform
Experience with general automation and config management, preferably Ansible
Extensive experience building and maintaining Kubernetes clusters and workloads
Strong foundation in basic networking and security concepts
Ability to build robust CI/CD pipelines
Familiarity with relational and non-relational databases
Solid understanding of Linux operating systems

Qualities & Behaviours

Exceptional interpersonal and communication skills
A zest for automation
Comfortable working as a remote team member and leader
Ability to keep up to date with DevOps/SRE best practices, trends and innovation
Passionate about mentoring and growing technical skills within the team