datatrota

Site Reliability Engineer Jobs in Nigeria

View Site Reliability Engineer jobs on TechTalentZone
  • Head, Site Reliability Engineer

    Ascentech Services Limited | Lagos, Nigeria | 26 August

    Ascentech Services Ltd acts as a gateway to provide a wide range of recruitment and selection services to companies. We are a dedicated team of professional ...

    Onsite
  • Site Reliability Engineer

    Renmoney | Lagos, Nigeria | 07 August

    At Renmoney, we believe finance should be simple, useful and accessible to everyone. That’s what makes us really passionate about leveraging data driven ...

    Hybrid
  • Site Reliability Engineer

    Renmoney | Lagos, Nigeria | 20 June

    At Renmoney, we believe finance should be simple, useful and accessible to everyone. That’s what makes us really passionate about leveraging data driven ...

    Onsite
  • Site Reliability Engineer

    Termii | Lagos, Nigeria | 06 June

    Termii is a communications platform that allows African businesses to send messages to anyone across SMS, email, voice, and instant messaging channels. With ...

    Onsite
  • Site Reliability Engineer (SRE)

    Tezza Business Solutions Ltd | Lagos, Nigeria | 01 February

    "Tezza" (te-zza), from the Italian word "Completezza", embodies our commitment to providing IT and Business Solutions that are comprehensive, through ...

    Onsite
  • Site Reliability Engineer / System Administrator

    Engie Africa | Nigeria | 11 December, 2023

    ENGIE is a leading global energy company that builds its businesses around a model based on responsible growth to take on energy transition challenges. We ...

    Onsite

Who is a site reliability engineer?

Site reliability engineering is a role that usually draws on one of three backgrounds: a sysadmin, a software developer with additional operations experience, or an IT operations professional who also has software development skills. SRE teams are responsible for how code is deployed, configured, and monitored, as well as the availability, latency, change management, emergency response, and capacity management of services in production. SRE teams also help decide when new features can launch: service-level agreements (SLAs) define the required reliability of the system, which is measured with service-level indicators (SLIs) and tracked against service-level objectives (SLOs).
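To make the link between SLIs, SLOs, and launch decisions concrete, here is a minimal Python sketch. It assumes availability is measured as the fraction of successful requests in a window, and that a launch is allowed only while fewer failures have occurred than the SLO permits (often called an error budget). The names `SloReport` and `evaluate_slo` and the launch rule itself are illustrative assumptions, not something the text above prescribes.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of successful requests
    allowed_failures: float  # failures the SLO tolerates in this window
    budget_used: float       # share of the allowed failures already spent
    can_launch: bool         # assumed rule: launch only if budget remains


def evaluate_slo(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compare a measured availability SLI against an SLO target."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_requests / total_requests
    allowed_failures = (1 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    budget_used = actual_failures / allowed_failures
    return SloReport(sli, allowed_failures, budget_used, budget_used < 1.0)


if __name__ == "__main__":
    # 500 failed requests out of one million against a 99.9% target.
    report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, slo_target=0.999)
    print(f"SLI={report.sli:.4%} budget used={report.budget_used:.0%} launch={report.can_launch}")
```

In this sketch a 99.9% target over one million requests tolerates roughly 1,000 failures, so 500 failures leaves about half the budget and the launch check passes; 2,000 failures would exhaust it.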

Roles and responsibilities of a site reliability engineer

Building software to help DevOps, ITOps & support teams

SRE teams are in charge of proactively building and implementing services to make IT and support better at their jobs. This can be anything from adjustments to monitoring and alerting to code changes in production. A site reliability engineer can be tasked with building a homegrown tool from scratch to help with weaknesses in software delivery or incident management.

Fixing support escalation issues

Similar to the point above, a site reliability engineer can expect to spend time fixing support escalation cases. But, as your SRE operations mature, your systems will become more reliable and you’ll see fewer critical incidents in production – leading to fewer support escalations.

Because an SRE team touches so many different parts of the engineering and IT organization, they can be a great source of knowledge and can be helpful for routing issues to the right people and teams.

Optimizing on-call rotations & processes

More often than not, site reliability engineers will need to take on on-call responsibilities. At most organizations, the SRE role has a lot of say in how the team can improve system reliability by optimizing on-call processes.

SRE teams will help add automation and context to alerts – leading to better real-time collaborative response from on-call responders. Additionally, site reliability engineers can update runbooks, tools and documentation to help prepare on-call teams for future incidents.
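As one illustration of adding context to alerts, the Python sketch below attaches a runbook link and an owning team to an alert before it reaches the on-call responder. The alert shape, the `RUNBOOKS` and `OWNERS` tables, and the fallback values are all hypothetical; a real team would draw them from its own alerting tool and service catalog.

```python
# Hypothetical lookup tables, for illustration only.
RUNBOOKS = {
    "HighErrorRate": "runbooks/high-error-rate.md",
    "DiskAlmostFull": "runbooks/disk-almost-full.md",
}
OWNERS = {
    "payments-api": "payments-team",
    "sms-gateway": "messaging-team",
}


def enrich_alert(alert: dict, runbooks: dict = RUNBOOKS, owners: dict = OWNERS) -> dict:
    """Return a copy of the alert with a runbook link and an owning team added."""
    enriched = dict(alert)  # copy so the original alert is left untouched
    # Fall back to a generic runbook when the alert name is not documented yet.
    enriched["runbook"] = runbooks.get(alert["name"], "runbooks/unknown-alert.md")
    # Fall back to the SRE on-call rotation when the service has no known owner.
    enriched["owner"] = owners.get(alert.get("service"), "sre-on-call")
    return enriched


if __name__ == "__main__":
    print(enrich_alert({"name": "HighErrorRate", "service": "payments-api"}))
```

The fallbacks matter as much as the lookups: an alert with no documented runbook still lands with someone, and the gap shows up as a prompt to write the missing documentation.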

Documenting “tribal” knowledge

SRE teams gain exposure to systems in both staging and production, as well as all technical teams. They take part in work with software development, support, IT operations and on-call duties – meaning they build up a great amount of historical knowledge over time. Instead of siloing this knowledge into the mind of one team or one person, site reliability engineers can be tasked with documenting much of what they know. Constant upkeep of documentation and runbooks can ensure that teams get the information they need right when they need it.

Conducting post-incident reviews

Without thorough post-incident reviews, you have no way to identify what’s working and what’s not. SRE teams need to keep teams honest and ensure that everyone, software developers and IT professionals alike, is conducting post-incident reviews, documenting their findings and acting on what they learn.

Site reliability engineers are then often assigned the resulting action items: building or optimizing some part of the SDLC or incident lifecycle to bolster the reliability of their service.

Skills for a site reliability engineer

  1. Coding languages: As an SRE, you will need to be proficient in at least one coding language, because you will often write code to automate tasks or build tools. Popular languages among SREs include Python, Java, and Go.

  2. CI/CD pipeline development: In order to release code changes safely and efficiently, you will need to be well-versed in continuous integration (CI) and continuous delivery (CD) pipelines.

  3. Mastering distributed computing: Many companies today use distributed systems to achieve high availability and scalability. As an SRE, you will need a deep understanding of how distributed systems work so that you can troubleshoot and optimize them.

  4. Using monitoring tools: Monitoring is essential for keeping track of the health of company services and products. As an SRE, you should be familiar with various monitoring tools such as Prometheus, SolarWinds, Pingdom, Zabbix, and Zoho.

  5. Using version control tools: Version control tools such as Git are used by developers to share and manage code changes. As an SRE, you will need to be familiar with these tools in order to help developers with code deployments.

  6. Understanding operating systems: To effectively manage company services, you will need to have a deep understanding of various operating systems such as Linux, Windows, and macOS.

  7. Deep understanding of databases: Databases are often used by company services in order to store data. As an SRE, you should have a deep understanding of how different types of databases work in order to be able to effectively troubleshoot any issues that may arise.

  8. Automation skills: Automation is crucial for reducing the amount of manual work that needs to be done in order to maintain company services. As an SRE, you should be proficient in various automation tools such as ACCELQ and Avo Assure.

  9. Knowing cloud-native applications: Cloud-native applications are designed specifically for deployment on cloud platforms such as AWS and Azure. As an SRE, you should have experience working with cloud-native applications to manage them effectively.
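As a small example of the coding, monitoring, and automation skills listed above, the Python sketch below turns a routine manual check (how full is a disk?) into a script. It uses only the standard library; the 80% warning threshold, the function names, and the message format are illustrative assumptions rather than recommended values.

```python
import shutil
from typing import Optional


def disk_usage_percent(path: str = ".") -> float:
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def check_disk(path: str = ".", warn_at: float = 80.0,
               usage_percent: Optional[float] = None) -> str:
    """Build a one-line status message for a disk.

    `usage_percent` can be passed in directly (for example in tests);
    otherwise it is measured from the live filesystem.
    """
    pct = disk_usage_percent(path) if usage_percent is None else usage_percent
    status = "WARN" if pct >= warn_at else "OK"
    return f"{status}: {path} is {pct:.1f}% full"


if __name__ == "__main__":
    print(check_disk("."))
```

A script like this could be run on a schedule and its output fed into whichever monitoring tool the team already uses; allowing the measurement to be injected keeps the threshold logic easy to test.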