Site Reliability Engineer / System Administrator at Engie Africa

Engie Africa | Nigeria | Networking and Tech Support
Full Time
ENGIE is a leading global energy company that builds its businesses around a model of responsible growth to take on the challenges of the energy transition to a low-carbon economy: access to sustainable energy, climate-change mitigation and adaptation, and the rational use of resources. We provide individuals, cities and businesses with innovative solutions based on our expertise in four key sectors: independent power production, natural gas, renewable energy and energy efficiency services.

Job Summary:

  • We are seeking a talented and experienced System Administrator/Site Reliability Engineer (SRE) to join our dynamic team.
  • As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems and services.
  • You will collaborate with cross-functional teams to implement and maintain robust infrastructure solutions, focusing on automation, monitoring, and incident response.
  • The ideal candidate is passionate about optimizing and enhancing system reliability, possesses strong problem-solving skills, and is committed to driving excellence in operational practices.

Key Responsibilities:

Infrastructure Automation:

  • Develop and maintain automation tools and scripts for provisioning, configuration, and deployment.
  • Implement infrastructure as code (IaC) practices to ensure consistency and reproducibility.

Monitoring and Incident Response:

  • Set up and maintain monitoring systems to detect and respond to performance issues and outages.
  • Participate in on-call rotations: respond promptly to incidents, troubleshoot them, and implement solutions to prevent recurrence.

Performance Optimization:

  • Optimize system performance through continuous analysis and tuning.

Reliability Engineering:

  • Implement best practices for reliability, such as error budgeting, SLIs/SLOs, and blameless post-mortems.
  • Work towards minimizing manual intervention through automation.

System Administration:

  • Manage and maintain server infrastructure, including installation, configuration, and troubleshooting of operating systems.
  • Implement and maintain security measures, such as firewalls and intrusion detection systems.
  • Perform regular system backups and recovery procedures.

Collaboration and Communication:

  • Collaborate with cross-functional teams to align infrastructure and operational requirements.
  • Provide technical guidance and support to colleagues in areas related to reliability.

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Proven experience as a Site Reliability Engineer or System Administrator.
  • Strong Linux and Bash scripting skills.
  • Proficiency in cloud platforms (e.g., AWS, Azure, GCP, Linode, DigitalOcean).
  • Experience with containers and container orchestration tools (e.g., Docker, LXD, Kubernetes).
  • In-depth knowledge of networking, security, and system administration.
  • Familiarity with infrastructure as code tools (e.g., Terraform, Ansible).
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.

Preferred Qualifications:

  • Experience with CI/CD pipelines and related tools.
  • Knowledge of distributed systems and microservices architecture.
  • Familiarity with observability tools (e.g., Prometheus, Grafana, ELK stack).
  • Familiarity with programming languages (e.g., Python, Ruby).
