ENGIE is a leading global energy company that builds its businesses around a model based on responsible growth to take on energy transition challenges. We provide individuals, cities and businesses with innovative solutions based on our expertise in four key sectors: independent power production, natural gas, renewable energy and energy efficiency services. These solutions support the transition to a low-carbon economy: access to sustainable energy, climate-change mitigation and adaptation, and the rational use of resources.
Job Summary:
- We are seeking a talented and experienced System Administrator/Site Reliability Engineer (SRE) to join our dynamic team.
- As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems and services.
- You will collaborate with cross-functional teams to implement and maintain robust infrastructure solutions, focusing on automation, monitoring, and incident response.
- The ideal candidate is passionate about optimizing and enhancing system reliability, possesses strong problem-solving skills, and is committed to driving excellence in operational practices.
Key Responsibilities:
Infrastructure Automation:
- Develop and maintain automation tools and scripts for provisioning, configuration, and deployment.
- Implement infrastructure as code (IaC) practices to ensure consistency and reproducibility.
Monitoring and Incident Response:
- Set up and maintain monitoring systems to detect and respond to performance issues and outages.
- Participate in on-call rotations: respond promptly to incidents, troubleshoot them, and implement solutions to prevent recurrence.
Performance Optimization:
- Optimize system performance through continuous analysis and tuning.
Reliability Engineering:
- Implement best practices for reliability, such as error budgets, SLIs/SLOs, and blameless post-mortems.
- Work towards minimizing manual intervention through automation.
System Administration:
- Manage and maintain server infrastructure, including installation, configuration, and troubleshooting of operating systems.
- Implement and maintain security measures, such as firewalls and intrusion detection systems.
- Perform regular system backups and recovery procedures.
Collaboration and Communication:
- Collaborate with cross-functional teams to align infrastructure and operational requirements.
- Provide technical guidance and support to colleagues in areas related to reliability.
Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Site Reliability Engineer or System Administrator.
- Strong Linux and Bash scripting skills.
- Proficiency in cloud platforms (e.g., AWS, Azure, GCP, Linode, DigitalOcean).
- Experience with container and orchestration tools (e.g., Kubernetes, Docker, LXD).
- In-depth knowledge of networking, security, and system administration.
- Familiarity with infrastructure as code tools (e.g., Terraform, Ansible).
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
Preferred Qualifications:
- Experience with CI/CD pipelines and related tools.
- Knowledge of distributed systems and microservices architecture.
- Familiarity with observability tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with programming languages (e.g., Python, Ruby).