Site Reliability Engineer Sr

Job Details

We are seeking a Senior Site Reliability Engineer to join a team that works on a complex distributed architecture spanning physical machines, virtualized on-prem hosts, and cloud computing.
The role involves helping to set up a centralized DevOps function and helping existing teams adopt more centralized best practices.
The ideal candidate will be able to manage complexity and tackle problems across multiple layers of the stack as part of a small team championing operational excellence.

Responsibilities:
Architecture and Automation: Design and deploy as-a-service solutions using open-source software to automate system management, scaling, and monitoring.
System Optimization: Develop tools that streamline deployment, monitoring, and incident management for large-scale distributed environments.
Collaboration Across Teams: Work with development and operations teams to design and implement software solutions that improve the overall reliability of services. Contribute to the ongoing DevOps and Agile transformation.
Monitoring & Incident Response: Set up, configure, and maintain monitoring and alerting systems that provide real-time visibility into system performance. Participate in on-call rotations to respond to incidents and minimize downtime.
CI/CD & Infrastructure Management: Continuously improve CI/CD pipelines using tools such as GitLab, Helm, Terraform, and Ansible to ensure fast, safe, and reliable deployments.
Container Orchestration: Use container orchestration platforms such as Kubernetes (K8s) to manage distributed systems at scale. Experience with Slurm or a similar cluster manager is a plus.
Cloud and Automation Tools: Use cloud infrastructure (AWS, GCP, etc.) and Infrastructure as Code (IaC) tools to automate resource provisioning and scaling.

Requirements:
Linux Systems: Deep expertise and hands-on experience with Linux-based systems, with a focus on optimization and troubleshooting.
Python Proficiency: Strong Python skills for scripting, automation, and system management.
Containerization & Orchestration: In-depth knowledge of container orchestration technologies such as Kubernetes (K8s). Experience with other cluster management tools such as Slurm is a plus.
Infrastructure as Code (IaC): Hands-on experience with tools such as Helm, Terraform, and Ansible to manage infrastructure in a scalable, automated way.
Container Technologies: Strong working knowledge of Docker, Podman, or other containerization systems for efficient, consistent deployments.
CI/CD Pipelines: Experience with CI/CD tools, especially GitLab (preferred), GitHub, or other Git-based platforms, to ensure smooth, rapid delivery cycles.
Monitoring & Logging: Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack to provide comprehensive insight into system performance and health.
Relational Databases: Understanding of relational databases, including their performance tuning and management in distributed systems.
Agile Development: Familiarity with Agile development methodologies, with a focus on continuous improvement and collaboration.
Cloud Experience: Exposure to cloud technologies such as AWS or Google Cloud Platform (GCP) is a strong plus.
Collaboration & Communication: A team-first attitude, excellent verbal and written communication skills in English, and the ability to work collaboratively with peers across the organization.


Nominal Salary: Negotiable

Source: Jobleads

Requirements

Senior Fullstack Engineer

We are looking for engineers willing to innovate and take on the following challenges: More than 5 years of experience developing web applications with JavaScript, React, and N...


Gesthion Organizacional S.A.S. - Bogotá D. C.

Published a month ago

Help Desk Analyst L1

Job description: Be part of Stefanini! At Stefanini we are more than 30,000 geniuses, connected from 41 countries, doing what they love and co-creating a fu...


Stefanini Latam - Bogotá D. C.

Published a month ago

Data Analyst

We are the most technologically advanced agency in Latin America, and for more than 25 years we have developed expertise in Market Research, per...


Scotiabankcolpatria - Bogotá D. C.

Published a month ago

Unix AIX Systems Administrator

Unix AIX Systems Administrator: A leading digital transformation company requires a professional in Telecommunications Engineering, Computer Science, Engineering in S...


Farma De Colombia - Bogotá D. C.

Published a month ago
