Staff Site Reliability Engineer (SRE)

Lisbon (Remote)

Staff-level Site Reliability Engineer focused on scaling reliability maturity across the organization. This role blends deep operational experience, coaching, and system-level thinking, and operates either within platform squads or on rotating enablement engagements.

Software or systems engineers with advanced cloud experience who can deploy, operate, scale, and maintain:

EKS and existing AWS infrastructure
Third-party systems deployed within the environment
CI/CD infrastructure (COTS and internally developed)

What they’ll do
Advance org-wide reliability practices (SLOs, SLIs, error budgets)
Lead Tier-0 incident response and operational improvements
Identify systemic scaling and reliability gaps
Embed with teams to implement SLOs and uplift reliability
Improve on-call quality, tooling, and operational hygiene
Teach reliability as a repeatable engineering discipline

Requirements

10+ years in SRE / production engineering / operations
Staff-level experience running large-scale, critical systems
Strong hands-on experience with SLO-driven reliability models
Proven incident leadership and systems thinking
Ability to coach and influence without authority

Nice-to-haves

Kubernetes / EKS at scale
Enablement or platform team experience
Compliance-aware environments
