Staff Site Reliability Engineer (SRE)

Lisboa Remote

This is a Staff-level Site Reliability Engineer role focused on scaling reliability maturity across the organization. It blends deep operational experience, coaching, and system-level thinking, and is carried out either within platform squads or through rotating enablement engagements.
 
Candidates are software or systems engineers with advanced cloud experience, capable of deploying, operating, scaling, and maintaining:

  • EKS and existing AWS infrastructure
  • Third-party systems deployed within the environment
  • CI/CD infrastructure (COTS and internally developed)
     
What they’ll do

  • Advance org-wide reliability practices (SLOs, SLIs, error budgets)
  • Lead Tier-0 incident response and operational improvements
  • Identify systemic scaling and reliability gaps
  • Embed with teams to implement SLOs and uplift reliability
  • Improve on-call quality, tooling, and operational hygiene
  • Teach reliability as a repeatable engineering discipline

Requirements

  • 10+ years in SRE / production engineering / operations
  • Staff-level experience running large-scale, critical systems
  • Strong hands-on experience with SLO-driven reliability models
  • Proven incident leadership and systems thinking
  • Ability to coach and influence without authority

Nice-to-haves

  • Kubernetes / EKS at scale
  • Enablement or platform team experience
  • Compliance-aware environments