Staff-level Site Reliability Engineer focused on scaling reliability maturity across the organization. This role blends deep operational experience, coaching, and system-level thinking, and operates either within platform squads or through rotating enablement engagements.
Software/systems engineers with advanced cloud experience who can deploy, operate, scale, and maintain:
- EKS and existing AWS infrastructure
- Third-party systems deployed within the environment
- CI/CD infrastructure (COTS and internally developed)
What they’ll do
- Advance org-wide reliability practices (SLOs, SLIs, error budgets)
- Lead Tier-0 incident response and operational improvements
- Identify systemic scaling and reliability gaps
- Embed with teams to implement SLOs and uplift reliability
- Improve on-call quality, tooling, and operational hygiene
- Teach reliability as a repeatable engineering discipline