Last updated: 2025-07-21

99 Site Reliability Engineering jobs in San Francisco.


Baseten

Join our dynamic team at Baseten, where we’re revolutionizing AI deployment with cutting-edge inference infrastructure. Backed by premier investors such as IVP…

Site Reliability Engineer

San Francisco

  • Skills: Site Reliability Engineer, Kubernetes, Scalable Infrastructure, Infrastructure-as-Code, CI/CD Tools, Project Management, Collaboration, Mentorship, Performance Optimization, Machine Learning
  • Level: mid
  • Type: full_time

Alchemy

Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools necessary to build and scale onchain apps and rollups. Our infra…

Infrastructure Engineer (Reliability Focus)

San Francisco

  • Skills: Reliability, Observability, Infrastructure Engineer, Production Systems, AWS, Docker, Kubernetes, CI/CD, Infrastructure-as-Code, Engineering Excellence
  • Level: mid
  • Type: full_time

Astranis

Astranis is a telecommunications company that operates satellites from geostationary orbit (GEO) to connect millions of people worldwide, currently expanding i…

Senior Site Reliability Engineer - Ground Software

San Francisco

  • Skills: Kubernetes, site reliability engineer, DevOps, Linux, monitoring, deployment practices, software systems, automation, mission control, shell programming
  • Level: senior
  • Type: full_time

Succinct

Succinct is focused on making zero-knowledge proofs accessible to developers, with state-of-the-art zkVM technology and a proving network infrastructure.

Senior Software Engineer

San Francisco

  • Skills: zero-knowledge proofs, zkVM, distributed systems, infrastructure, container orchestration, autoscaling, monitoring stack, observability, Rust, Golang
  • Level: mid
  • Type: full_time

Hive

Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and mos…

DevOps and Systems Engineer

San Francisco

  • Skills: cloud-based AI solutions, machine learning, DevOps, Site Reliability, automation, enterprise SaaS, distributed computing, high performance computing, hybrid infrastructure, GPU integration
  • Level: mid
  • Type: full_time

Neuralink

We are creating devices that enable a bi-directional interface with the brain. These devices allow us to restore movement to the paralyzed, restore sight to th…

Infrastructure Team Member

Fremont

  • Skills: software engineering, cloud architecture, infrastructure, networking protocols, Linux systems, hybrid cloud, security fundamentals, IaC tools, cryptographic protocols, systems administration
  • Level: mid
  • Type: full_time

NewsBreak

NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…

Software Engineer in Reliability & Availability

Mountain View

  • Skills: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response
  • Level: mid
  • Type: full_time

Luma AI

Site Reliability Engineer (SRE)

Palo Alto

  • Skills: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing
  • Level: mid
  • Type: full_time

Zoox

Zoox is a robotics company focused on developing autonomous vehicles with an ethos of automation throughout the infrastructure components they build.

Platform/Site Reliability Engineer

Foster City

  • Skills: site reliability engineer, uptime, autonomous vehicles, fault-tolerant systems, deployment, operation, data-processing pipelines, compute-intensive tasks, CPUs, GPUs
  • Level: mid
  • Type: full_time

Replit

Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…

Site Reliability Engineer

Foster City

  • Skills: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies
  • Level: mid
  • Type: full_time

Gusto

Gusto is a modern, online people platform that helps small businesses take care of their teams. On top of full-service payroll, Gusto offers health insurance, …

Storage Infrastructure Engineer

San Francisco

  • Skills: storage infrastructure, MySQL, Postgres, data streaming, Kafka, cloud platforms, AWS, Terraform, resiliency, automation
  • Level: mid
  • Type: full_time

Robinhood Markets

Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…

Staff Software Engineer - Reliability

Menlo Park

  • Skills: reliability, scalability, performance, security, software engineering, distributed systems, incident metrics, operational excellence, mentoring, infrastructure
  • Level: senior
  • Type: full_time

Twitter

Twitter is a social media platform that allows users to post and interact with messages known as tweets.

Site Reliability Engineering Team Lead

San Francisco

  • Skills: site reliability engineering, team leadership, engineering collaboration, technical design, reliability practices, coaching, team empowerment, personal development, cross-team communication, system scalability
  • Level: mid
  • Type: full_time

Coupang

Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…

Observability Engineer

Mountain View

  • Skills: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators
  • Level: mid
  • Type: full_time

Palo Alto Networks

Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.

Manager, Site Reliability Engineering (Cortex, Tools and Platforms)

Santa Clara

  • Skills: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development
  • Level: mid
  • Type: full_time

Orb

Orb is on a mission to revolutionize billing infrastructure for the modern era of AI and software. We empower businesses to align their monetization with produ…

Infrastructure Engineer

San Francisco

  • Skills: infrastructure, reliability, observability, scalability, performance-critical, event processing, cloud, AWS, resiliency, mentorship
  • Level: mid
  • Type: full_time

Visa

Transform global payment systems through automation and innovation.

Middleware Reliability Engineer

Foster City

  • Skills: automation, infrastructure as code, DevOps, observability, CI/CD, Terraform, Ansible, Python, Java, Go
  • Level: mid
  • Type: full_time

Palo Alto Networks

Palo Alto Networks is a cybersecurity company that aims to redefine protection and security in the digital age. Their mission is to be the cybersecurity partne…

Sr Site Reliability Engineer (App Service Team)

Santa Clara

  • Skills: Site Reliability Engineering, DevOps, cloud-native applications, AWS, GCP, Terraform, Kubernetes, automation, programming languages, CI/CD
  • Level: mid
  • Type: full_time

Celonis

Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies and the planet. With over 5,000 enterprise custome…

Site Reliability Engineer

Redwood City

  • Skills: Site Reliability Engineering, Microservices, Kubernetes, Automation, Incident management, Cloud computing, Java, Python, Observability, CI/CD
  • Level: mid
  • Type: full_time

Robinhood Markets

Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…

Staff Software Engineer - Reliability Engineering

Menlo Park

  • Skills: reliability, scalability, performance, security, distributed systems, programming languages, Linux, networking, incident metrics, monitoring
  • Level: senior
  • Type: full_time