Last updated: 2025-07-21
99 Site Reliability Engineering jobs in San Francisco.
Baseten
Join our dynamic team at Baseten, where we’re revolutionizing AI deployment with cutting-edge inference infrastructure. Backed by premier investors such as IVP…
San Francisco
- Skills: Site Reliability Engineer, Kubernetes, Scalable Infrastructure, Infrastructure-as-Code, CI/CD Tools, Project Management, Collaboration, Mentorship, Performance Optimization, Machine Learning
- Level: mid
- Type: full_time
Alchemy
Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools necessary to build and scale onchain apps and rollups. Our infra…
San Francisco
- Skills: Reliability, Observability, Infrastructure Engineer, Production Systems, AWS, Docker, Kubernetes, CI/CD, Infrastructure-as-Code, Engineering Excellence
- Level: mid
- Type: full_time
Astranis
Astranis is a telecommunications company that operates satellites from geostationary orbit (GEO) to connect millions of people worldwide, currently expanding i…
San Francisco
- Skills: Kubernetes, site reliability engineer, DevOps, Linux, monitoring, deployment practices, software systems, automation, mission control, shell programming
- Level: senior
- Type: full_time
Succinct
Succinct is focused on making zero knowledge proofs accessible to developers, with state of the art zkVM technology and a proving network infrastructure.
San Francisco
- Skills: zero knowledge proofs, zkVM, distributed system, infrastructure, container orchestration, autoscaling, monitoring stack, observability, Rust, Golang
- Level: mid
- Type: full_time
Hive
Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and mos…
San Francisco
- Skills: cloud-based AI solutions, machine learning, DevOps, Site Reliability, automation, enterprise SaaS, distributed computing, high performance computing, hybrid infrastructure, GPU integration
- Level: mid
- Type: full_time
Neuralink
We are creating devices that enable a bi-directional interface with the brain. These devices allow us to restore movement to the paralyzed, restore sight to th…
Fremont
- Skills: software engineering, cloud architecture, infrastructure, networking protocols, Linux systems, hybrid cloud, security fundamentals, IaC tools, cryptographic protocols, systems administration
- Level: mid
- Type: full_time
NewsBreak
NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…
Mountain View
- Skills: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response
- Level: mid
- Type: full_time
Luma AI
Palo Alto
- Skills: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing
- Level: mid
- Type: full_time
Zoox
Zoox is a robotics company focused on developing autonomous vehicles with an ethos of automation throughout the infrastructure components they build.
Foster City
- Skills: site reliability engineer, uptime, autonomous vehicles, fault-tolerant systems, deployment, operation, data-processing pipelines, compute-intensive tasks, CPUs, GPUs
- Level: mid
- Type: full_time
Replit
Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…
Foster City
- Skills: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies
- Level: mid
- Type: full_time
Gusto
Gusto is a modern, online people platform that helps small businesses take care of their teams. On top of full-service payroll, Gusto offers health insurance, …
San Francisco
- Skills: storage infrastructure, MySQL, Postgres, data streaming, Kafka, cloud platforms, AWS, Terraform, resiliency, automation
- Level: mid
- Type: full_time
Robinhood Markets
Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…
Menlo Park
- Skills: reliability, scalability, performance, security, software engineering, distributed systems, incident metrics, operational excellence, mentoring, infrastructure
- Level: senior
- Type: full_time
Twitter
Twitter is a social media platform that allows users to post and interact with messages known as tweets.
San Francisco
- Skills: site reliability engineering, team leadership, engineering collaboration, technical design, reliability practices, coaching, team empowerment, personal development, cross-team communication, system scalability
- Level: mid
- Type: full_time
Coupang
Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…
Mountain View
- Skills: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators
- Level: mid
- Type: full_time
Palo Alto Networks
Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.
Santa Clara
- Skills: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development
- Level: mid
- Type: full_time
Orb
Orb is on a mission to revolutionize billing infrastructure for the modern era of AI and software. We empower businesses to align their monetization with produ…
San Francisco
- Skills: infrastructure, reliability, observability, scalability, performance-critical, event processing, cloud, AWS, resiliency, mentorship
- Level: mid
- Type: full_time
Visa
Transform global payment systems through automation and innovation.
Foster City
- Skills: automation, infrastructure as code, DevOps, observability, CI/CD, Terraform, Ansible, Python, Java, Go
- Level: mid
- Type: full_time
Palo Alto Networks
Palo Alto Networks is a cybersecurity company that aims to redefine protection and security in the digital age. Their mission is to be the cybersecurity partne…
Santa Clara
- Skills: Site Reliability Engineering, DevOps, cloud-native applications, AWS, GCP, Terraform, Kubernetes, automation, programming languages, CI/CD
- Level: mid
- Type: full_time
Celonis
Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies and the planet. With over 5,000 enterprise custome…
Redwood City
- Skills: Site Reliability Engineering, Microservices, Kubernetes, Automation, Incident management, Cloud computing, Java, Python, Observability, CI/CD
- Level: mid
- Type: full_time
Robinhood Markets
Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…
Menlo Park
- Skills: reliability, scalability, performance, security, distributed systems, programming languages, Linux, networking, incident metrics, monitoring
- Level: senior
- Type: full_time