163 Site Reliability Engineering jobs in San Francisco

Baseten

Join our dynamic team at Baseten, where we’re revolutionizing AI deployment with cutting-edge inference infrastructure. Backed by premier investors such as IVP…

Site Reliability Engineer

San Francisco

Skills: Site Reliability Engineer, Kubernetes, Scalable Infrastructure, Infrastructure-as-Code, CI/CD Tools, Project Management, Collaboration, Mentorship, Performance Optimization, Machine Learning
Level: mid
Type: full_time

Hive

Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and mos…

DevOps and Systems Engineer

San Francisco

Skills: cloud-based AI solutions, machine learning, DevOps, Site Reliability, automation, enterprise SaaS, distributed computing, high performance computing, hybrid infrastructure, GPU integration
Level: mid
Type: full_time

NewsBreak

NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…

Software Engineer in Reliability & Availability

Mountain View

Skills: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response
Level: mid
Type: full_time

Luma AI

Site Reliability Engineer (SRE)

Palo Alto

Skills: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing
Level: mid
Type: full_time

Replit

Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…

Site Reliability Engineer

Foster City

Skills: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies
Level: mid
Type: full_time

Twitter

Twitter is a social media platform that allows users to post and interact with messages known as tweets.

Site Reliability Engineering Team Lead

San Francisco

Skills: site reliability engineering, team leadership, engineering collaboration, technical design, reliability practices, coaching, team empowerment, personal development, cross-team communication, system scalability
Level: mid
Type: full_time

Coupang

Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…

Observability Engineer

Mountain View

Skills: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators
Level: mid
Type: full_time

Palo Alto Networks

Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.

Manager, Site Reliability Engineering (Cortex, Tools and Platforms)

Santa Clara

Skills: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development
Level: mid
Type: full_time

Orb

Orb is on a mission to revolutionize billing infrastructure for the modern era of AI and software. We empower businesses to align their monetization with produ…

Infrastructure Engineer

San Francisco

Skills: infrastructure, reliability, observability, scalability, performance-critical, event processing, cloud, AWS, resiliency, mentorship
Level: mid
Type: full_time

Celonis

Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies, and the planet. With over 5,000 enterprise custom…

Site Reliability Engineer

Redwood City

Skills: Site Reliability Engineering, SRE principles, observability, automation, incident prevention, cloud platforms, Java, Python, Kubernetes, error budgets
Level: senior
Type: full_time

Loft Orbital

Loft Orbital is revolutionizing access to space by building reliable, shareable satellites that drastically reduce the time and complexity traditionally requir…

Senior Site Reliability Engineer

San Francisco

Skills: Site Reliability Engineering, Cloud Infrastructure, DevOps, satellites, space operations, integration, delivery, reliability, automated infrastructure, SatDevOps
Level: mid
Type: full_time

Crusoe Energy Systems

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Staff Site Reliability Engineer

San Francisco

Skills: Site Reliability Engineering, AI infrastructure, automation, monitoring, incident response, system performance, network programming, security best practices, CI/CD, cloud infrastructure
Level: senior
Type: full_time

Crusoe

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Senior Site Reliability Engineer

San Francisco

Skills: Site Reliability Engineering, AI infrastructure, production systems, system reliability, automation, monitoring, Unix/Linux, Cloud, Kubernetes, CI/CD
Level: mid
Type: full_time

Speak

Speak is on a journey to fix the language learning experience by creating AI-powered conversational tools to help billions gain fluency.

SRE Engineer, Lead

San Francisco

Skills: reliability, infrastructure, Kubernetes, GCP, Node.js, PostgreSQL, Redis, observability, incident response, scalability
Level: senior
Type: full_time

Rubrik

Rubrik (NYSE: RBRK) is on a mission to secure the world’s data. With Zero Trust Data Security™, we help organizations achieve business resilience against cyber…

Site Reliability Engineer

Palo Alto

Skills: Site Reliability Engineering, Relational Databases, SQL, Kubernetes, Golang, Python, Java, Scalability, Disaster Recovery, FedRAMP
Level: mid
Type: full_time

Genmo

We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI.

Site Reliability Engineer

San Francisco

Skills: GPU clusters, Kubernetes operations, Infrastructure-as-Code, GitOps workflows, CI/CD pipelines, observability stack, high-performance networking, NVIDIA DCGM, containerized GPU stacks, distributed training
Level: mid
Type: full_time

Figma

Figma helps entire product teams brainstorm, design and build better products — from start to finish. Whether it’s consolidating tools, simplifying workflows, …

Production Engineer

San Francisco

Skills: Infrastructure, AWS, Distributed Systems, Telemetry, Scale, Reliability, Durability, Performance, Engineering, Software Development
Level: mid
Type: full_time

Anomali

Anomali is headquartered in Silicon Valley and is the Leading AI-Powered Security Operations Platform that is modernizing security operations. At the center of…

Senior DevOps Engineer/SRE

Redwood City

Skills: Kubernetes, Terraform, CI/CD, AWS, New Relic, Python, Golang, EKS, Automation, Infrastructure as Code
Level: mid
Type: full_time

Column Tax

We’re building the next generation of tax software. Our mission is to make it possible for every taxpayer to file confidently in just one click. As the fastest…

Software Engineer (Infrastructure)

San Francisco

Skills: Infrastructure, AWS, GCP, Terraform, CI/CD, Observability, Cloud operations, On-call, System performance, DevOps
Level: mid
Type: full_time

Crusoe Energy Systems

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company, pioneering vertically integrated, purpose-built AI infrastructure solutions powe…

Site Reliability Engineer - Storage

San Francisco

Skills: Site Reliability Engineering, cloud storage, automation, distributed systems, Infrastructure as Code, Linux internals, Kubernetes, object storage, performance tuning, data replication
Level: mid
Type: full_time
