Last updated: 2025-07-21
56 Site Reliability Engineering jobs in San Jose.
Neuralink
We are creating devices that enable a bi-directional interface with the brain. These devices allow us to restore movement to the paralyzed, restore sight to th…
Fremont
- Skills: software engineering, cloud architecture, infrastructure, networking protocols, Linux systems, hybrid cloud, security fundamentals, IAC tools, cryptographic protocols, systems administration
- Level: mid
- Type: full_time
NewsBreak
NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…
Mountain View
- Skills: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response
- Level: mid
- Type: full_time
Luma AI
Palo Alto
- Skills: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing
- Level: mid
- Type: full_time
Zoox
Zoox is a robotics company focused on developing autonomous vehicles with an ethos of automation throughout the infrastructure components they build.
Foster City
- Skills: site reliability engineer, uptime, autonomous vehicles, fault-tolerant systems, deployment, operation, data-processing pipelines, compute-intensive tasks, CPUs, GPUs
- Level: mid
- Type: full_time
Replit
Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…
Foster City
- Skills: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies
- Level: mid
- Type: full_time
Robinhood Markets
Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…
Menlo Park
- Skills: reliability, scalability, performance, security, software engineering, distributed systems, incident metrics, operational excellence, mentoring, infrastructure
- Level: senior
- Type: full_time
Coupang
Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…
Mountain View
- Skills: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators
- Level: mid
- Type: full_time
Palo Alto Networks
Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.
Santa Clara
- Skills: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development
- Level: mid
- Type: full_time
NetApp
NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or …
San Jose
- Skills: Cloud, Software Engineering, SRE, Incident Management, Observability, Application Security, Python, Golang, DevSecOps, Virtualization
- Level: mid
- Type: full_time
Visa
Transform global payment systems through automation and innovation.
Foster City
- Skills: automation, infrastructure as code, DevOps, observability, CI/CD, Terraform, Ansible, Python, Java, Go
- Level: mid
- Type: full_time
Palo Alto Networks
Palo Alto Networks is a cybersecurity company that aims to redefine protection and security in the digital age. Their mission is to be the cybersecurity partne…
Santa Clara
- Skills: Site Reliability Engineering, DevOps, cloud-native applications, AWS, GCP, Terraform, Kubernetes, automation, programming languages, CI/CD
- Level: mid
- Type: full_time
Celonis
Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies and the planet. With over 5,000 enterprise custome…
Redwood City
- Skills: Site Reliability Engineering, Microservices, Kubernetes, Automation, Incident management, Cloud computing, Java, Python, Observability, CI/CD
- Level: mid
- Type: full_time
Robinhood Markets
Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…
Menlo Park
- Skills: reliability, scalability, performance, security, distributed systems, programming languages, Linux, networking, incident metrics, monitoring
- Level: senior
- Type: full_time
Coupang
A fast-growing retail company disrupting the commerce industry from South Korea, combining startup culture with large global resources.
Mountain View
- Skills: site reliability engineering, performance, distributed systems, large-scale systems, project management, security, privacy, compliance, stakeholders, scalability
- Level: senior
- Type: full_time
Zoox
Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the…
Foster City
- Skills: Site Reliability Engineering, Autonomous Vehicles, Microservice Architecture, Kubernetes, Data Pipelines, Performance Metrics, Linux, Python, C/C++, AWS
- Level: mid
- Type: full_time
Reliable Robotics
Building safety-enhancing technology for aviation to improve the safety and convenience of air transportation and help transform it.
Mountain View
- Skills: support, monitoring, infrastructure, tools, deploying, systems safety, technology, automation, supporting, improvement
- Level: mid
- Type: full_time
Rubrik
Rubrik (NYSE: RBRK) is on a mission to secure the world’s data. With Zero Trust Data Security™, we help organizations achieve business resilience against cyber…
Palo Alto
- Skills: Site Reliability Engineering, Relational Databases, SQL, Kubernetes, Golang, Python, Java, Scalability, Disaster Recovery, FedRAMP
- Level: mid
- Type: full_time
Wayve
Founded in 2017, Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understa…
Sunnyvale
- Skills: Site Reliability Engineering, Python, C++, Rust, Cloud Computing, CI/CD, Containerization, Monitoring, Troubleshooting, Autonomous Vehicles
- Level: senior
- Type: full_time
Productiv
Productiv started with a vision to transform the way enterprises manage and optimize their software portfolios. With a focus on driving efficiency and transpar…
Palo Alto
- Skills: AWS, Terraform, Infrastructure as Code, Cloud-native, Observability, DevOps, Monitoring, CI/CD, Automation, Scalability
- Level: mid
- Type: full_time
Google
Google is a global technology company known for its search engine, online advertising technologies, cloud computing, software, and hardware.
Sunnyvale
- Skills: Site Reliability Development, distributed systems, automation, software development, system design, operational health, capacity planning, incident response, technical leadership, programming languages
- Level: mid
- Type: full_time