Last updated: 2025-08-19
56 Site Reliability Engineering jobs in San Jose.
Neuralink
We are creating devices that enable a bi-directional interface with the brain. These devices allow us to restore movement to the paralyzed, restore sight to th…
Fremont
- Skills: software engineering, cloud architecture, infrastructure, networking protocols, Linux systems, hybrid cloud, security fundamentals, IaC tools, cryptographic protocols, systems administration
- Level: mid
- Type: full_time
NewsBreak
NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…
Mountain View
- Skills: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response
- Level: mid
- Type: full_time
Luma AI
Palo Alto
- Skills: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing
- Level: mid
- Type: full_time
Zoox
Zoox is a robotics company focused on developing autonomous vehicles with an ethos of automation throughout the infrastructure components they build.
Foster City
- Skills: site reliability engineer, uptime, autonomous vehicles, fault-tolerant systems, deployment, operation, data-processing pipelines, compute-intensive tasks, CPUs, GPUs
- Level: mid
- Type: full_time
Replit
Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…
Foster City
- Skills: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies
- Level: mid
- Type: full_time
Coupang
Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…
Mountain View
- Skills: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators
- Level: mid
- Type: full_time
Palo Alto Networks
Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.
Santa Clara
- Skills: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development
- Level: mid
- Type: full_time
NetApp
NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or …
San Jose
- Skills: Cloud, Software Engineering, SRE, Incident Management, Observability, Application Security, Python, Golang, DevSecOps, Virtualization
- Level: mid
- Type: full_time
Box
Box (NYSE:BOX) is the leader in Intelligent Content Management. Our platform enables organizations to fuel collaboration, manage the entire content lifecycle, …
Redwood City
- Skills: SRE, reliability, scalability, cloud-native, Kubernetes, AWS, GCP, observability, automation, distributed systems
- Level: mid
- Type: full_time
Celonis
Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies, and the planet. With over 5,000 enterprise custom…
Redwood City
- Skills: Site Reliability Engineering, SRE principles, observability, automation, incident prevention, cloud platforms, Java, Python, Kubernetes, error budgets
- Level: senior
- Type: full_time
Reliable Robotics
Building safety-enhancing technology for aviation to improve safety, convenience, and transformation of air transportation.
Mountain View
- Skills: support, monitoring, infrastructure, tools, deploying, systems safety, technology, automation, supporting, improvement
- Level: mid
- Type: full_time
Rubrik
Rubrik (NYSE: RBRK) is on a mission to secure the world’s data. With Zero Trust Data Security™, we help organizations achieve business resilience against cyber…
Palo Alto
- Skills: Site Reliability Engineering, Relational Databases, SQL, Kubernetes, Golang, Python, Java, Scalability, Disaster Recovery, FedRAMP
- Level: mid
- Type: full_time
Wayve
Founded in 2017, Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understa…
Sunnyvale
- Skills: Site Reliability Engineering, Python, C++, Rust, Cloud Computing, CI/CD, Containerization, Monitoring, Troubleshooting, Autonomous Vehicles
- Level: senior
- Type: full_time
Anomali
Anomali is headquartered in Silicon Valley and is the Leading AI-Powered Security Operations Platform that is modernizing security operations. At the center of…
Redwood City
- Skills: Kubernetes, Terraform, CI/CD, AWS, New Relic, Python, Golang, EKS, Automation, Infrastructure as Code
- Level: mid
- Type: full_time
PayNearMe
PayNearMe develops technology to facilitate the end-to-end customer payment experience, making it easy for businesses to accept, disburse and manage payments. …
Santa Clara
- Skills: Site Reliability Engineering, Infrastructure Management, Kubernetes, Terraform, Monitoring, Automation, CI/CD, Cloud Platforms, Scripting, Observability
- Level: mid
- Type: full_time
Intuit Credit Karma
Intuit Credit Karma is a mission-driven company, focused on championing financial progress for our more than 140 million members globally. While we're best kno…
Oakland
- Skills: MySQL, GCP, automation, Spanner, Terraform, reliability, cloud infrastructure, SRE methodologies, database architecture, data recovery
- Level: mid
- Type: full_time
Palo Alto Networks
Palo Alto Networks is dedicated to revolutionizing cybersecurity, ensuring the digital way of life is protected across various environments. The company is bui…
Santa Clara
- Skills: Infrastructure, SRE, DevOps, AWS, GCP, Terraform, Kubernetes, CI/CD, Python, Automation
- Level: senior
- Type: full_time
Glean
Glean is an innovative AI-powered knowledge management platform designed to help organizations quickly find, organize, and share information across their teams.
Palo Alto
- Skills: Site Reliability Engineering, Cloud Infrastructure, Automation, Monitoring, Docker, Kubernetes, Google Cloud Platform, AWS, Terraform, Performance Optimization
- Level: senior
- Type: full_time
Visa
Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial instituti…
Foster City
- Skills: Production Support, Incident Management, Observability Tools, Grafana, Splunk, Docker, Kubernetes, AI Techniques, System Reliability, Automation
- Level: mid
- Type: full_time
Palo Alto Networks
Palo Alto Networks is committed to being the cybersecurity partner of choice, protecting our digital way of life and innovating in the field of cybersecurity.
Santa Clara
- Skills: Kubernetes, Ansible, Terraform, Cloud infrastructure, Monitoring and alerting, DevOps, Site Reliability Engineering, GCP, AWS, Automation
- Level: entry
- Type: full_time