93 Site Reliability Engineering jobs in San Jose

NewsBreak

NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…

Software Engineer in Reliability & Availability

Mountain View

Skills: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response
Level: mid
Type: full_time

Luma AI

Site Reliability Engineer (SRE)

Palo Alto

Skills: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing
Level: mid
Type: full_time

Site Reliability Engineer (SRE)

Replit

Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…

Site Reliability Engineer

Foster City

Skills: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies
Level: mid
Type: full_time

Coupang

Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…

Observability Engineer

Mountain View

Skills: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators
Level: mid
Type: full_time

Palo Alto Networks

Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.

Manager, Site Reliability Engineering (Cortex, Tools and Platforms)

Santa Clara

Skills: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development
Level: mid
Type: full_time

NetApp

NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or …

Software Engineer SRE (Observability, Incident Management)

San Jose

Skills: Cloud, Software Engineering, SRE, Incident Management, Observability, Application Security, Python, Golang, DevSecOps, Virtualization
Level: mid
Type: full_time

Celonis

Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies, and the planet. With over 5,000 enterprise custom…

Site Reliability Engineer

Redwood City

Skills: Site Reliability Engineering, SRE principles, observability, automation, incident prevention, cloud platforms, Java, Python, Kubernetes, error budgets
Level: senior
Type: full_time

Rubrik

Rubrik (NYSE: RBRK) is on a mission to secure the world’s data. With Zero Trust Data Security™, we help organizations achieve business resilience against cyber…

Site Reliability Engineer

Palo Alto

Skills: Site Reliability Engineering, Relational Databases, SQL, Kubernetes, Golang, Python, Java, Scalability, Disaster Recovery, FedRAMP
Level: mid
Type: full_time

Anomali

Anomali is headquartered in Silicon Valley and is the Leading AI-Powered Security Operations Platform that is modernizing security operations. At the center of…

Senior DevOps Engineer/SRE

Redwood City

Skills: Kubernetes, Terraform, CI/CD, AWS, New Relic, Python, Golang, EKS, Automation, Infrastructure as Code
Level: mid
Type: full_time

Glean

Glean is an innovative AI-powered knowledge management platform designed to help organizations quickly find, organize, and share information across their teams.

Senior Site Reliability Engineer

Palo Alto

Skills: Site Reliability Engineering, Cloud Infrastructure, Automation, Monitoring, Docker, Kubernetes, Google Cloud Platform, AWS, Terraform, Performance Optimization
Level: senior
Type: full_time

Ridgeline

Ridgeline is the industry cloud platform for investment management. Founded by visionary tech entrepreneur Dave Duffield to solve operational business challeng…

Senior Staff Software Engineer - Site Reliability Engineering

San Ramon

Skills: Site Reliability Engineering, cloud-native, FinOps, AI-assisted automation, observability infrastructure, Infrastructure-as-Code, CI/CD systems, incident triage, zero-downtime deployments, cost visibility
Level: senior
Type: full_time

Google

Software Engineering Manager II, Site Reliability Engineering

Sunnyvale

Skills: Site Reliability Engineering, software development, large-scale systems, distributed systems, automation, performance, scalability, team management, technical leadership, complex challenges
Level: mid
Type: full_time

Contextual AI

Machine Learning Infrastructure Engineer

Mountain View

Skills: Machine Learning, Infrastructure, Distributed Systems, Cloud Infrastructure, Observability, Python, Kubernetes, Terraform, CI/CD, Reliability Engineering
Level: mid
Type: full_time

xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motiv…

Site Reliability Storage Engineer

Palo Alto

Skills: site reliability engineering, exascale storage systems, data management, Kubernetes, security measures, Rust, Go, cloud infrastructure, Terraform, AI research
Level: mid
Type: full_time

Data Center Site Reliability Engineer (SRE)

Celonis

Celonis, the global leader in Process Mining technology, aims to unlock productivity by placing data and intelligence at the core of business processes.

Site Reliability Engineer

Redwood City

Skills: Process Mining, Site Reliability Engineering, Cloud-based Applications, Kubernetes, Java, Python, AWS, Azure, GCP, SRE Principles
Level: mid
Type: full_time

Crusoe

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…

Senior Site Reliability Engineer, Storage

Sunnyvale

Skills: Site Reliability Engineer, Cloud Infrastructure, Distributed Storage Systems, Automation, Performance Tuning, Fault-tolerant Systems, I/O Subsystems, Kubernetes, Infrastructure as Code, AI Workloads
Level: mid
Type: full_time

Tenable, Inc.

Tenable® is the Exposure Management company. 44,000 organizations around the globe rely on Tenable to understand and reduce cyber risk. Our global employees su…

Staff Site Reliability Engineer - FedRAMP

San Jose

Skills: Site Reliability Engineering, Terraform, FedRAMP, AWS, Kubernetes, Docker, Agile, Cloud Infrastructure, Microservices, Security
Level: mid
Type: full_time

Hippocratic AI

Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthca…

Senior Site Reliability Engineer

Palo Alto

Skills: GCP, Kubernetes, infrastructure automation, Docker, Terraform, monitoring, Jenkins, cloud platforms, DevOps, security compliance
Level: mid
Type: full_time

Senior Site Reliability Engineer (GCP / Kubernetes)

Sustainable Talent

Sustainable Talent is partnering with Nvidia, a global leader in transforming computer graphics, PC gaming, and accelerated computing for over 25 years.

Platform Reliability & Lab Support Engineer

Santa Clara

Skills: Platform Reliability, Lab Support, Cloud Infrastructure, Data Centers, DevOps, Software Validation, Unix, Windows, Networking, Automation
Level: mid
Type: full_time

Etched

Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of ma…

Infrastructure Software Engineer

San Jose

Skills: ASIC, HPC, Infrastructure-as-Code, CI/CD, Telemetry, Prometheus, Kubernetes, Cloud, Artificial Intelligence, Observability
Level: mid
Type: full_time
