Arkose Labs
Arkose Labs protects enterprises from cybercrime and abuse, offering the world's first $1M warranties for credential stuffing and SMS toll fraud. They have a s…
Senior Director of Engineering
San Mateo
- Skills: Platform Engineering, Infrastructure, Site Reliability, Cloud Infrastructure, Incident Response, AWS, Azure, Distributed Systems, CI/CD, Infrastructure-as-Code
- Experience: 5+ years of leadership experience in Platform, Infrastructure, SRE, or related fields; 10+ years of experience in software engineering.
- Type: Full-time
Neuralink
We are creating devices that enable a bi-directional interface with the brain. These devices allow us to restore movement to the paralyzed, restore sight to th…
Infrastructure Team Member
Fremont
- Skills: software engineering, cloud architecture, infrastructure, networking protocols, Linux systems, hybrid cloud, security fundamentals, IaC tools, cryptographic protocols, systems administration
- Experience: Experience building hybrid cloud/on-prem infrastructure, software engineering skills, and system administration experience.
- Type: Full-time
NewsBreak
NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, ou…
Software Engineer in Reliability & Availability
Mountain View
- Skills: AWS, Kubernetes (EKS), EMR (Elastic MapReduce), service reliability, fault-tolerant architectures, Infrastructure-as-Code (IaC), CI/CD pipelines, monitoring tools (Prometheus, Grafana), high-availability strategies, incident response
- Experience: 2+ years in SRE, DevOps, or Infrastructure Engineering roles
- Type: Full-time
Luma AI
Site Reliability Engineer (SRE)
Palo Alto
- Skills: Site Reliability Engineer, SRE, Infrastructure, GPU clusters, H100 GPUs, Monitoring tools, Management tools, Performance problems, Maintenance problems, Data Processing
Xero
Xero helps businesses by automating routine tasks and connecting them with the right data, advisors, and apps, ultimately contributing to a stronger economy.
Team Lead of Product SRE
San Mateo
- Skills: Product SRE, SRE engineers, reliability, Observability, high performing services, Engineering, high performing teams, Product SRE strategy, transformation, expert communicator
- Experience: Strong Engineering background, deep experience in SRE
Zoox
Staff Technical Operations Engineer
Foster City
- Skills: IT Technical Operations, real-time command center, monitoring services, Site Reliability Engineering (SRE), Technical Operations Engineering, stability, live robot missions, strategic initiatives, innovative solutions, reliability and performance
Replit
Replit is the fastest way to turn ideas into software. With our powerful AI-powered Agent and Assistant, anyone can create and launch apps from natural languag…
Site Reliability Engineer
Foster City
- Skills: Site Reliability Engineering, SRE, Infrastructure Automation, Monitoring Solutions, Infrastructure as Code, CI/CD Pipelines, Incident Management, Performance Optimization, Distributed Systems, Cloud-native Technologies
- Experience: 3+ years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)
- Type: Full-time
Robinhood Markets
Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…
Staff Software Engineer - Reliability
Menlo Park
- Skills: reliability, scalability, performance, security, software engineering, distributed systems, incident metrics, operational excellence, mentoring, infrastructure
- Experience: 8+ years experience in designing, building, and maintaining large-scale, distributed systems
- Type: Full-time
Meta
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Ap…
Production Engineer
Menlo Park
- Skills: Production Engineering, DevOps Engineer, Site Reliability Engineer, UNIX, TCP/IP, Python, Kubernetes, Terraform, MySQL, Infrastructure Management
- Experience: 2+ years of experience in UNIX and TCP/IP network fundamentals, 2+ years of coding experience
- Type: Full-time
Coupang
Coupang is a leading force in South Korean commerce, known for its exceptional customer service and innovative approach to retail and e-commerce. The company b…
Observability Engineer
Mountain View
- Skills: observability solutions, monitoring, alerting, logging, tracing, Kubernetes, DevOps, SRE practices, cloud-based infrastructure, performance indicators
- Experience: Strong experience in implementing and managing observability solutions in large-scale, complex environments.
- Type: Full-time
Palo Alto Networks
Palo Alto Networks is a cybersecurity company that offers advanced firewalls and cloud-based security services to secure the digital transformation.
Manager, Site Reliability Engineering (Cortex, Tools and Platforms)
Santa Clara
- Skills: DevOps, Site Reliability Engineering, Cortex, Security, Engineering Management, Cloud, Platforms, Production Operations, AI, Software Development
- Type: Full-time
NetApp
NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or …
Software Engineer SRE (Observability, Incident Management)
San Jose
- Skills: Cloud, Software Engineering, SRE, Incident Management, Observability, Application Security, Python, Golang, DevSecOps, Virtualization
Glean
We’re on a mission to make knowledge work faster and more humane. We believe that AI will fundamentally transform how people work.
Senior Site Reliability Engineer (SRE)
Palo Alto
- Skills: SRE, cloud infrastructure, automation, monitoring, incident management, performance optimization, scalability, security compliance, software development, cloud platforms
- Experience: 8+ years of experience in a senior-level role within Site Reliability Engineering or similar role
- Type: Full-time
Visa
Transform global payment systems through automation and innovation.
Middleware Reliability Engineer
Foster City
- Skills: automation, infrastructure as code, DevOps, observability, CI/CD, Terraform, Ansible, Python, Java, Go
- Type: Hybrid
Palo Alto Networks
Palo Alto Networks is a cybersecurity company that aims to redefine protection and security in the digital age. Their mission is to be the cybersecurity partne…
Sr Site Reliability Engineer (App Service Team)
Santa Clara
- Skills: Site Reliability Engineering, DevOps, cloud-native applications, AWS, GCP, Terraform, Kubernetes, automation, programming languages, CI/CD
- Experience: 4+ years as an engineer in Infrastructure, Operations, DevOps, or System Engineering; 2+ years building high availability, scalable cloud-native applications on AWS and GCP
- Type: Full-time
Hippocratic AI
Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthca…
Senior Site Reliability Engineer (GCP / Kubernetes)
Palo Alto
- Skills: infrastructure automation, deployment pipelines, monitoring, scalable systems, cloud platforms, Kubernetes, Terraform, Ansible, Jenkins, security compliance
- Experience: At least 5 years of professional experience in DevOps engineering or a related field
- Type: Full-time
Luma AI
Senior Software Engineer - Reliability
Palo Alto
- Skills: SRE, GPU, infrastructure, monitoring, cloud providers, automation, scalability, containerization, observability, problem-solving
- Experience: 5+ years
- Type: Full-time
Palo Alto Networks
Senior Staff DevOps Engineer
Santa Clara
- Skills: DevOps, SRE, Cloud infrastructure, Automation, Terraform, Kubernetes, GitLab CI/CD, Monitoring, Security, Reliability
Sustainable Talent
Sustainable Talent is a staffing agency partnered with Nvidia, focusing on providing talent for tech roles in infrastructure and data centers.
Platform Reliability & Lab Support Engineer
Santa Clara
- Skills: Infrastructure, Data Centers, Hardware, Software, Networking, Troubleshooting, DevOps, Maintenance, Collaboration, Testing
- Experience: 4+ years of equivalent experience in a Lab or Datacenter environment.
- Type: Full-time
Celonis
Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies and the planet. With over 5,000 enterprise custome…
Site Reliability Engineer
Redwood City
- Skills: Site Reliability Engineering, Microservices, Kubernetes, Automation, Incident management, Cloud computing, Java, Python, Observability, CI/CD
- Experience: Minimum of 5 years of experience building and maintaining cloud-based software applications.
- Type: Full-time
Box
Box (NYSE:BOX) is the leader in Intelligent Content Management. Our platform enables organizations to fuel collaboration, manage the entire content lifecycle, …
Site Reliability Engineer
Redwood City
- Skills: SRE, reliability, scalability, cloud-native, Kubernetes, AWS, GCP, observability, automation, distributed systems
- Experience: 5+ years of working experience designing, developing, and operating large-scale, customer-facing products or services
- Type: Full-time
Celonis
Celonis helps some of the world’s largest and most esteemed brands make processes work for people, companies, and the planet. With over 5,000 enterprise custom…
Site Reliability Engineer
Redwood City
- Skills: Site Reliability Engineering, SRE principles, observability, automation, incident prevention, cloud platforms, Java, Python, Kubernetes, error budgets
- Experience: 8+ years of experience in software engineering or SRE roles.
- Type: Full-time
Robinhood Markets
Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…
Staff Software Engineer - Reliability Engineering
Menlo Park
- Skills: reliability, scalability, performance, security, distributed systems, programming languages, Linux, networking, incident metrics, monitoring
- Experience: 8+ years
- Type: Full-time
Splunk
Splunk, a Cisco company, is building a safer and more resilient digital world with an end-to-end full stack platform made for a hybrid, multi-cloud world. Lead…
Senior Site Reliability Engineer, Observability, FedRAMP
San Jose
- Skills: Cloud, AWS, Kubernetes, Docker, automation, system administrator, SaaS, monitoring, logging, resilience
- Experience: Extensive experience as a Linux system administrator supporting enterprise computing platforms and systems.
- Type: Hybrid Remote
GoodLeap
GoodLeap is a technology company delivering best-in-class financing and software products for sustainable solutions, from solar panels and batteries to energy-…
Site Reliability Engineer
San Mateo
- Skills: Site Reliability Engineer, software engineering, system engineering, automation, monitoring, incident response, infrastructure management, DevOps, observability, AWS
- Type: Full-time
Aerospike
Aerospike, a leader in next-generation, always-on, hyperscale data solutions, enables extreme-scale, real-time applications for various industry leaders.
Performance & Reliability Engineer
Mountain View
- Skills: performance engineering, reliability, distributed systems, database concepts, performance tuning, Linux/Unix, observability tools, problem-solving, collaboration, communication
- Experience: Experience with distributed systems or large-scale services, preferably in a production setting.
- Type: Full-time
Personalis, Inc
Personalis is transforming the active management of cancer through breakthrough personalized testing, focusing on cancer management and patient care.
Senior Software Engineer
Fremont
- Skills: software engineering, LIMS, CI/CD pipelines, Python, Java, PostgreSQL, MySQL, Flask, Django, site reliability engineering
- Experience: 5+ years of experience in software engineering, site reliability engineering, and/or DevOps.
- Type: Full-time
Intuit
Intuit is the global financial technology platform that powers prosperity for the people and communities we serve. With approximately 100 million customers wor…
Staff Software Engineer
Mountain View
- Skills: Kubernetes, AWS, DevOps, Platform Engineering, Reliability Engineering, Cloud Architecture, Automation, Observability, Incident Management, Data Analysis
- Experience: 7+ years
- Type: Full-time
Coupang
A fast-growing retail company disrupting the commerce industry from South Korea, combining startup culture with large global resources.
Zoox
Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the…
Senior Site Reliability Engineer
Foster City
- Skills: Site Reliability Engineering, Autonomous Vehicles, Microservice Architecture, Kubernetes, Data Pipelines, Performance Metrics, Linux, Python, C/C++, AWS
- Experience: 2+ years
- Type: Full-time
Reliable Robotics
Building safety-enhancing technology for aviation to improve safety, convenience, and transformation of air transportation.
Site Reliability Engineer (SRE)
Mountain View
- Skills: support, monitoring, infrastructure, tools, deploying, systems safety, technology, automation, supporting, improvement
Rubrik
Rubrik (NYSE: RBRK) is on a mission to secure the world’s data. With Zero Trust Data Security™, we help organizations achieve business resilience against cyber…
Site Reliability Engineer
Palo Alto
- Skills: Site Reliability Engineering, Relational Databases, SQL, Kubernetes, Golang, Python, Java, Scalability, Disaster Recovery, FedRAMP
- Type: Full-time
Wayve
Founded in 2017, Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understa…
Software Engineer
Sunnyvale
- Skills: Site Reliability Engineering, Python, C++, Rust, Cloud Computing, CI/CD, Containerization, Monitoring, Troubleshooting, Autonomous Vehicles
- Type: Full-time
Veza
Veza is the identity security company. Identity and security teams use Veza to secure identity access across SaaS apps, on-prem apps, data systems, and cloud i…
Site Reliability Engineer
Redwood City
- Skills: Site Reliability Engineering, Cloud Automation, Kubernetes, Terraform, AWS, Monitoring Tools, Incident Response, Technical Documentation, Customer Technical Support, GitOps
- Type: Full-time
Augment Code
The best software comes from Augmenting developers, not replacing them. We’re bringing joy back to software engineering and keeping developers in flow by build…
Software Engineer, SRE
Palo Alto
- Skills: Kubernetes, GCP, Linux, monitoring, containers, cloud infrastructure, Go, Shell, Jsonnet, AI products
- Type: Full-time
Productiv
Productiv started with a vision to transform the way enterprises manage and optimize their software portfolios. With a focus on driving efficiency and transpar…
Senior Platform Engineer
Palo Alto
- Skills: AWS, Terraform, Infrastructure as Code, Cloud-native, Observability, DevOps, Monitoring, CI/CD, Automation, Scalability
- Type: Full-time
Google
Google is a global technology company known for its search engine, online advertising technologies, cloud computing, software, and hardware.
Senior Software Developer, Site Reliability Development
Sunnyvale
- Skills: Site Reliability Development, distributed systems, automation, software development, system design, operational health, capacity planning, incident response, technical leadership, programming languages
- Type: Full-time
Anomali
Anomali is headquartered in Silicon Valley and is the Leading AI-Powered Security Operations Platform that is modernizing security operations. At the center of…
Senior DevOps Engineer/SRE
Redwood City
- Skills: Kubernetes, Terraform, CI/CD, AWS, New Relic, Python, Golang, EKS, Automation, Infrastructure as Code
- Type: other
PayNearMe
PayNearMe develops technology to facilitate the end-to-end customer payment experience, making it easy for businesses to accept, disburse and manage payments. …
Site Reliability Engineer
Santa Clara
- Skills: Site Reliability Engineering, Infrastructure Management, Kubernetes, Terraform, Monitoring, Automation, CI/CD, Cloud Platforms, Scripting, Observability
- Type: Full-time
Crusoe
Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solution…
Production Engineer, Storage
Sunnyvale
- Skills: AI-first, cloud infrastructure, distributed storage systems, SRE, automation, reliability, NVMe, SSD, Infrastructure as Code, Kubernetes
- Type: Full-time
Palo Alto Networks
At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision i…
Site Reliability Engineer
Santa Clara
- Skills: Kubernetes, Docker, GCP, AWS, Ansible, Terraform, Python, Go, CI/CD, DevOps
- Type: Full-time
Qualys
Qualys is a provider of cloud security, compliance, and related services to organizations worldwide.
Database Reliability Engineer (DBRE)
Foster City
- Skills: Oracle, Elasticsearch, Cassandra, Kafka, Redis, Ceph, performance tuning, database optimization, cloud security, data synchronization
- Type: Full-time
Visa
Senior Director of Site Reliability Engineering
Foster City
- Skills: Site Reliability Engineering, DevSecOps, Automation, Kubernetes, Terraform, Ansible, CI/CD, Cloud Technologies, Incident Management, Agile Methodologies
- Type: Full-time
Box
Box is the leader in Intelligent Content Management. Our platform enables organizations to fuel collaboration, manage the entire content lifecycle, secure crit…
Senior Software Engineer, SRE
Redwood City
- Skills: observability, AIOps, incident response, automation, cloud operations, microservices, ML models, APIs, scalable systems, system design
- Type: Full-time
Super Micro Computer
Supermicro® is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyp…
Site Reliability Engineer - AI Cloud
San Jose
- Skills: Site Reliability Engineering, AI Cloud, Infrastructure as Code, Kubernetes, GPU Clusters, Terraform, Ansible, DevOps, Observability, CI/CD
- Type: Full-time
Google
Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information an…
Senior Staff Software Engineer Technical Lead, Reliability Engineering
Sunnyvale
- Skills: Kubernetes, Site Reliability Engineering, distributed systems, data structures, algorithm design, technical leadership, cloud technologies, operations engineering, machine learning, AI
- Type: Full-time
Zoox
Zoox is a robotics company focused on developing services critical to the development process for autonomous vehicles.
Platform/Site Reliability Engineer
Foster City
- Skills: platform reliability, site reliability, autonomous vehicles, data processing pipelines, fault-tolerant systems, data handling, compute-intensive tasks, CPU, GPU, infrastructure automation
- Type: Full-time
Intuit Credit Karma
Intuit Credit Karma is a mission-driven company, focused on championing financial progress for our more than 140 million members globally. While we're best kno…
Database Reliability Engineer
Oakland
- Skills: MySQL, GCP, automation, Spanner, Terraform, reliability, cloud infrastructure, SRE methodologies, database architecture, data recovery
- Type: Full-time
ServiceNow
It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today …
Xero
Xero is here to help businesses by automating routine tasks, surfacing actionable insights and connecting businesses with the right data, advisors, and apps.
Product SRE Engineer
San Mateo
- Skills: SRE, observability, reliability, technical proficiency, high performing services, communication skills, transformational culture, embedded product team, experience in SRE, data insights
- Type: Full-time
Robinhood Markets
Robinhood Markets was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood…
Senior Software Engineer - Reliability
Menlo Park
- Skills: large-scale systems, distributed systems, Python, Go, C++, Linux, networking, incident metrics, Reliability Engineering, AWS
- Type: Full-time
Box
Box (NYSE:BOX) is the leader in Intelligent Content Management. Our platform enables organizations to fuel collaboration, manage the entire content lifecycle, …
Senior Software Engineer, SRE
Redwood City
- Skills: SRE, Observability, AIOps, Microservices, Automation, Incident response, Cloud operations, AI/ML, APIs, Distributed systems
- Type: Full-time
Zscaler
Serving thousands of enterprise customers around the world including 45% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to m…
Sr. Staff Technical Program Manager
San Jose
- Skills: Technical Program Management, Site Reliability Engineering, Operational excellence, Observability tooling, SRE, Synthetic monitoring, Availability program, Cloud security, Automation, Data visualization
- Type: Full-time
Obsidian Security
Founded in 2017, Obsidian Security was created to close a critical gap: securing the SaaS applications where modern business happens—platforms like Microsoft 3…
Senior Site Reliability Engineer
Palo Alto
- Skills: SaaS security, DevOps, Site Reliability Engineering, Kubernetes, AWS, GCP, CI/CD, Helm, Microservices, Monitoring
- Type: Full-time
Celonis
Celonis makes processes work for people, companies and the planet. The Celonis Process Intelligence Platform uses industry-leading process mining and AI techno…
Reliability Engineer
Redwood City
- Skills: Process Mining, Reliability Engineering, Site Reliability Engineering (SRE), Cloud-based Applications, FedRAMP-compliant, Incident Management, Kubernetes, AWS, Azure, GCP
- Type: Full-time
PayNearMe
PayNearMe develops technology to facilitate the end-to-end customer payment experience, making it easy for businesses to accept, disburse and manage payments. …
Data Reliability Engineer
Santa Clara
- Skills: DataOps, Site Reliability Engineering, AWS, Snowflake, CI/CD, IaC, Datadog, Monte Carlo, Fivetran, Python
- Type: Full-time
Google
Google is a global technology company specializing in Internet-related services and products.
Staff Software Engineer, Site Reliability Engineering, Production Scopes
Sunnyvale
- Skills: Site Reliability Engineering, API designs, failure domains, technical guidance, mentorship, operational excellence, software development, production systems, infrastructure automation, large-scale system design
- Type: Full-time
Glean
Glean is an innovative AI-powered knowledge management platform designed to help organizations quickly find, organize, and share information across their teams.
Senior Site Reliability Engineer
Palo Alto
- Skills: Site Reliability Engineering, Cloud Infrastructure, Automation, Monitoring, Docker, Kubernetes, Google Cloud Platform, AWS, Terraform, Performance Optimization
- Type: Full-time
Intuit
Intuit is the global financial technology platform that powers prosperity for the people and communities we serve. With approximately 100 million customers wor…
Manager 2, Software Engineering
Mountain View
- Skills: AWS cloud, site reliability, operational excellence, incident management, availability, scalability, resiliency, Agile software development, database management, AI assisted tools
- Type: Full-time