Alchemy
Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools necessary to build and scale onchain apps and rollups. Our infra…
Infrastructure Engineer (Reliability Focus)
New York City
- Skills: Reliability, Observability, Infrastructure Engineer, Production Systems, AWS, Docker, Kubernetes, CI/CD, Infrastructure-as-Code, Engineering Excellence
- Experience: 5+ years of experience as an Infrastructure Engineer focused on Reliability (e.g., Site Reliability Engineer, Production Engineer, Platform Engineer)
- Type: Full-time
Rogo
We're building AI thought partners to make people smarter and more creative, accelerating the creation and sharing of knowledge in financial services. Our team…
Cloud Infrastructure Engineer
New York City
- Skills: AWS, Azure, Kubernetes, Infrastructure as Code, Terraform, Datadog, Cloud Infrastructure, CI/CD Pipelines, Linux Administration, Monitoring Tools
- Experience: 3-5 years of hands-on experience with AWS and/or Azure cloud platforms; 2-3 years of experience managing Kubernetes clusters; 3-5 years of experience with Linux system administration and shell scripting.
- Type: Full-time
Hedge Fund in NYC
A technology company focused on software development and infrastructure support.
Reliability Engineer
New York City
- Skills: Reliability Engineer, software development, infrastructure support, Python, Java, C/C++, Go, distributed software applications, monitoring, performance tuning
- Experience: Skilled in software development and infrastructure support
- Type: Full-time
Hebbia
Hebbia is AI that works the way you work, designed to be generally capable and capable of tackling complex tasks by citing answers from various sources. Its mi…
Site Reliability Engineer (SRE)
New York City
- Skills: Site Reliability Engineer, DevOps Engineer, CI/CD pipelines, cloud platforms, AWS, monitoring tools, observability, infrastructure-as-code, Docker, Kubernetes
- Experience: 4+ years software development experience
- Type: Full-time
Kontakt.io
Kontakt.io is building the platform that care operations run on, reducing waste, cutting costs, and improving revenue by enhancing throughput, asset utilizatio…
Infrastructure Leader
New York City
- Skills: Infrastructure, SRE, Security, Data Platform, cloud security, data reliability, healthcare data, EHR, HIPAA compliance, scalable infrastructure
- Experience: experienced
Palantir
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empo…
Senior Software Engineer - Substrate
New York City
- Skills: Kubernetes, K8s clusters, production infrastructure, automation, self-healing systems, cloud hyperscalers, engineering rigor, operational excellence, scale, security
- Type: Full-time
Ro
Ro is a direct-to-patient healthcare company with a mission of helping patients achieve their health goals by delivering the easiest, most effective care possi…
Database Reliability Engineer (DBRE)
New York City
- Skills: Database Reliability, Site Reliability Engineering, Performance, Scalability, Availability, Reliability, Automation, Observability, Operational Excellence, Collaboration
- Experience: Strong Site Reliability Engineering (SRE) skills
- Type: Full-time
Vercel
Vercel’s Frontend Cloud provides the developer experience and infrastructure to build, scale, and secure a faster, more personalized web. Customers like Under …
SRE Manager
New York City
- Skills: SRE, Site Reliability Engineering, cloud, incident management, disaster recovery, distributed system design, engineering management, operational efficiency, service quality, technical risk management
- Experience: At least 5 years experience in an SRE role, or at least 8 years experience in an adjacent role (e.g. platform engineering), operating in a scaled environment. At least 3 years experience in engineering management.
- Type: Full-time
Gusto
Gusto is a modern, online people platform that helps small businesses take care of their teams. On top of full-service payroll, Gusto offers health insurance, …
Storage Infrastructure Engineer
New York City
- Skills: storage infrastructure, MySQL, Postgres, data streaming, Kafka, cloud platforms, AWS, Terraform, resiliency, automation
- Experience: 4+ years of experience with software development and architecture; 2+ years of experience with database technologies like MySQL or Postgres; 2+ years of experience with data streaming technologies, particularly Kafka
- Type: Full-time
Bumble Inc.
Bumble Inc. is an equal opportunity employer that encourages applications from diverse candidates, including various age groups, genders, and those with disabi…
Site Reliability Engineer (SRE)
New York City
- Skills: Site Reliability Engineering, reliability, scalability, performance, software systems, infrastructure management, automation, development, security, operations
Meta
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Ap…
Production Engineer
New York City
- Skills: Production Engineering, DevOps Engineer, Site Reliability Engineer, UNIX, TCP/IP, Python, Kubernetes, Terraform, MySQL, Infrastructure Management
- Experience: 2+ years of experience in UNIX and TCP/IP network fundamentals, 2+ years of coding experience
- Type: Full-time
Peloton
Peloton (NASDAQ: PTON) provides Members with expert instruction and world-class content to create impactful and entertaining workout experiences for anyone, an…
Engineering Manager, Data Foundations
New York City
- Skills: data pipeline, data protection, cloud environment, data ingestions, data routing, data encryption, data observability, Kubernetes, AWS, team management
- Experience: 8+ years of software development experience
- Type: Full-time
Virtu
Virtu is a leading financial firm that leverages cutting edge technology to deliver liquidity to the global markets and innovative, transparent trading solutio…
Site Reliability Engineer
New York City
- Skills: Site Reliability Engineering, Windows/Linux, High-Stress Environments, Micro-Market Structure, Technical Agility, Financial Systems, Programming Languages, Configuration Management, SQL, Networking
- Experience: Minimum 3 years of working experience, preferably in a site reliability engineering role
Datadog
Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve …
Software Engineer - Observability Infrastructure SRE
New York City
- Skills: observability, telemetry, data plane, cloud-native, performance, reliability, software engineering, programming languages, infrastructure, collaboration
- Experience: 3+ years in software engineering, running production systems at scale
- Type: Full-time
NetApp
NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or …
Software Engineer SRE (Observability, Incident Management)
New York City
- Skills: Cloud, Software Engineering, SRE, Incident Management, Observability, Application Security, Python, Golang, DevSecOps, Virtualization
The Trade Desk
The Trade Desk is a global technology company with a mission to create a better, more open internet for everyone through principled, intelligent advertising. H…
Senior Software Engineer – Network Reliability Engineering
New York City
- Skills: network reliability, software engineering, networking protocols, Kubernetes, cloud environments, network automation, resilient systems, infrastructure-as-code, troubleshooting, DevOps
- Type: Software Engineering
Etsy, Inc.
Etsy is the global marketplace for unique and creative goods. We build, power, and evolve the tools and technologies that connect millions of entrepreneurs wit…
Engineering Manager
New York City
- Skills: Engineering Manager, SRE, Kubernetes, cloud environment, production readiness, incident analysis, team mentoring, technical strategy, service reliability, performance metrics
- Experience: At least 3 years of experience in technical team management, specifically leading SRE teams.
- Type: Full-time
DeepL
DeepL is a global communications platform powered by Language AI. Since 2017, we’ve been on a mission to break down language barriers. Our products help busine…
Radar
Radar is location infrastructure for every product and service. Companies like Vercel, Panera, and T-Mobile use Radar's geofencing SDKs and maps APIs to power …
Site Reliability Engineer, Security
New York City
- Skills: AWS, security, detection, vulnerability, infrastructure, compliance, monitoring, container management, incident response, customer feedback
- Type: Full-time
Allara Health
Allara is a telemedicine platform delivering expert, multidisciplinary healthcare for women with hormonal conditions. Allara’s comprehensive telehealth platfor…
Senior Site Reliability Engineer
New York City
- Skills: Site Reliability Engineering, cloud platforms, automation, monitoring and alerting, infrastructure as code, incident response, performance tuning, collaboration, cost-conscious practices, scalable infrastructure
- Experience: Proven experience in Site Reliability Engineering or a similar role (Platform Engineer, DevOps Engineer)
- Type: Hybrid
Arcesium LLC
Arcesium is a global financial technology firm that solves complex data-driven challenges faced by some of the world’s most sophisticated financial institution…
Lead Infrastructure Engineer - Distributed Systems
New York City
- Skills: distributed systems, infrastructure, AWS, Linux, load balancing, authentication, authorization, network protocols, HTTP, SaaS
- Experience: 5+ years of relevant infrastructure experience preferred
Sesame
Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With thi…
Backend Infrastructure Engineer
New York City
- Skills: backend, infrastructure, systems, reliability engineering, monitoring, deployments, Terraform, Kubernetes, automation, data engineering
- Type: Full-time
Vantage
Vantage is a cloud cost visibility and optimization platform, also known as a FinOps platform, helping companies manage their cloud infrastructure costs.
Senior Site Reliability Engineer
New York City
- Skills: SRE, SaaS, Kubernetes, Infrastructure as Code, Terraform, Datadog, cloud-hosted Network design, observability, scaling, production infrastructure
- Experience: 8+ years
- Type: Full-time
OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the bounda…
Software Engineer, Security Observability
New York City
- Skills: software engineering, security observability, data pipelines, Python, Golang, infrastructure as code, Terraform, cloud platforms, site reliability engineering, forensic investigations
- Experience: Strong software engineering experience
- Type: Full-time
Serotonin
A leading institutional investment platform in the digital asset space, providing comprehensive infrastructure and technology for investors to manage their ent…
BTIG
BTIG is a global financial services firm specializing in institutional trading, investment banking, research and related brokerage services.
Algorithmic Trading, Senior Site Reliability Engineer (SRE)
New York City
- Skills: SRE, DevOps, infrastructure management, infrastructure-as-code, automate operational processes, monitoring, incident response, CI/CD, Python, Docker
- Experience: 3+ years of experience in SRE, DevOps, or infrastructure engineering roles
- Type: Full-time
Celonis
Celonis is the global leader in Process Mining technology and one of the fastest-growing SaaS companies worldwide. We are on a mission to unlock unprecedented …
Site Reliability Engineer
New York City
- Skills: Site Reliability Engineering, Software Engineering, Process Mining, Kubernetes, AWS, Azure, GCP, Cloud-based applications, Operational excellence, Automation
- Experience: Minimum of 5 years of experience building and maintaining cloud-based software applications with at least one public cloud platform (AWS, Azure, or GCP).
- Type: Full-time
Bombas
Bombas is a comfort focused premium basics brand with a mission to help those in need.
Senior Software Engineer, Core Infrastructure
New York City
- Skills: Core Infrastructure, TypeScript, system observability, automation, reliability, security, performance, developer workflows, cloud platforms, observability platforms
- Experience: 5+ years of overall engineering experience, with 3+ years working closely with infrastructure, platform, or developer tooling
- Type: Full-time
Betterment
Betterment is a leading, technology-driven financial services company that offers investing and retirement solutions for retail investors and investment adviso…
Senior Site Reliability Engineer
New York City
- Skills: Site Reliability Engineering, Developer Experience, AWS, CI/CD pipelines, cloud native solutions, service-level objectives, GitHub, operational excellence, mentorship, agility
- Type: Hybrid
Kalshi
Kalshi is defining a new category: prediction markets. Kalshi allows people to trade on the outcome of any events and turn any question about the future into a…
Site Reliability Engineer
New York City
- Skills: observability, reliability, cloud deployments, automation, incident response, software design, performance tuning, system architecture, debugging, high throughput
- Experience: 5+ years of experience in software engineering
- Type: Full-time
Uniswap Labs
The Uniswap Labs team is building products to unlock value through universal exchange. We envision a future where digital economies flourish, and markets are t…
Senior Site Reliability Engineer (SRE)
New York City
- Skills: reliability, performance, monitoring, cloud, automation, DevOps, architecture, scalability, incident response, best practices
- Experience: 5+ years of experience in site reliability engineering, DevOps, or a related field
- Type: Full-time
Etsy, Inc.
Etsy is the global marketplace for unique and creative goods. We build, power, and evolve the tools and technologies that connect millions of entrepreneurs wit…
Site Reliability Engineer II
New York City
- Skills: Site Reliability Engineering, infrastructure, automation, cloud environments, Linux operations, monitoring, scalable systems, Terraform, Chef, Go
- Experience: at least 2 years experience in systems/infrastructure engineering or SRE or DevOps roles
- Type: Full-time
Altana
Altana applies AI to the world's largest organized body of supply chain data to power a more resilient, secure, and sustainable model of global commerce, focus…
Senior Manager, Technical Operations & Observability
Brooklyn
- Skills: Observability, SRE, Incident Management, IT Operations, FinOps, Automation, Reliability, Cloud Platforms, Monitoring, Alerting
Ro
Ro is a direct-to-patient healthcare company offering nationwide telehealth, labs, and pharmacy services, focusing on patient-centric healthcare solutions.
VP of Infrastructure
New York City
- Skills: scalability, reliability, security, infrastructure, Site Reliability, Developer Workflow, Observability, Data Infrastructure, growth, strategy
National Grid
Every day we deliver safe and secure energy to homes, communities, and businesses. We are there when people need us the most. We connect people to the energy t…
Principal Domain Architect
Brooklyn
- Skills: AI Ops, Site Reliability Engineering, cloud architecture, AI technologies, monitoring, automation, incident response, cloud platforms, containerization, DevOps
- Experience: 5-7 years
- Type: Full-time
Farther
Farther is a rapidly growing RIA that combines expert advisors with cutting-edge technology - delivering a comprehensive, tailored wealth management experience…
Software Engineer, Test Infrastructure (SETI)
New York City
- Skills: TypeScript, JavaScript, CI/CD, automation frameworks, test infrastructure, mocking, containerization, monitoring, observability, debugging
- Type: Full-time
Palantir
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empo…
Forward Deployed Site Reliability Engineer
New York City
- Skills: site reliability engineering, production infrastructure, automate processes, scalable systems, network configuration, hardware setup, production issues, systems design, US Government, data-driven decisions
- Type: Full-time
YUPRO Placement
Jobs for Humanity is collaborating with YUPRO Placement to build an inclusive and just employment ecosystem. We support individuals coming from all walks of li…
Systems Reliability Engineer (SRE)
New York City
- Skills: Systems Reliability Engineering, infrastructure automation, database scripting, performance tuning, monitoring tools, cloud technology, Git, Jenkins, Agile, DevOps
- Type: Contract
Nominal
Nominal is a venture-backed company with offices in Los Angeles, Austin, and New York City. We build software and data solutions for organizations, testing and…
Site Reliability Engineer (SRE)
New York City
- Skills: Site Reliability Engineering, Distributed Systems, Incident Response, Release Automation, CI/CD, gRPC, PostgreSQL, Kafka, MongoDB, OLAP Databases
- Type: Full-time
Perchwell
Perchwell is the modern data and workflow platform for real estate professionals and consumers.
Senior Site Reliability Engineer
New York City
- Skills: Site Reliability Engineering, AWS, Kubernetes, CI/CD, Terraform, Observability, Incident management, Disaster recovery, Event Driven System, Elasticsearch
- Type: Full-time
Knock
Knock is on a mission to help products communicate with their users in a more thoughtful way. We're a remote-first Series A startup of 25 employees that believ…
Platform Engineer
New York City
- Skills: Elixir, AWS, Kubernetes, Terraform, ClickHouse, Observability, Postgres, Event-driven architectures, Scalability, Distributed tracing
- Type: Full-time
Reddit
Reddit is a community of communities. It’s built on shared interests, passion, and trust and is home to the most open and authentic conversations on the intern…
Senior Site Reliability Engineer
New York City
- Skills: Site Reliability Engineering, Distributed systems, Kubernetes, Cloud systems, Prometheus, Thanos, Grafana, DevOps, Automation, High-traffic backend systems
- Type: Full-time
Spotify
Spotify is a leading global platform for music streaming, offering millions of songs and podcasts to users worldwide.
Site Reliability Engineer
New York City
- Skills: infrastructure, cloud, observability, security, developer experience, production environment, developer tooling, open-source, scalability, team collaboration
- Type: Full-time
Rokt
Rokt is the global leader in ecommerce, unlocking real-time relevance in the moment that matters most. Rokt’s AI Brain and ecommerce Network powers billions of…
Ava Labs
Ava Labs makes it simple to deploy high-performance solutions for Web3, led by innovations on Avalanche. Founded by Cornell computer scientists, the company pa…
Senior Platform Engineer
New York City
- Skills: CI/CD, GoLang, Rust, Kubernetes, observability, end-to-end testing, infrastructure, performance optimization, data analysis, chaos engineering
- Type: Full-time
Braze
Braze is the leading customer engagement platform that empowers brands to Be Absolutely Engaging.™ Braze allows any marketer to collect and take action on any …
Senior Site Reliability Engineer
New York City
- Skills: Site Reliability Engineering, Infrastructure as Code, Terraform, Kubernetes, Kafka, Automation, Monitoring, Deployment Pipelines, Linux, Ruby
- Type: Full-time
Datadog
Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve …
Manager I, Engineering - Metrics Platform Resilience Automation
New York City
- Skills: Resilience Automation, Distributed Systems, DevOps, Cloud Infrastructure, Monitoring, Validation Testing, Load Testing Automation, Zonal Failure Resiliency, Autoscaling, Technical Leadership
- Type: Full-time
Reddit
Reddit is a community of communities. It’s built on shared interests, passion, and trust and is home to the most open and authentic conversations on the intern…
Engineering Manager, Site Reliability
New York City
- Skills: Site Reliability Engineering, cloud compute infrastructure, Kubernetes, Go, Golang, AWS, GCE, distributed systems, performance metrics, incident response
- Type: Full-time
Google
Google is a global technology company that specializes in Internet-related services and products, including search engines, online advertising technologies, cl…
Senior Staff Software Engineer, Site Reliability Engineering, Google Cloud
New York City
- Skills: Site Reliability Engineering, Software Development, Distributed Systems, Automation, Scaling Systems, Incident Response, System Design Consulting, Monitoring, Fault-tolerant Systems, Complexity Analysis
- Type: Full-time