Staff Site Reliability Engineer
Company Profile
Logicbroker provides modern dropship and marketplace solutions that help retailers and brands Connect, Orchestrate, and Grow their commerce platforms and take control of their customer experience. Through curated expanded assortment, flexible integration tools, and automated onboarding procedures, Logicbroker clients enjoy unmatched speed-to-market capabilities. We work with mid-market and enterprise manufacturers and retailers across a number of verticals, including Health & Wellness, Home Improvement, Consumer Electronics, Toys & Babies, and Consumer Packaged Goods, and we serve brands such as Samsung, Victoria’s Secret, Ace Hardware, and more.
Position – Staff Site Reliability Engineer
As a Staff Site Reliability Engineer (SRE), you will lead the design and evolution of Logicbroker’s cloud infrastructure, observability, reliability, and performance frameworks. You’ll collaborate with engineering teams to build highly available, fault-tolerant services while driving automation, incident response improvements, and cloud cost optimizations. You will act as a technical authority on SRE best practices, mentor engineers, and ensure our systems meet and exceed customer SLAs.
Primary Duties and Responsibilities:
- Define and implement SLOs, SLIs, and error budgets across platform services.
- Architect and improve high-availability infrastructure, focusing on resiliency, self-healing, and scalability.
- Design disaster recovery (DR) and multi-region failover strategies to ensure business continuity.
- Establish performance and load testing frameworks to validate system scalability.
- Drive automation-first principles for infrastructure provisioning, deployments, and incident response (IaC, GitOps).
- Build automated recovery mechanisms and reduce toil by developing runbooks, scripts, and playbooks.
- Collaborate with engineering teams to design resilient microservices and event-driven systems.
- Optimize cloud costs and resource utilization while maintaining performance standards.
- Evolve monitoring, logging, and tracing.
- Lead post-incident reviews (PIRs), ensuring actionable insights and prevention of recurrence.
- Proactively detect reliability risks through synthetic monitoring and anomaly detection.
- Build real-time dashboards and alerts that give clear, actionable visibility into system health.
- Mentor and coach engineers on SRE best practices, fostering a reliability-focused culture.
- Lead cross-functional war rooms and incident response exercises.
- Contribute to hiring and onboarding of SRE and Platform engineers.
- Partner with engineering leadership to define long-term reliability strategies and roadmaps.
Essential Skills, Experience & Education
- Bachelor’s degree in Computer Science, Engineering, or related technical field.
- 10+ years of experience in software, DevOps, or SRE roles.
- Expertise in cloud-native architectures (AWS, GCP, or Azure), including networking, load balancing, and scaling patterns.
- Strong proficiency with infrastructure-as-code tools (Terraform, CloudFormation, or Pulumi).
- Advanced experience with containers and orchestration (Docker, Kubernetes).
- Solid understanding of distributed systems, event-driven architectures, and fault tolerance patterns.
- Expertise with CI/CD pipelines, blue/green deployments, and zero-downtime strategies.
- Strong observability skills, with hands-on experience using tools like Datadog, Prometheus, Grafana, ELK, or OpenTelemetry.
- Experience conducting capacity planning, chaos engineering, and performance optimization.
- Proven ability to lead incident response efforts and guide root cause analysis.
- Familiarity with service meshes (Istio, Linkerd) and advanced Kubernetes networking.
- Experience with chaos engineering platforms (Gremlin, Litmus).
- Exposure to security hardening, compliance (SOC 2), and data privacy in distributed systems.
Compensation and Environment
Logicbroker offers a comprehensive retirement plan, workspace compensation, and more! Logicbroker embraces a truly remote work model. All Logicbroker employees receive thorough training to kick-start their careers with the organization, and Logicbroker helps provide the tools needed to build a successful home workspace.
Logicbroker is an equal opportunity employer.