Sr. Site Reliability Engineer

Company Profile

Logicbroker provides modern dropship and marketplace solutions that help retailers and brands Connect, Orchestrate, and Grow their commerce platforms and take control of their customer experience. Through curated expanded assortment, flexible integration tools, and automated onboarding procedures, Logicbroker clients enjoy unmatched speed-to-market capabilities. We work with mid-market and enterprise manufacturers and retailers across a number of verticals, including Health & Wellness, Home Improvement, Consumer Electronics, Toys & Babies, and Consumer Packaged Goods, and serve brands such as Samsung, Victoria’s Secret, Ace Hardware, and more.

Position – Sr. Site Reliability Engineer

As a Senior SRE at Logicbroker, you’ll be responsible for ensuring the availability, scalability, and performance of our Intelligent Order Network. You’ll help implement best-in-class observability, deploy automation to eliminate toil, and work with engineering teams to improve service reliability through better design, testing, and incident management.

Primary Duties and Responsibilities:
  • Develop and manage observability solutions (e.g., Datadog, Prometheus, Grafana).
  • Define, monitor, and enforce service-level objectives (SLOs) and indicators (SLIs).
  • Collaborate with engineers to review system designs for reliability and scalability.
  • Automate provisioning, deployment, and recovery using Terraform and CI/CD pipelines.
  • Build self-healing systems and improve automated incident detection.
  • Participate in and improve the on-call process and incident response.
  • Conduct root cause analysis and implement postmortem action items.
  • Design and validate disaster recovery (DR) and high-availability strategies.
  • Optimize cloud spend and infrastructure utilization.
  • Create and maintain runbooks, documentation, and operational playbooks.
  • Implement tools for chaos testing, load testing, and resiliency verification.
  • Drive adoption of secure-by-default infrastructure practices.
  • Help developers integrate observability and reliability early in the dev lifecycle.
  • Lead infrastructure and reliability roadmap efforts with engineering and product teams.
  • Evaluate and onboard new tools to improve system uptime, visibility, and speed.

Essential Skills, Experience & Education

  • 5–8+ years of experience in SRE, DevOps, or infrastructure engineering.
  • Deep understanding of cloud infrastructure (AWS, GCP, or Azure).
  • Proficiency with infrastructure-as-code (Terraform, CloudFormation, Pulumi).
  • Strong experience with Kubernetes and container orchestration at scale.
  • Familiarity with monitoring tools (Datadog, Prometheus, Grafana, etc.).
  • Proficiency in scripting languages like Python, Bash, or Go.
  • Knowledge of CI/CD systems and deployment automation.
  • Experience with incident response processes and tooling.
  • Ability to conduct performance analysis and implement tuning at the system level.
  • Understanding of network-level troubleshooting and load balancing concepts.
  • Experience building scalable, highly available systems in production.
  • Solid grasp of system security, IAM, secrets management, and encryption.
  • Experience conducting capacity planning and stress testing.
  • Strong communicator who can coordinate across development and infrastructure teams.
  • Bonus: Experience with service mesh, chaos engineering, or multi-region systems.

Compensation and Environment

Logicbroker offers a comprehensive retirement plan, workspace compensation, and more! Logicbroker embraces a truly remote work model. All Logicbroker employees receive thorough training to kick-start their career with the organization, and Logicbroker helps provide the tools needed to build a successful home workspace.

Logicbroker is an equal opportunity employer.

Apply Today!

Send us your resume at [email protected]
