Purple Digital

Job details

Employment types:

Contract

Location:

Europe (remote)

Salary / Rate:

€40/hour

Language:

English

Eligibility / Employment Arrangement:

Job posted by

James Butler

Get in touch

Contact our team today to discuss the role.

Location: 100% Remote (Europe-based)

Preferred Markets: Poland, Bulgaria, Kosovo, North Macedonia, Ukraine, Romania, or Türkiye

Eligibility: Must be located in Europe with valid work permissions. No sponsorship or visa support available.

Language: English (C1+)

The Role

As a Site Reliability Engineer (SRE), you act as the critical bridge between software development and operations. Your mission is to enable "reliable speed" for our clients, empowering them to leverage the full benefits of continuous deployment without compromising customer experience.

You will embed with multidisciplinary teams in a DevOps environment, keeping a sharp focus on production stability while building the tooling required to maintain it.

We are looking for a technical leader who can mobilise and motivate teams. In this role, you will be the go-to expert for assessing production robustness, defining reliable deployment procedures, analysing failure scenarios, and engineering solutions to mitigate them.

Key Responsibilities

Reliability Engineering: Collaborate with product and engineering teams to define and implement SLIs and SLOs.
Observability: Design and build comprehensive observability systems that provide deep visibility into application health.
Failure Analysis: Lead the analysis of failure scenarios and develop mitigations.
Resilience: Create and maintain runbooks to remediate failure scenarios or proactively prevent them.
Toil Reduction: Identify and automate repetitive work that adds no value, freeing up time for engineering challenges.
Incident Management: Participate in and facilitate incident management processes, including taking part in the on-call rotation.

Qualifications & Experience

Essential Background

Communication: Excellent command of English (C1 or above) with strong, assertive communication skills.
Experience: 5+ years in Software Engineering, DevOps, QA, or Cloud Engineering, including at least 2 years as a dedicated Site Reliability Engineer.
Leadership: Proven ability to take the lead, make decisions, and coach development teams to make architectural choices that favour reliability.
Context: Experience working in large corporate environments and international contexts involving both onshore and offshore teams.

Technical Expertise

Cloud & Infrastructure: Basic to intermediate knowledge of serverless services in public clouds (AWS, Azure, GCP). Deep experience with AWS is highly preferred.
Containerisation: Extensive knowledge of containerised microservices architectures, specifically Docker and Kubernetes (including EKS).
CI/CD & GitOps: Experience with CI/CD pipeline tools (GitHub Actions, Azure DevOps, GitLab, Jenkins) and, in particular, ArgoCD.
Observability: Expert-level knowledge of monitoring systems, particularly APM tools such as Datadog, New Relic, and Dynatrace, as well as Prometheus and Grafana.
Development: Strong programming and scripting skills. Familiarity with Java/Spring Boot environments is a significant plus.
Data Streaming: Familiarity with Kafka.
OpenTelemetry: Experience implementing OpenTelemetry for standardised tracing, metrics, and logging across microservices and cloud environments.

Operational Focus

Experience managing incidents in a high-traffic, 24x7 public-facing production environment.
Background working with highly available eCommerce platforms.
Strong conceptual understanding of software architecture and systems thinking.
