Site Reliability Engineer – Remote Jobs in USA | Delfos HR Career

You will work across the client’s application and infrastructure capabilities to design and deliver the observability systems needed to support the operations of dozens of airlines.

You will be responsible for building the core telemetry, event processing, querying, alerting, and visualization fabric that underpins the operation of our client’s PaaS and SaaS offerings. You will work closely with Engineering, Operations, and our customers to design solutions that deliver delightful and effective observability.

Responsibilities include:

Collaborate closely with the Observability Product Owner to advance the strategy and roadmap for the client’s Observability platform, guiding the technical design of new capabilities, services, APIs, and tools.
Lead and mentor the observability team and other development teams to grow their observability expertise, and drive broad adoption of OpenTelemetry.
Keep the observability offerings relevant by reviewing competitors, vendors, and industry trends in the observability, telemetry, stream processing, and monitoring space.
Partner with teams to evaluate trade-offs across customer experience, development, and operational support.
Participate in designing and coding solutions.

Requirements:

Experience working on telemetry, large-scale event stream processing, or monitoring systems, as well as on applications for infrastructure optimization and technical troubleshooting.
Experience working cross-functionally with Engineering, IT, Operations, Product, and Program Management.
Experience in the technical design and development of enterprise cloud, SaaS, or PaaS products and services.
Expertise in managing high-volume or critical production service environments.
Passionate about collaborating with engineering teams to drive innovation and change.
Excellent verbal and written communication and presentation skills.
Ability to work in a globally distributed team.
Experience delivering work in an iterative Agile framework (e.g., Scrum), collaborating with a team to capture and refine user stories and development plans to achieve sprint goals.
Strong analytical and problem-solving skills, with a passion for finding the root cause.
6-8+ years of professional software engineering experience in C#, .NET, Java.
3+ years of cloud operations experience with Azure, AWS, or GCP.
Experience with observability and APM technologies such as Splunk, Azure Monitor, Dynatrace, Honeycomb, Cribl, Logz.io, Lightstep, Datadog, New Relic, and Rookout.
Strong knowledge of Kubernetes, Windows, and Linux-based environments.
