Ampcus Inc. is a certified global provider of a broad range of technology and business consulting services. We are in search of a highly motivated candidate to join our talented team.
Job Title: Observability & Monitoring Engineer
Location(s): Fort Mill, SC
Job Description
We are seeking a highly skilled Observability & Monitoring Engineer with deep hands-on experience in Dynatrace, Grafana, and modern telemetry best practices. This role is responsible for designing, implementing, and optimizing end-to-end observability solutions that improve reliability, reduce MTTD/MTTR, and provide actionable insights into application and infrastructure performance. The ideal candidate has strong technical expertise across metrics, logs, traces, and synthetics, along with the ability to collaborate with engineering, SRE, DevOps, and architecture teams to raise monitoring maturity.
Key Responsibilities
Monitoring, Observability & Telemetry
- Design, implement, and maintain full-stack observability using Dynatrace, including distributed tracing, Real User Monitoring (RUM), synthetics, custom events, and dashboards.
- Build high-quality Grafana dashboards using Prometheus, Loki, Tempo, or other data sources to visualize service health and business KPIs.
- Define and enforce telemetry standards across metrics, logs, traces, and events to ensure a high signal-to-noise ratio and consistent instrumentation.
- Develop and maintain SLOs, SLIs, error budgets, and reliability scorecards for critical business services.
- Configure and optimize alert thresholds, alert routing, and auto-remediation workflows to minimize noise and improve MTTD/MTTR.
Engineering & Automation
- Automate monitoring setup and configuration using scripts, APIs, IaC (Terraform), or Dynatrace Configuration as Code.
- Create synthetic monitoring scripts and custom metric ingestion.
- Build reusable monitoring templates for services, APIs, user journeys, and infrastructure components.
- Implement correlation ID frameworks and end-to-end transaction tracing across microservices and cloud environments.
Performance Engineering
- Conduct performance analysis using Dynatrace PurePath, flame graphs, and execution traces.
- Diagnose memory leaks, thread contention, CPU anomalies, network issues, and dependency bottlenecks.
- Partner with AppDev teams to embed observability throughout the SDLC.
Cross-Team Collaboration
- Work closely with SRE, Platform Engineering, AppDev, and Cloud teams to enhance reliability and availability.
- Provide technical guidance on telemetry design, instrumentation patterns, and observability adoption.
- Present insights, trends, and recommendations to senior leadership.
Required Technical Skills
Dynatrace Expertise (Hands-On)
- Dynatrace OneAgent deployment/configuration (Linux/Windows/Kubernetes).
- Deep experience with:
  - Distributed tracing (PurePath).
  - RUM & Synthetic Monitoring.
  - Custom metrics ingestion (via API/StatsD/OpenTelemetry).
  - Problem detection, Davis AI, anomalies, baselining.
  - DQL (Dynatrace Query Language).
  - Dashboards & Notebooks.
  - SLO configuration.
- Experience with Dynatrace Managed or SaaS environments.
Grafana Expertise
- Building advanced dashboards using:
  - Grafana, Dynatrace, ELK.
  - CloudWatch.
- Strong query skills (DQL, LogQL, SQL, or Elastic DSL).
- Experience configuring alert rules, contact points, and alerting pipelines.
Telemetry & Observability Best Practices
- Strong understanding of the OpenTelemetry (OTel) specification and instrumentation.
- Knowledge of telemetry pipelines: collectors, processors, exporters.
- Expertise in:
  - Metrics cardinality management.
  - Log enrichment & structured logging.
  - Distributed tracing design.
  - Business transaction tracing.
  - Sampling & retention strategies.
- Experience standardizing observability across microservices and hybrid environments.
Monitoring & Reliability
- Hands on experience with:
  - Cloud-native monitoring on managed Kubernetes (EKS, AKS, GKE).
  - Logging systems (any of Splunk, ELK, Loki, or Datadog Logs).
- Ability to create SLO/SLI models aligned with business objectives.
- Strong understanding of SRE principles and operational excellence KPIs.
- API and webhook automation for alerts and dashboard provisioning.
Preferred Qualifications
- 5+ years in Observability, SRE, Performance Engineering, or Monitoring roles.
- Dynatrace certification or Grafana Observability Stack certification.
- Experience building correlation ID standards and E2E trace stitching.
- Strong communication skills for leadership reporting and technical documentation.
Soft Skills
- Strong analytical and troubleshooting mindset.
- Ability to influence teams and drive observability adoption.
- Clear communicator who can translate technical insights into business impact.
- Ownership mindset and commitment to reliability and performance excellence.
Ampcus is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran status, or disability.