Typical Tasks
- Conduct discovery workshops to identify critical business KPIs and technical service level indicators.
- Deploy and configure agents or OpenTelemetry collectors across hybrid cloud environments.
- Develop customized dashboards that visualize distributed tracing and system performance.
- Refine alerting logic to minimize noise and prioritize critical production incidents.
- Document technical architectures and provide hands-on training for client operations teams.
Responsibilities
- Participate in end-to-end communication with client stakeholders regarding project milestones and technical requirements.
- Define an observability implementation roadmap that aligns with the client’s specific infrastructure requirements and budget.
- Support and participate in the successful integration of logging, metrics, and tracing into existing CI/CD pipelines.
- Ensure the reliability and performance of the monitoring stack to maintain constant visibility.
- Standardize telemetry collection methods across diverse application teams and departments.
Technical Skills
- Expertise in commercial platforms like Datadog, Dynatrace, New Relic, or Splunk.
- Proficiency in open-source stacks including Prometheus, Grafana, and the ELK stack.
- Advanced knowledge of Kubernetes, container orchestration, and serverless cloud architectures.
- Strong command of Infrastructure as Code tools such as Terraform or CloudFormation.
- Ability to write and debug code in languages like Python, Go, or Java for instrumentation.