Observability
Using DataDog, OpenTelemetry, and Splunk to answer the question: Why is the system behaving this way? Moving beyond basic monitoring to true understanding.
"If It Moves, Measure It"
In a distributed microservices environment, you cannot debug with `ssh` and `grep`. You need a holistic view of a request's journey across the entire stack. Observability is not just about alerting when things break; it's about understanding the internal state of the system based on its external outputs.
I implement the Three Pillars of Observability (Metrics, Logs, Traces) correlated together. This allows us to jump from a spike in error rate (Metric) to the specific requests failing (Trace) and the detailed error messages (Logs) in seconds.
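The jump from metric to trace to logs only works if all three pillars carry a shared correlation key. A minimal stdlib-only sketch of the idea (the `request_id` field and the `checkout` service name are illustrative, not from any real pipeline):

```python
import json, logging, time, uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")

def handle_request():
    # One request_id ties all three pillars together.
    request_id = str(uuid.uuid4())
    start = time.time()
    try:
        raise TimeoutError("payment gateway timed out")  # simulated failure
    except TimeoutError as exc:
        # Metric: an error counter increment, tagged with the request context.
        metric = {"metric": "http.errors", "value": 1, "request_id": request_id}
        # Trace: the span's timing for the waterfall view, same key.
        span = {"span": "POST /checkout",
                "duration_ms": round((time.time() - start) * 1000, 2),
                "request_id": request_id}
        # Log: a structured error with the same correlation key.
        log.error(json.dumps({"level": "error", "msg": str(exc),
                              "request_id": request_id}))
        return metric, span

metric, span = handle_request()
```

In practice OTel propagates `trace_id`/`span_id` for you and the backend (DataDog, Splunk) does the joining; the point here is only that correlation is a shared key, not a shared database.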
The Stack
- OpenTelemetry (OTel): Vendor-neutral standard for instrumentation, future-proofing our telemetry data pipelines.
- DataDog: Unified platform for APM, metrics, and logs. Excellent for visual correlation and dashboards.
- Splunk: Deep log analysis and security information and event management (SIEM).
The Pillars
Distributed Tracing
Visualizing the waterfall of a request across services to identify bottlenecks (e.g., "Why did this API call take 2s?").
Structured Logging
Logs must be machine-readable (JSON). No more regex parsing. Context (userID, requestID) is attached to every log line automatically.
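A sketch of both halves of that claim using only the stdlib `logging` module (field names like `userID`/`requestID` and the sample values are illustrative): a JSON formatter makes every line machine-readable, and a `logging.Filter` injects the request context so call sites never pass it by hand.

```python
import json, logging

class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Context fields injected by the filter below.
            "userID": getattr(record, "userID", None),
            "requestID": getattr(record, "requestID", None),
        })

class ContextFilter(logging.Filter):
    """Attach request-scoped context to every record automatically."""
    def __init__(self, **ctx):
        super().__init__()
        self.ctx = ctx
    def filter(self, record):
        for key, value in self.ctx.items():
            setattr(record, key, value)
        return True

logger = logging.getLogger("api")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.addFilter(ContextFilter(userID="u-123", requestID="req-9f2"))
logger.setLevel(logging.INFO)

logger.info("order created")  # emits one JSON line with userID/requestID attached
```

Because the context rides on the filter rather than the message, Splunk can index `requestID` as a field and join it against traces instead of regex-scraping free text.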
High-Cardinality Metrics
The ability to slice and dice metrics by arbitrary tags (endpoint, region, customer ID) without pre-aggregating a fixed set of dimensions.
Related Projects

CI/CD & Deployment Automation
Enterprise CI/CD pipelines with automated security scanning and canary deployments.

Enterprise Network Systems
ISO 9001-certified IT operations and infrastructure management.

AT&T Network Infrastructure
Carrier-grade network migration operations and tooling (AT&T).