Observability
Using DataDog, OpenTelemetry, and Splunk to answer the question: Why is the system behaving this way? Moving beyond basic monitoring to true understanding.
"If It Moves, Measure It"
In a distributed microservices environment, you cannot debug with `ssh` and `grep`. You need a holistic view of a request's journey across the entire stack. Observability is not just about alerting when things break; it's about understanding the internal state of the system based on its external outputs.
I implement the Three Pillars of Observability (Metrics, Logs, Traces) correlated together. This allows us to jump from a spike in error rate (Metric) to the specific requests failing (Trace) and the detailed error messages (Logs) in seconds.
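The jump from metric to trace to logs only works if all three pillars carry a shared correlation key. A minimal stdlib-only sketch of the idea (the `request_id` field and the `checkout` service name are illustrative, not from any real pipeline):

```python
import json, logging, time, uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")

def handle_request():
    # One request_id ties all three pillars together.
    request_id = str(uuid.uuid4())
    start = time.time()
    try:
        raise TimeoutError("payment gateway timed out")  # simulated failure
    except TimeoutError as exc:
        # Metric: an error counter increment, tagged with the request context.
        metric = {"metric": "http.errors", "value": 1, "request_id": request_id}
        # Trace: the span's timing for the waterfall view, same key.
        span = {"span": "POST /checkout",
                "duration_ms": round((time.time() - start) * 1000, 2),
                "request_id": request_id}
        # Log: a structured error with the same correlation key.
        log.error(json.dumps({"level": "error", "msg": str(exc),
                              "request_id": request_id}))
        return metric, span

metric, span = handle_request()
```

In practice OTel propagates `trace_id`/`span_id` for you and the backend (DataDog, Splunk) does the joining; the point here is only that correlation is a shared key, not a shared database.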
The Stack
- OpenTelemetry (OTel): Vendor-neutral standard for instrumentation, future-proofing our telemetry data pipelines.
- DataDog: Unified platform for APM, metrics, and logs. Excellent for visual correlation and dashboards.
- Splunk: Deep log analysis and security information and event management (SIEM).
The Pillars
Distributed Tracing
Visualizing the waterfall of a request across services to identify bottlenecks (e.g., "Why did this API call take 2s?").
Structured Logging
Logs must be machine-readable (JSON). No more regex parsing. Context (userID, requestID) is attached to every log line automatically.
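A sketch of both halves of that claim using only the stdlib `logging` module (field names like `userID`/`requestID` and the sample values are illustrative): a JSON formatter makes every line machine-readable, and a `logging.Filter` injects the request context so call sites never pass it by hand.

```python
import json, logging

class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Context fields injected by the filter below.
            "userID": getattr(record, "userID", None),
            "requestID": getattr(record, "requestID", None),
        })

class ContextFilter(logging.Filter):
    """Attach request-scoped context to every record automatically."""
    def __init__(self, **ctx):
        super().__init__()
        self.ctx = ctx
    def filter(self, record):
        for key, value in self.ctx.items():
            setattr(record, key, value)
        return True

logger = logging.getLogger("api")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.addFilter(ContextFilter(userID="u-123", requestID="req-9f2"))
logger.setLevel(logging.INFO)

logger.info("order created")  # emits one JSON line with userID/requestID attached
```

Because the context rides on the filter rather than the message, Splunk can index `requestID` as a field and join it against traces instead of regex-scraping free text.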
High-Cardinality Metrics
The ability to slice and dice metrics by arbitrary tags (endpoint, region, customer ID) without pre-aggregating a fixed set of dimensions.
Related Projects

CI/CD & Deployment Automation
Enterprise CI/CD pipelines with automated security scanning and canary deployments.

Enterprise Network Systems
ISO 9001-certified IT operations and infrastructure management.

AT&T Network Infrastructure
Carrier-grade network migration operations and tooling (AT&T).