Distributed Tracing in Go Services with OpenTelemetry
In microservice architectures, logs alone rarely explain where latency actually comes from. A single user request can cross a gateway, several services, queues, and databases. Distributed tracing addresses this by linking the spans recorded at each hop into a single request timeline.
OpenTelemetry is the vendor-neutral standard for implementing this in Go.
Why tracing is essential
Metrics tell you that something is wrong; traces tell you where and why.
Typical debugging questions:
- Which hop increased p95 latency?
- Is the delay in the auth service, the database, or an external API?
- Which endpoint is causing retry storms?
Data flow
Client Request
-> Gateway span
-> Service A span
-> Service B span
-> DB span
Every span carries the same trace ID and a reference to its parent span, so the full path can be reassembled and viewed in one place.
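To make the shared context concrete, here is a minimal stdlib-only sketch of the path above. It does not use the OpenTelemetry SDK. The `SpanContext` type and the `startRoot`, `startChild`, and `buildPath` helpers are hypothetical names introduced for illustration. The ID sizes (16-byte trace ID, 8-byte span ID) follow the W3C Trace Context format.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// SpanContext is a hypothetical, simplified stand-in for the identifiers
// a tracing SDK attaches to every span.
type SpanContext struct {
	TraceID  string // 32 hex chars, shared by every span in the request
	SpanID   string // 16 hex chars, unique per span
	ParentID string // SpanID of the caller; empty for the root span
	Name     string
}

// randomHex returns n random bytes encoded as 2n hex characters.
func randomHex(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err) // not recoverable in this sketch
	}
	return hex.EncodeToString(b)
}

// startRoot begins a new trace with a fresh trace ID.
func startRoot(name string) SpanContext {
	return SpanContext{TraceID: randomHex(16), SpanID: randomHex(8), Name: name}
}

// startChild keeps the parent's trace ID and records the parent's span ID.
func startChild(parent SpanContext, name string) SpanContext {
	return SpanContext{
		TraceID:  parent.TraceID,
		SpanID:   randomHex(8),
		ParentID: parent.SpanID,
		Name:     name,
	}
}

// buildPath creates the Gateway -> Service A -> Service B -> DB chain.
func buildPath() []SpanContext {
	gateway := startRoot("gateway")
	a := startChild(gateway, "service-a")
	b := startChild(a, "service-b")
	db := startChild(b, "db")
	return []SpanContext{gateway, a, b, db}
}

func main() {
	for _, s := range buildPath() {
		fmt.Printf("%-10s trace=%s span=%s parent=%s\n", s.Name, s.TraceID, s.SpanID, s.ParentID)
	}
}
```

Running it prints four lines with an identical `trace=` value, with each `parent=` matching the previous line's `span=`. A tracing backend uses that relationship to rebuild the timeline.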
Production practices
- Propagate context on every outbound call.
- Add domain attributes (tenant, operation, error class).
- Use parent-based sampling with a sensible ratio, so that a child span follows the sampling decision already made for its parent.
- Export traces to a backend that supports search and retention.
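The first practice, propagating context on every outbound call, can be sketched with the standard library alone. This is an illustration of the mechanism and not the OpenTelemetry API. In a real service a configured propagator would normally do the injection and extraction. The `TraceContext` type, the helper names, and the `service-b.internal` URL are assumptions made for the example. The header layout (`version-traceid-spanid-flags`) follows the W3C `traceparent` format.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

type ctxKey struct{}

// TraceContext is a hypothetical minimal carrier for the current span's identity.
type TraceContext struct {
	TraceID string // 32 hex chars
	SpanID  string // 16 hex chars
	Sampled bool
}

// withTrace stores the trace context on a context.Context.
func withTrace(ctx context.Context, tc TraceContext) context.Context {
	return context.WithValue(ctx, ctxKey{}, tc)
}

// inject writes a W3C-style traceparent header if ctx carries a trace context.
func inject(ctx context.Context, h http.Header) {
	tc, ok := ctx.Value(ctxKey{}).(TraceContext)
	if !ok {
		return // nothing to propagate
	}
	flags := 0
	if tc.Sampled {
		flags = 1 // bit 0 of the flags byte marks the trace as sampled
	}
	h.Set("traceparent", fmt.Sprintf("00-%s-%s-%02x", tc.TraceID, tc.SpanID, flags))
}

// extract parses the traceparent header on the receiving side.
func extract(h http.Header) (TraceContext, bool) {
	parts := strings.Split(h.Get("traceparent"), "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return TraceContext{}, false
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return TraceContext{}, false
	}
	return TraceContext{TraceID: parts[1], SpanID: parts[2], Sampled: flags&1 == 1}, true
}

// newOutboundRequest builds a request that carries the caller's trace context.
func newOutboundRequest(ctx context.Context, url string) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	inject(ctx, req.Header)
	return req, nil
}

// roundTrip simulates caller and callee without any network I/O.
func roundTrip(tc TraceContext) (TraceContext, bool) {
	ctx := withTrace(context.Background(), tc)
	req, err := newOutboundRequest(ctx, "http://service-b.internal/orders") // hypothetical URL
	if err != nil {
		return TraceContext{}, false
	}
	return extract(req.Header)
}

func main() {
	in := TraceContext{TraceID: "4bf92f3577b34da6a3ce929d0e0e4736", SpanID: "00f067aa0ba902b7", Sampled: true}
	out, ok := roundTrip(in)
	fmt.Println(ok, out.TraceID == in.TraceID, out.Sampled)
}
```

The key design point is that the trace context travels on `context.Context` inside the process and in a header between processes. A call that drops either one starts a new, disconnected trace.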
Sampling strategy
Sampling every trace is often too expensive at scale. A common model:
- low baseline sampling under normal traffic
- higher sampling for errors and slow requests
- short high-sampling windows during incidents
This balances observability depth with cost control.
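The baseline and incident-window parts of this model can be sketched as a parent-based, ratio-driven decision made when a trace starts. The code below is a stdlib-only illustration, not the OpenTelemetry SDK's sampler. The `Parent` type, the function names, and the 1% and 50% ratios are assumptions chosen for the example. Sampling more heavily for errors and slow requests is not shown, because the outcome is not known when the root span starts. That kind of decision is usually made after the trace completes (tail-based sampling).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

// Parent describes the incoming trace context, if any (hypothetical type).
type Parent struct {
	Present bool
	Sampled bool
}

// ratioSample derives a deterministic decision from the trace ID, so every
// service that sees the same trace ID reaches the same answer.
func ratioSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	// Treat the low 8 bytes as a uniform 63-bit number and compare to a bound.
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	bound := uint64(ratio * (1 << 63))
	return x < bound
}

// shouldSample follows the parent's decision when there is one and falls
// back to the ratio only for root spans.
func shouldSample(parent Parent, traceID [16]byte, ratio float64) bool {
	if parent.Present {
		return parent.Sampled
	}
	return ratioSample(traceID, ratio)
}

// currentRatio returns the root sampling ratio; both values are illustrative.
func currentRatio(incident bool) float64 {
	if incident {
		return 0.5 // short high-sampling window
	}
	return 0.01 // low baseline
}

// sampledFraction measures how many of n pseudo-random root traces are kept.
func sampledFraction(n int, ratio float64) float64 {
	r := rand.New(rand.NewSource(1)) // fixed seed keeps the demo repeatable
	hits := 0
	for i := 0; i < n; i++ {
		var id [16]byte
		binary.BigEndian.PutUint64(id[0:8], r.Uint64())
		binary.BigEndian.PutUint64(id[8:16], r.Uint64())
		if shouldSample(Parent{}, id, ratio) {
			hits++
		}
	}
	return float64(hits) / float64(n)
}

func main() {
	fmt.Printf("baseline kept: %.4f\n", sampledFraction(100000, currentRatio(false)))
	fmt.Printf("incident kept: %.4f\n", sampledFraction(100000, currentRatio(true)))
}
```

Deriving the decision from the trace ID rather than from a fresh random number keeps traces complete: a trace is either recorded at every hop or at none.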
Anti-patterns
- creating spans without context propagation
- high-cardinality attributes (for example, raw user IDs or full URLs) on every span
- relying on traces alone, without correlating them with metrics
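One way to avoid the high-cardinality problem is to record a route template instead of the raw request path. The sketch below is a stdlib-only heuristic under stated assumptions: the `looksLikeID` and `routeTemplate` helpers are hypothetical, and the rule they apply (all-digit segments, or hex/UUID-like segments of 16 or more characters) is only one possible choice. Where the router already knows the matched route pattern, using that value directly is usually more reliable.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeID reports whether a path segment is probably a per-entity
// identifier: all digits, or a long hex/UUID-like string.
func looksLikeID(seg string) bool {
	if seg == "" {
		return false
	}
	digits, hexish := true, true
	for _, r := range seg {
		isDigit := r >= '0' && r <= '9'
		isHex := isDigit || (r >= 'a' && r <= 'f') || (r >= 'A' && r <= 'F') || r == '-'
		if !isDigit {
			digits = false
		}
		if !isHex {
			hexish = false
		}
	}
	return digits || (hexish && len(seg) >= 16)
}

// routeTemplate replaces identifier-like segments with a fixed placeholder,
// so the attribute value stays within a small, bounded set.
func routeTemplate(path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		if looksLikeID(s) {
			segs[i] = "{id}"
		}
	}
	return strings.Join(segs, "/")
}

func main() {
	fmt.Println(routeTemplate("/users/12345/orders/987"))
	fmt.Println(routeTemplate("/tenants/550e8400-e29b-41d4-a716-446655440000"))
	fmt.Println(routeTemplate("/healthz"))
}
```

The raw identifier can still be kept on the specific spans where it is needed for debugging. The point is to keep it out of attributes that are indexed or aggregated on every span.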
Conclusion
OpenTelemetry tracing in Go provides operational clarity that logs and metrics alone cannot. With proper context propagation and sampling design, teams can reduce mean time to detection and recovery while keeping telemetry cost predictable.