Mert Tosun
Distributed Tracing in Go Services with OpenTelemetry

Mert Tosun · Observability

In microservice architectures, logs alone rarely explain where latency actually comes from. A single user request may cross a gateway, multiple services, queues, and databases. Distributed tracing addresses this by linking the spans from every hop into a single request timeline.

OpenTelemetry is the de facto, vendor-neutral standard for implementing this in Go.

Why tracing is essential

Metrics tell you that something is wrong; traces tell you where and why.

Typical debugging questions:

  • Which hop increased p95 latency?
  • Is the delay in the auth service, the DB, or an external API?
  • Which endpoint is causing retry storms?

Data flow

Client Request
   -> Gateway span
      -> Service A span
         -> Service B span
            -> DB span

Every span carries the same trace ID and records its parent span, so the full path is visible in one place.

Production practices

  1. Propagate context on every outbound call.
  2. Add domain attributes (tenant, operation, error class).
  3. Use parent-based sampling with a sensible ratio.
  4. Export traces to a backend that supports search and retention.

Sampling strategy

Full sampling is often too expensive at scale. A common model:

  • low baseline sampling in normal traffic
  • higher sampling for errors and slow requests (a tail-based decision, since the outcome is not known when the trace starts)
  • short high-sampling windows during incidents

This balances observability depth with cost control.

Anti-patterns

  • creating spans without propagating context, which leaves disconnected traces
  • high-cardinality attributes on every span
  • relying on traces alone, without correlating them with metrics

Conclusion

OpenTelemetry tracing in Go provides operational clarity that logs and metrics alone cannot. With proper context propagation and sampling design, teams can reduce mean time to detection and recovery while keeping telemetry cost predictable.