[Field Notes] Troubleshooting .NET OpenTelemetry SDK: Why Can’t I See My App’s Metrics?

Challenge

I have a .NET app that runs in a Linux container. I was instrumenting this app with OpenTelemetry to push metrics to an OpenTelemetry Collector container (“otel-collector”). Locally, I have a docker-compose setup that spins up the otel-collector, Prometheus, and Grafana.

I had everything set up the way I remembered it working, and the way my past working examples did it. But for some reason, nothing seemed to be reaching the otel-collector. I enabled the console exporter for OpenTelemetry metrics and confirmed that the app was generating metrics.

I couldn’t tell what was happening between the moment the .NET OTel SDK generated a metric and the moment it should have arrived at the otel-collector. I enabled verbose logging in my otel-collector config:

exporters:
  debug:
    verbosity: detailed

But I still saw nothing.
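
One thing worth ruling out at this point: the collector only runs an exporter that a pipeline references, so defining the debug exporter is not enough on its own. A minimal sketch of that wiring, assuming an otlp receiver is defined elsewhere in the same file and that metrics are the only signal of interest (the real config here also feeds Prometheus, which is omitted):

```yaml
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]   # assumes an otlp receiver is defined elsewhere in this file
      exporters: [debug]  # an exporter only runs if a pipeline lists it
```

If the collector still logs nothing with the exporter wired in like this, the data is most likely not reaching the receiver at all.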

OpenTelemetry .NET SDK Self-Diagnosis to the Rescue

Fortunately, I found this troubleshooting document that describes how to get some OTel SDK self-diagnostics in place.

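  • Open a terminal session in the container. It should open in the app’s working directory (/app in my case).
  • Run echo '{"LogDirectory":".","FileSize":32768,"LogLevel":"Verbose"}' > OTEL_DIAGNOSTICS.json to create a JSON file that tells the SDK to write verbose diagnostics to the current directory. (Use > rather than >> so that re-running the command overwrites the file instead of appending a second JSON object to it.)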

At this point, I saw a file called dotnet.572.log appear. I had diagnostics!

The Real Problem: A Breaking Change I’d Missed

Now that I had a diagnostic log, I could easily see the issue:

2025-01-10T03:50:19.5070301Z:Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{http://host.docker.internal:4317/}{Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse) HttpIOException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse) HttpIOException: The response ended prematurely while waiting for the next frame from the server. (ResponseEnded)", DebugException="System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse)")

Well, that’s not great! However, a quick search led me to a GitHub issue that could help: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33896

That issue notes that, as of v0.104.0, the OTel Collector’s receivers listen on localhost by default instead of 0.0.0.0, so I’d have to explicitly bind to all network interfaces to get my previous behavior back. There was a blog post and everything. I totally missed it!

The Solution

The proposed solution was fine with me – this particular implementation is something I only care about in a local dev environment.

In my otel-collector.yaml settings file, I changed my receivers from the default:

receivers:
  otlp:
    protocols:
      grpc:
      http:

to:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

This binds both receivers to all network interfaces inside the collector’s container, so connections arriving from outside the container can reach them. Lo and behold, after restarting the otel-collector, the verbose .NET logs showed metrics being published, and the verbose otel-collector logs showed them being received. I was also able to see the metrics in Prometheus.
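
For context, a docker-compose service for the collector might look roughly like the excerpt below. This is a hypothetical sketch, not the compose file from this setup: the image tag, the mount path, and the published ports are all assumptions chosen for illustration.

```yaml
services:
  otel-collector:
    # Assumed image and tag; any release at or after v0.104.0 has the localhost default
    image: otel/opentelemetry-collector-contrib:0.104.0
    # Point the collector explicitly at the mounted config (path is an assumption)
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otelcol/config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC, published on the host
      - "4318:4318"   # OTLP HTTP, published on the host
```

With a layout like this, an app that targets host.docker.internal:4317 reaches the collector through the published port rather than over the container’s loopback interface, which is why a receiver listening only on localhost never sees the connection.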
