[Field Notes] Troubleshooting .NET OpenTelemetry SDK: Why Can’t I See My App’s Metrics?
Challenge
I have a .NET app that runs in a Linux container. I was instrumenting this app with OpenTelemetry to push metrics to an OpenTelemetry Collector container (“otel-collector”). Locally, I have a docker-compose setup that spins up the otel-collector, Prometheus, and Grafana.
I had everything set up the way I remembered it working (and I had past working examples to compare against). But for some reason, nothing seemed to be reaching the otel-collector. I enabled the console exporter for OpenTelemetry metrics and could see the app generating metrics.
I couldn’t figure out what was happening between the moment the .NET OTel SDK generated a metric and when it should have arrived at the otel-collector. I enabled verbose logging in my otel-collector config:
exporters:
  debug:
    verbosity: detailed
But I still saw nothing.
OpenTelemetry .NET SDK Self-Diagnosis to the Rescue
Fortunately, I found a troubleshooting document for the OpenTelemetry .NET SDK that describes how to turn on the SDK’s self-diagnostics:
- Open a terminal session in the container. This should open in the app’s working directory (/app in my case).
- Run echo '{"LogDirectory":".","FileSize":32768,"LogLevel":"Verbose"}' > OTEL_DIAGNOSTICS.json to create a JSON file that tells the SDK to write diagnostics to a log file in the current directory.
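If you would rather create that file from a script (for example, as a step in a container build), here is a minimal Python sketch that writes the same settings in one go, overwriting any previous copy so re-running it never leaves invalid JSON behind. The helper name write_otel_diagnostics_config is hypothetical and not part of any OpenTelemetry package. My understanding from the SDK’s troubleshooting docs is that FileSize is in KiB (so 32768 is a 32 MiB log) and that values outside 1024 to 131072 get clamped, and that LogLevel takes an EventLevel name; treat both as assumptions to check against the current docs.

```python
import json
from pathlib import Path

# EventLevel names accepted for "LogLevel" (assumption: mirrors
# System.Diagnostics.Tracing.EventLevel).
VALID_LEVELS = {"LogAlways", "Critical", "Error", "Warning", "Informational", "Verbose"}


def write_otel_diagnostics_config(
    working_dir: str,
    log_directory: str = ".",
    file_size_kib: int = 32768,
    log_level: str = "Verbose",
) -> Path:
    """Write OTEL_DIAGNOSTICS.json into the app's working directory.

    Hypothetical helper: it only writes the file. The .NET SDK is what
    discovers the file and starts writing its diagnostic log.
    """
    if log_level not in VALID_LEVELS:
        raise ValueError(f"unexpected LogLevel: {log_level}")
    # Reject sizes the SDK would clamp anyway (assumed range, in KiB).
    if not 1024 <= file_size_kib <= 131072:
        raise ValueError("FileSize should be between 1024 and 131072 KiB")
    config = {
        "LogDirectory": log_directory,
        "FileSize": file_size_kib,
        "LogLevel": log_level,
    }
    path = Path(working_dir) / "OTEL_DIAGNOSTICS.json"
    # Overwrite instead of appending so repeated runs keep the file valid JSON.
    path.write_text(json.dumps(config))
    return path
```

Inside the container, write_otel_diagnostics_config("/app") would produce the same file as the echo command.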
At this point, I saw a file called dotnet.572.log appear (the SDK names the log after the process name and process ID). I had diagnostics!
The Real Problem: A Breaking Change I’d Missed
Now that I had a diagnostic log, I could easily see the issue:
2025-01-10T03:50:19.5070301Z:Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{http://host.docker.internal:4317/}{Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse) HttpIOException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse) HttpIOException: The response ended prematurely while waiting for the next frame from the server. (ResponseEnded)", DebugException="System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse)")
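With LogLevel set to Verbose, lines like this can get buried. Below is a small Python sketch (a hypothetical helper, not an SDK tool) that scans the diagnostic log for failed exports and pulls out the timestamp and target endpoint. It assumes each event sits on one line as a timestamp, a colon, the message template, and then the template’s arguments in braces, which is how the line above is laid out; other events may be formatted differently.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, based on the single line above: "<UTC timestamp>Z:<message>".
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z):(?P<msg>.*)$")
# Template arguments follow in braces; take the first one that looks like a URL.
ENDPOINT_RE = re.compile(r"\{(?P<url>https?://[^}]+)\}")


def find_export_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, endpoint) pairs for log lines that report a failure."""
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or "failed" not in match["msg"].lower():
            continue
        endpoint = ENDPOINT_RE.search(match["msg"])
        failures.append((match["ts"], endpoint["url"] if endpoint else "<unknown>"))
    return failures
```

Passing it an open file handle, for example find_export_failures(open("dotnet.572.log", encoding="utf-8")), lists each failed export along with the endpoint the SDK was trying to reach.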
Well, that’s not great! However, a quick search led me to a GitHub issue that could help: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33896
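In that issue, it was noted that as of v0.104.0, the OTel Collector’s receivers listen on localhost by default instead of on all network interfaces (0.0.0.0), so I’d have to bind to all interfaces explicitly to get my past behavior back. Inside a container, localhost is only the container’s own loopback interface, so traffic arriving through Docker’s published ports (which is how my app reached host.docker.internal:4317) never made it to the receiver. There was a blog post and everything. I totally missed it!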
The Solution
The proposed solution was fine with me – this particular implementation is something I only care about in a local dev environment.
In my otel-collector.yaml settings file, I changed my receivers from the default:
receivers:
  otlp:
    protocols:
      grpc:
      http:
to:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
This binds both receivers to all network interfaces. Lo and behold, after restarting the otel-collector, the .NET diagnostic log showed that metrics were being published, and the verbose otel-collector logs showed that metrics were now being received. I could also see the metrics in Prometheus.
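To check reachability from the app’s side without waiting on an export cycle, a quick protocol-level probe can help. A bare TCP connect test can be misleading here: the error above shows a connection that was accepted but ended before the HTTP/2 handshake completed. The following Python sketch is my own assumption-laden example, not part of the SDK or the collector. It POSTs an empty OTLP/JSON metrics request to the HTTP receiver’s default /v1/metrics path on port 4318 and returns the HTTP status (I would expect 200 from a healthy receiver), or None if nothing answers at the HTTP level. It probes the HTTP receiver rather than gRPC on 4317, on the assumption that both are bound the same way.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def probe_otlp_http(host: str, port: int = 4318, timeout: float = 3.0) -> Optional[int]:
    """POST an empty OTLP/JSON metrics request and return the HTTP status code.

    Returns None when no HTTP response comes back (refused, reset, timed out).
    Assumes the collector's OTLP/HTTP receiver serves the default /v1/metrics path.
    """
    request = urllib.request.Request(
        f"http://{host}:{port}/v1/metrics",
        data=b"{}",  # an export request carrying no metrics
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any HTTP proxy from the environment so the direct path is tested.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the receiver answered, just not with a 2xx
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None  # nothing answered at the HTTP level
```

From inside the app container, probe_otlp_http("host.docker.internal") is the kind of check that would have failed before the 0.0.0.0 change.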