How can I troubleshoot missing CPU Usage and Memory Usage in Kubernetes Explorer when using the Observe K8 Agent?

When monitoring CPU and memory usage in a containerized environment, you might encounter a situation where these metrics are missing, accompanied by an error indicating a failure to connect to the /stats/summary endpoint. If this connection fails, your Kubernetes Explorer dashboards may show incomplete or no data. This article outlines steps to diagnose and resolve this issue effectively.

Understanding the /stats/summary Endpoint

The /stats/summary endpoint is part of the kubelet's API and provides a summary of resource usage (CPU, memory, etc.) for the node and for the pods and containers running on it. The Observe Agent's kubeletstats receiver queries this endpoint on each node's kubelet to collect the CPU and memory metrics shown in Kubernetes Explorer. A failure to connect here often points to issues with the kubelet, container runtime, name resolution, network, or configuration.

Troubleshooting Steps

  1. Test the /stats/summary Endpoint Directly

    a. On the node, query the endpoint:

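    curl -k -H "Authorization: Bearer $TOKEN" https://localhost:10250/stats/summary
    
            Port 10250 is the default secure (HTTPS) kubelet port, so the request normally needs a bearer token ($TOKEN here) for an account that is allowed to read node stats. The kubelet's read-only port, where it is enabled, is 10255.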
    
  2. Check for connection errors in your Observe Log Explorer under the Kubernetes Explorer logs.


You can generally filter your search for "failed". Here is an example entry for a connection refused error:

2025-03-28T16:27:07.133Z error kubeletstatsreceiver/scraper.go:103 call to /stats/summary endpoint failed {"kind": "receiver", "name": "kubeletstats", "data_type": "metrics", "error": "Get \"https://<K8S_NODE_NAME>:10250/stats/summary\": dial tcp 127.0.1.1:10250: connect: connection refused"}

You may also encounter no such host:

2025-03-28T16:27:07.133Z error kubeletstatsreceiver/scraper.go:103 call to /stats/summary endpoint failed {"otelcol.component.id": "kubeletstats", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics", "error": "Get \"https://<K8S_NODE_NAME>:10250/stats/summary\": dial tcp: lookup <NODE_NAME> on <IP_ADDRESS>: no such host"}
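
When many log lines match the filter, it can help to sort them by failure mode. The following is a minimal sketch, not part of the Observe Agent: classify_kubeletstats_error is a hypothetical helper that matches only on the message text shown in the two examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the Observe Agent): label a log line
# by the kind of /stats/summary failure it reports. It matches only on the
# message text seen in the example entries above.
classify_kubeletstats_error() {
  local line="$1"
  case "$line" in
    *"/stats/summary endpoint failed"*"connection refused"*)
      echo "connection-refused" ;;   # name resolved, but nothing accepted the connection
    *"/stats/summary endpoint failed"*"no such host"*)
      echo "no-such-host" ;;         # the node name could not be resolved at all
    *"/stats/summary endpoint failed"*)
      echo "other-failure" ;;        # same scraper error, different cause
    *)
      echo "unrelated" ;;            # not a /stats/summary failure
  esac
}
```

Passing each exported log line to the function and counting the labels shows which of the two cases dominates. In the connection refused example above, the dial address 127.0.1.1 is a loopback address, which suggests the node name resolved to the wrong IP rather than the kubelet being down.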

If the direct test of the /stats/summary endpoint succeeds, update your Helm agent deployment to set node.kubeletstats.useNodeIp to true:

helm repo update 
helm upgrade --namespace=observe observe-agent observe/agent --reuse-values --set node.kubeletstats.useNodeIp=true
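
To confirm that the release picked up the setting, you can inspect its values afterwards. This is a small sketch under the assumption that the release is named observe-agent in the observe namespace, as in the command above; use_node_ip_enabled is a hypothetical helper that only looks for the key in YAML text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the YAML on stdin contains a line
# setting useNodeIp to true. It does not parse YAML, so it cannot tell
# which parent key the line belongs to.
use_node_ip_enabled() {
  grep -Eq '^[[:space:]]*useNodeIp:[[:space:]]*true[[:space:]]*$'
}
```

For example, `helm get values observe-agent --namespace=observe | use_node_ip_enabled && echo "useNodeIp is set"` prints the message only when the user-supplied values include the setting.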

You can also update the values.yaml by including this block:

node:
   kubeletstats:
      useNodeIp: true

Setting useNodeIp to true makes the Agent use the Node's status.hostIP instead of the Node name, which helps when the Node name resolves to a different IP address or to a loopback address.
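
To see whether a node name resolves to something other than the node's real address, you can compare the two directly. The sketch below is an illustration only: is_loopback, describe_resolution, and check_node are hypothetical helper names, it assumes getent and kubectl are available, and it assumes the node's InternalIP matches what pods on that node report as status.hostIP. Results depend on where it runs, since the resolver inside the agent pod may differ from the one on your workstation or on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Observe Agent.

# Succeed when the IPv4 address is in the loopback range 127.0.0.0/8.
is_loopback() {
  case "$1" in
    127.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Compare the address a node name resolved to with the address Kubernetes
# reports for the node, and print a one-word verdict.
describe_resolution() {
  local resolved="$1" reported="$2"
  if [ -z "$resolved" ]; then
    echo "unresolvable"   # corresponds to the "no such host" log entry
  elif is_loopback "$resolved"; then
    echo "loopback"       # e.g. 127.0.1.1, as in the "connection refused" entry
  elif [ "$resolved" != "$reported" ]; then
    echo "mismatch"
  else
    echo "match"
  fi
}

# Look up both addresses for a node name and describe the result.
check_node() {
  local node="$1" resolved reported
  resolved=$(getent hosts "$node" | awk '{print $1; exit}')
  reported=$(kubectl get node "$node" \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
  describe_resolution "$resolved" "$reported"
}
```

Running `check_node <K8S_NODE_NAME>` and getting anything other than match is a sign that useNodeIp is likely to help.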
