Monitoring CrashLoopBackOff Events with the Kubernetes Entity Dataset

When a Kubernetes pod enters a CrashLoopBackOff state, a container in that pod is crashing repeatedly and Kubernetes is waiting progressively longer between restart attempts. These restarts can degrade system stability and performance if they are not detected quickly.
Observe’s Kubernetes Entity dataset lets you automatically surface these events, visualize trends, and create alerting monitors to notify your team as soon as crash loops occur.

:puzzle_piece: Prerequisites

Make sure:

  • You have the Observe Agent deployed and configured to collect Kubernetes metrics and events.
  • The Kubernetes Entity dataset (part of the Kubernetes Explorer) is available in your Observe workspace.

:gear: Step 1: Build a Query to Identify CrashLoopBackOff Pods

In your Observe workspace:

  1. Open Kubernetes Explorer → Kubernetes Entity.
  2. Add the following extractions to your query:
     • extract string(facets.status)
     • extract string(identifiers.clusterName)
     • extract string(identifiers.namespaceName)
     • extract string(identifiers.podName)
  3. Apply a filter for:
     status = "CrashLoopBackOff"

You should now see only pods currently or recently in the CrashLoopBackOff state.
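
To make the logic of this step concrete, here is a minimal Python sketch of the same extract-and-filter operation applied to in-memory records. It is an illustration only, not Observe's query language: the nested record shape (a `facets` object and an `identifiers` object), the function name `extract_crashloop_rows`, and the sample values are all assumptions made for this example.

```python
from typing import Any

CRASH_STATUS = "CrashLoopBackOff"


def extract_crashloop_rows(records: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Flatten each record and keep only rows whose status is CrashLoopBackOff."""
    rows = []
    for record in records:
        facets = record.get("facets", {})
        identifiers = record.get("identifiers", {})
        # Mirrors the four string extractions listed in Step 1.
        row = {
            "status": str(facets.get("status", "")),
            "clusterName": str(identifiers.get("clusterName", "")),
            "namespaceName": str(identifiers.get("namespaceName", "")),
            "podName": str(identifiers.get("podName", "")),
        }
        # Mirrors the filter: status = "CrashLoopBackOff"
        if row["status"] == CRASH_STATUS:
            rows.append(row)
    return rows


# Hypothetical sample records, shaped only for this illustration.
sample_records = [
    {"facets": {"status": "Running"},
     "identifiers": {"clusterName": "prod", "namespaceName": "web", "podName": "web-1"}},
    {"facets": {"status": "CrashLoopBackOff"},
     "identifiers": {"clusterName": "prod", "namespaceName": "api", "podName": "api-1"}},
]

crashloop_rows = extract_crashloop_rows(sample_records)
for row in crashloop_rows:
    print(row["clusterName"], row["namespaceName"], row["podName"])
```

Only the second sample record survives the filter, which is the behavior the query in this step is expected to have.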


:bar_chart: Step 2: Visualize Trends Over Time

  1. In the chart view:
     • Set X-axis → _c_valid_from (timestamp of the record)
     • Set Y-axis → A_KubernetesEntity_count (count of entities)
  2. Choose “Count Values of all Events” over time.
  3. Group by:
     • clusterName
     • namespaceName
     • podName

This visualization shows the number of pods entering CrashLoopBackOff over time, color-coded by pod or namespace.
You can easily spot recurring issues or clusters experiencing the most instability.

(Refer to the first screenshot — pods with crash loops are plotted per cluster and namespace.)
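
The aggregation behind this chart can be sketched in Python as a count per time bucket and group. This is a rough illustration under stated assumptions: `_c_valid_from` is represented here as a `datetime` (its actual type in the dataset may differ), the one-hour bucket is chosen to match the resolution used later in the monitor, and `count_per_hour` and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_per_hour(rows: list[dict]) -> Counter:
    """Count rows per (hour, clusterName, namespaceName, podName)."""
    counts: Counter = Counter()
    for row in rows:
        # Truncate the timestamp to the start of its hour to form the bucket.
        hour = row["_c_valid_from"].replace(minute=0, second=0, microsecond=0)
        key = (hour, row["clusterName"], row["namespaceName"], row["podName"])
        counts[key] += 1
    return counts


# Hypothetical rows that already passed the CrashLoopBackOff filter.
sample_rows = [
    {"_c_valid_from": datetime(2024, 1, 1, 10, 5),
     "clusterName": "prod", "namespaceName": "api", "podName": "api-1"},
    {"_c_valid_from": datetime(2024, 1, 1, 10, 40),
     "clusterName": "prod", "namespaceName": "api", "podName": "api-1"},
    {"_c_valid_from": datetime(2024, 1, 1, 11, 15),
     "clusterName": "prod", "namespaceName": "web", "podName": "web-1"},
]

hourly_counts = count_per_hour(sample_rows)
for (hour, cluster, namespace, pod), count in sorted(hourly_counts.items()):
    print(hour.isoformat(), cluster, namespace, pod, count)
```

Each distinct group becomes one series on the chart, and each bucket becomes one point on that series.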


:vertical_traffic_light: Step 3: Create a Monitor for CrashLoopBackOff

Now let’s turn this query into an automated Threshold Monitor.

  1. Click “Create Monitor” (top right of the query view).
  2. Choose Type: Threshold.
  3. In the Monitor Query section:
  • Keep the same dataset and filters:
status = "CrashLoopBackOff"
  • Set function → Count Values of all Events
  • Group by → clusterName, namespaceName, podName
  • Resolution → 1 hour

Your monitor should look similar to the example in the second screenshot: it counts CrashLoopBackOff pods per cluster, namespace, and pod at a one-hour resolution.


:balance_scale: Step 4: Configure the Threshold Rule

  1. Under Evaluation Settings, click Add Rule.
  2. Define your trigger, for example:
     When A_KubernetesEntity_count > 0
     → Trigger an alert if any pod in CrashLoopBackOff is detected.
  3. Set the evaluation frequency (e.g., every 5 minutes).
  4. Choose your notification destination (Slack, PagerDuty, email, etc.).
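
As a rough illustration of what the threshold rule does on each evaluation, the Python sketch below checks per-group counts against a threshold of 0 and reports the groups that would fire. The function `groups_over_threshold`, the shape of `window_counts`, and the printed message are assumptions for this example; the real evaluation and notification are handled by the monitor itself.

```python
def groups_over_threshold(
    counts: dict[tuple[str, str, str], int],
    threshold: int = 0,
) -> list[tuple[str, str, str]]:
    """Return the (clusterName, namespaceName, podName) groups whose count exceeds the threshold."""
    return sorted(group for group, count in counts.items() if count > threshold)


# Hypothetical counts for the current evaluation window.
window_counts = {
    ("prod", "api", "api-1"): 3,
    ("prod", "web", "web-1"): 0,
}

firing = groups_over_threshold(window_counts)
for cluster, namespace, pod in firing:
    # Stand-in for a notification to Slack, PagerDuty, email, etc.
    print(f"ALERT: {pod} in {cluster}/{namespace} is in CrashLoopBackOff")
```

With a threshold of 0, a single CrashLoopBackOff record for a pod is enough to fire. Raising the threshold trades sensitivity for fewer alerts on pods that crash once and recover.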