Detecting and Troubleshooting Broken Datasets and Monitors

Sometimes dataset transformations (ongoing or backfill) can be suspended due to system issues or performance problems. Detecting these suspensions early will help you prevent unexpected data gaps.


Types of Transforms

  • Ongoing – processes new/real-time data.

  • Backfill – reprocesses historical data.

Each type can be suspended independently, so when reporting an issue, always specify:

  • The type of transform (ongoing vs. backfill)

  • Its status


Monitoring for Suspended Transforms

You can detect suspended datasets using this OPAL query against the System Datastream:

make_col schema:string(EXTRA.schema)
filter schema = "transformer_suspended_transform"
make_col suspension_reason:string(FIELDS.suspension_reason)
make_col dataset_id:int64(FIELDS.dataset_id)
rename_col timestamp:BUNDLE_TIMESTAMP
set_link ^Dataset, dataset_id:@Observe_Dataset.dataset_id
pick_col timestamp, suspension_reason, dataset_id

This will return:

  • Timestamp – when the suspension occurred

  • Suspension reason – why it was suspended

  • Dataset ID – link back to the dataset
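
If the query results are exported for review, these three columns are enough to find the most recent suspension for each dataset. The following is a minimal Python sketch, not an Observe API: it assumes each row is a dict keyed by the column names picked in the query and that timestamps are ISO 8601 strings. The function name and the sample values are hypothetical.

```python
from datetime import datetime


def latest_suspension_per_dataset(rows):
    """Return {dataset_id: row}, keeping only the newest row per dataset.

    Assumes each row has 'timestamp' (ISO 8601 string), 'suspension_reason',
    and 'dataset_id' keys, mirroring the columns picked by the query.
    """
    latest = {}
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        current = latest.get(row["dataset_id"])
        if current is None or ts > datetime.fromisoformat(current["timestamp"]):
            latest[row["dataset_id"]] = row
    return latest


# Hypothetical sample rows for illustration only, not real query output.
sample_rows = [
    {"timestamp": "2024-01-01T10:00:00", "suspension_reason": "High compile times", "dataset_id": 111},
    {"timestamp": "2024-01-03T09:30:00", "suspension_reason": "Ongoing acceleration is timing out", "dataset_id": 111},
    {"timestamp": "2024-01-02T12:00:00", "suspension_reason": "High compile times", "dataset_id": 222},
]

newest = latest_suspension_per_dataset(sample_rows)
```

Keeping only the newest row per dataset makes it easier to see which suspension reason currently applies before working through the steps in the next section.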


What to Do if Your Dataset Is Broken

If you discover a dataset suspension:

  1. Check the suspension reason.

  2. Refer to the decision table below to see if it’s something you can fix yourself.

  3. If support is required, create a support ticket with:

    • Dataset name

    • Time of failure

    • Description of the issue


Suspension Reasons – Decision Table

Suspension Reason | Can Fix Yourself? | Suggested Action
Ongoing acceleration is timing out | Yes (maybe) | Review transform definition, simplify queries, or contact Support.
High compile times | Yes (maybe) | Simplify OPAL, reduce query complexity, or contact Support.
Internal Snowflake errors (3000XX) | No | Open a Support ticket with full error + timestamp.
Admin-suspended | No | Contact Support for more info.

More Details

Self-Fixable Issues

  • Ongoing acceleration is timing out

    • The dataset stopped processing new incoming data due to performance issues.

    • Check and optimize the transform query (see Performance Cookbook in Observability Cloud docs).

  • High compile times

    • The transform query exceeded compile-time limits.

    • Simplify OPAL expressions, reduce nesting, or refactor queries.

Support-Required Issues

  • Acceleration runs into: ISE…

  • SQL execution internal error: 3000XX…

  • Admin console: …

These are caused by system-level or vendor-related errors. Contact Support with the error details.
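
The decision table can be sketched as a small triage helper. This is an illustrative Python sketch under stated assumptions, not part of the product: it assumes the suspension reason text contains the phrases used in the table, and it reads "3000XX" as the digits 3000 followed by two more digits. The function name and return shape are hypothetical.

```python
import re

# Reason fragments copied from the decision table. Matching on these
# substrings is an assumption about how the reason text is worded.
SELF_FIXABLE = {
    "ongoing acceleration is timing out": "Review the transform definition and simplify queries.",
    "high compile times": "Simplify OPAL and reduce query complexity.",
}

# Assumes "3000XX" means the digits 3000 followed by two more digits.
SNOWFLAKE_INTERNAL = re.compile(r"\b3000\d{2}\b")


def triage(suspension_reason):
    """Return (can_fix_yourself, suggested_action) for a suspension reason."""
    reason = suspension_reason.lower()
    for fragment, action in SELF_FIXABLE.items():
        if fragment in reason:
            return True, action
    if SNOWFLAKE_INTERNAL.search(reason):
        return False, "Open a Support ticket with the full error and timestamp."
    # Anything unrecognized, including admin suspensions, goes to Support.
    return False, "Contact Support with the error details."
```

For example, `triage("High compile times")` returns `True` with the simplification advice, while any reason the helper does not recognize falls through to the Support path, so an unknown error is never treated as self-fixable.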

Tested out the “suspended transforms” query in my account, and I’m seeing a couple of entries over the last 30 days with the following suspension_reason:

acceleration stuck with error: statement 0 failed: 002003 (42S02): SQL compilation error: Object 'DOES_NOT_EXIST' does not exist or not authorized. (EXECUTION)

I know we’ve deleted a few datasets over that same period. Is this perhaps normal behavior when acceleration is stopped because a dataset was deleted, or something like that?

I don’t think we’ve had any issues with any of the datasets that are still in use, so I’m assuming this is safe to ignore.

Looking further, the error was due to merge transforms taking place on 2 of the datasets.

Upon double-checking these, they can be safely ignored: no acceleration jobs are stuck, and they continued successfully on their own after that error occurred!