
Freshness Validations

Freshness Validations in Rudol detect issues related to when tables are updated. They ensure that data arrives on time, follows expected update patterns, and does not silently become stale or behave erratically.

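Rudol supports two approaches for Freshness Validations:

  • Known Schedule, for tables updated in well-defined windows
  • AI Pattern Detection, for tables whose update behavior is variable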

A table can have multiple Freshness Validations, but each validation uses only one approach.

Creating a Freshness Validation

A Freshness Validation monitors the timestamp of the last update of one or more tables. Rudol retrieves this information directly from the datasource metadata and, in most cases, considers both content changes (DML) and structural changes (DDL) as valid updates.

Create a Freshness Validation

Assign a clear, descriptive name, for example:

  • “Orders – Daily Load Freshness”
  • “Marketing Tables – SLA Compliance”

Names should clearly express the expected update behavior.

Selecting Assets

Freshness Validations support the same asset selection model as other validations:

  • Single Table
  • Multiple Tables
  • Live Rules (dynamic selection based on name, domain, technology, tags, owner)

Live Rules are evaluated at execution time. Tables that start matching are automatically included; tables that stop matching are excluded.
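The following is a minimal sketch of how a Live Rule could be re-evaluated on every run so that the asset list follows the catalog. It rests on a few assumptions. `Table`, `LiveRule` and `resolve_assets` are hypothetical names. The matching semantics are also assumed: name by prefix, every set criterion must match, and every required tag must be present. Rudol's actual rule syntax may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    # Hypothetical catalog entry; field names are illustrative, not Rudol's schema.
    name: str
    domain: str
    technology: str
    owner: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class LiveRule:
    # Hypothetical rule: criteria left as None are ignored, the rest must all match.
    name_prefix: Optional[str] = None
    domain: Optional[str] = None
    technology: Optional[str] = None
    owner: Optional[str] = None
    required_tags: frozenset = frozenset()

    def matches(self, t: Table) -> bool:
        return (
            (self.name_prefix is None or t.name.startswith(self.name_prefix))
            and (self.domain is None or t.domain == self.domain)
            and (self.technology is None or t.technology == self.technology)
            and (self.owner is None or t.owner == self.owner)
            and self.required_tags <= t.tags  # every required tag is present
        )


def resolve_assets(rule: LiveRule, catalog: list) -> list:
    """Re-evaluate the rule against the catalog as it exists at execution time."""
    return [t.name for t in catalog if rule.matches(t)]
```

The rule is re-applied on every run. A table that gains the required tag therefore joins the validation at the next execution, and one that loses it drops out, without any edit to the validation.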

Evaluation Method

Freshness behavior is defined by selecting one of the following evaluation methods.

Known Schedule

Use Known Schedule when update times are well defined and tied to ETL or ingestion SLAs. An update window defines when a table is expected to be updated.

Define update windows

Each window includes:

  • Day of the week
  • Start time
  • Tolerance in minutes

All times are evaluated in UTC.

The tolerance is counted forward from the configured start time, so a window can extend into the next day:

Example

  • Monday at 23:30 with 60 minutes tolerance
  • Valid window: Monday 23:30 → Tuesday 00:30 (UTC)
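The window arithmetic can be sketched in a few lines. The sketch assumes windows are anchored to a week that starts Monday 00:00 UTC and that both window edges are inclusive. `update_window` and `in_window` are hypothetical helpers, not part of Rudol.

```python
from datetime import datetime, timedelta, timezone

# Same order as datetime.weekday(): Monday == 0 ... Sunday == 6.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def update_window(week_start: datetime, day: str, start: str, tolerance_min: int):
    """Return (opens, closes) for one window in the week starting at week_start.

    week_start must be a Monday at 00:00 UTC. The tolerance is added after the
    start time, so the window may end on the following day.
    """
    hour, minute = map(int, start.split(":"))
    opens = week_start + timedelta(days=WEEKDAYS.index(day), hours=hour, minutes=minute)
    return opens, opens + timedelta(minutes=tolerance_min)


def in_window(ts: datetime, opens: datetime, closes: datetime) -> bool:
    # Both edges are treated as inclusive here; this is an assumption.
    return opens <= ts <= closes


# The example above: Monday 23:30 with 60 minutes of tolerance.
monday = datetime(2024, 1, 1, tzinfo=timezone.utc)  # 2024-01-01 is a Monday
opens, closes = update_window(monday, "Monday", "23:30", 60)
```

Here `closes` falls on Tuesday at 00:30 UTC, which reproduces the valid window from the example.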

You can define multiple windows for the same validation. If a table updates multiple times within a valid window, the window is considered fulfilled and no incident is created.

Rudol creates a Data Quality Incident when either of the following occurs:

  1. An update window ends and the table has not been updated.
  2. The table is updated outside of any valid update window.
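The two incident rules can be expressed as a single check over the configured windows and the detected updates. The sketch makes three assumptions: a window is judged only once it has closed, window edges are inclusive, and `evaluate_known_schedule` is a hypothetical name rather than a Rudol function.

```python
from datetime import datetime, timezone


def evaluate_known_schedule(windows, updates, now):
    """Apply the two incident rules to one Known Schedule validation.

    windows: list of (opens, closes) UTC datetimes
    updates: UTC timestamps of detected updates
    now:     evaluation time; windows that have not ended yet are not judged
    """
    incidents = []
    for opens, closes in windows:
        # Rule 1: the window has ended with no update inside it. Several updates
        # inside one window still count as a single fulfilled window.
        if closes <= now and not any(opens <= u <= closes for u in updates):
            incidents.append(("missed_window", opens))
    for u in updates:
        # Rule 2: the update falls outside every configured window.
        if not any(opens <= u <= closes for opens, closes in windows):
            incidents.append(("update_outside_window", u))
    return incidents


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Two nightly windows (23:30 + 60 min). Monday's is fulfilled twice, Tuesday's is
# missed, and a Tuesday 15:00 update lands outside any window.
windows = [(utc(2024, 1, 1, 23, 30), utc(2024, 1, 2, 0, 30)),
           (utc(2024, 1, 2, 23, 30), utc(2024, 1, 3, 0, 30))]
updates = [utc(2024, 1, 1, 23, 45), utc(2024, 1, 2, 0, 10), utc(2024, 1, 2, 15, 0)]
incidents = evaluate_known_schedule(windows, updates, now=utc(2024, 1, 3, 1, 0))
```

This run yields two incidents, one per rule. The double update on Monday night produces none.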

AI Pattern Detection

Use AI Pattern Detection when update behavior is variable or not strictly scheduled.

AI Pattern Detection Parameters

In this mode, Rudol applies anomaly detection to a single metric, the time between updates, and learns normal behavior over time. The metric is defined as:

time_between_updates = timestamp(update_n) - timestamp(update_(n-1))
  • Measured in seconds
  • Calculated every time Rudol detects a change in the table’s last update timestamp
  • The first update is ignored; training starts from the second update onward

Examples

  • Table updates at 10:00 and 10:15 → time between updates = 900 seconds
  • Table updates at 01:00 and 05:00 → time between updates = 14,400 seconds
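The metric can be computed directly from a list of observed last-update timestamps, and the two examples above serve as checks. `time_between_updates` is a hypothetical helper. Dropping duplicate timestamps reflects the rule that only a change in the last update timestamp counts as an update.

```python
from datetime import datetime


def time_between_updates(update_timestamps):
    """Seconds between consecutive detected updates.

    Duplicate timestamps are dropped because only a change in the last update
    timestamp counts as an update. The first update produces no sample.
    """
    ordered = sorted(set(update_timestamps))
    return [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]


# The two examples above.
first = time_between_updates([datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 15)])  # [900.0]
second = time_between_updates([datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 5, 0)])  # [14400.0]
```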

Both unusually long and unusually short intervals can be considered anomalies.

Training Mode

Once created, the validation enters Training Mode. Rudol collects historical samples to learn normal behavior.

  • Typically requires about 500 samples collected over at least 3 weeks.
  • This threshold is not fixed; it depends on execution frequency and data stability.
  • Choose the execution frequency carefully to avoid excessively long training periods.
  • When enough samples are gathered, the validation transitions automatically to Prediction Mode.
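The transition out of Training Mode can be sketched as a check on sample count and time span. The figures below are the approximate ones from the text, and the real threshold is not fixed. The sketch assumes both conditions must hold at once. `mode`, `MIN_SAMPLES` and `MIN_SPAN` are hypothetical names.

```python
from datetime import datetime, timedelta

# Illustrative values taken from the approximate figures above; the real
# threshold is not fixed and depends on execution frequency and data stability.
MIN_SAMPLES = 500
MIN_SPAN = timedelta(weeks=3)


def mode(sample_times, min_samples=MIN_SAMPLES, min_span=MIN_SPAN):
    """Return "training" or "prediction" for a list of sample timestamps."""
    if len(sample_times) < min_samples:
        return "training"
    span = max(sample_times) - min(sample_times)
    return "prediction" if span >= min_span else "training"


start = datetime(2024, 1, 1)
hourly = [start + timedelta(hours=i) for i in range(505)]
early = mode(hourly[:500])  # 500 samples spanning 499 hours: still "training"
ready = mode(hourly)        # 505 samples spanning 504 hours (3 weeks): "prediction"
```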

Execution Frequency

Freshness Validations can run:

  • Hourly
  • Daily
  • Weekly
  • Monthly
  • Using a custom CRON expression

The chosen frequency impacts how fast the model collects training samples. All assigned assets, including those selected through Live Rules, use the same frequency.
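A rough calculation shows why frequency matters. The calculation assumes each run can record at most one new sample, presumably because a run sees only the latest last-update timestamp. Under that assumption, the run interval caps how fast samples accumulate. `RUN_INTERVALS`, `max_samples` and `best_case_training_time` are hypothetical, "monthly" is simplified to 30 days, and custom CRON schedules are not modeled.

```python
from datetime import timedelta

# Approximate spacing between runs for the preset frequencies.
# "monthly" is simplified to 30 days; custom CRON schedules are not modeled.
RUN_INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


def max_samples(frequency, period=timedelta(weeks=3)):
    """Upper bound on samples gathered in `period`, assuming at most one new sample per run."""
    return period // RUN_INTERVALS[frequency]


def best_case_training_time(frequency, target_samples=500):
    """Shortest time to reach `target_samples` under the same one-sample-per-run assumption."""
    return RUN_INTERVALS[frequency] * target_samples
```

Under these assumptions, an hourly validation can gather up to 504 samples in 3 weeks. A daily one gathers at most 21 in the same period and would need 500 days to reach 500 samples. That gap illustrates why the training threshold depends on frequency and why overly sparse schedules lead to long training periods.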

Interpreting Validation Results

Freshness Validation Chart

Each validation displays a time-series chart:

  • Purple line: observed time between updates
  • Grey band: AI-generated expected range
  • Red dots: detected anomalies that generate incidents

The Y-axis is rendered dynamically (seconds, minutes, hours, or days) depending on magnitude.

Each spike represents a detected update; its height reflects the elapsed time since the previous update.
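The dynamic Y-axis can be approximated by picking the largest unit that the biggest sample reaches. The cut-off rule is an assumption about how the chart chooses its unit. `axis_unit` and `scale_series` are hypothetical helpers.

```python
def axis_unit(max_seconds):
    """Return (unit, divisor) for a chart whose largest value is max_seconds.

    Cut-off rule (an assumption): use the largest unit that the biggest value reaches.
    """
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if max_seconds >= size:
            return unit, size
    return "seconds", 1


def scale_series(values_seconds):
    """Convert time-between-updates samples (seconds) into the chosen display unit."""
    unit, divisor = axis_unit(max(values_seconds))
    return unit, [v / divisor for v in values_seconds]
```

For example, samples of 900 and 14,400 seconds would be drawn in hours as 0.25 and 4.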

Understanding Anomalies

An anomaly indicates that the observed time between updates falls outside the expected range, meaning either:

  • The table updated too late, or
  • The table updated too frequently

In both cases, Rudol creates a Data Quality Incident and sends alerts to subscribed users and configured integrations.
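The model behind the grey band is not described here. To illustrate the two anomaly cases, the sketch swaps in a simple robust band computed from past intervals: the median plus or minus k times the median absolute deviation (MAD). This is not Rudol's method. `expected_range`, `classify` and `k=3` are hypothetical.

```python
import statistics


def expected_range(history, k=3.0):
    """Stand-in for the AI band: median +/- k * MAD of past intervals (seconds).

    This is NOT Rudol's model; it only shows the shape of the check.
    """
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    return max(0.0, med - k * mad), med + k * mad


def classify(interval, low, high):
    """Map an observed time between updates to the two anomaly cases."""
    if interval > high:
        return "updated too late"
    if interval < low:
        return "updated too frequently"
    return "normal"


# Roughly hourly history; the band works out to (3450.0, 3750.0) seconds.
low, high = expected_range([3600, 3500, 3700, 3600, 3650, 3550])
```

A perfectly regular history gives a MAD of zero, which collapses this toy band to a single point. A production model needs some minimum width, which is one reason this is only an illustration.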

Alerts and Notifications

When an anomaly is detected, alerts are sent automatically to:

  • All users subscribed to the validation (email + in-app notifications)
  • Configured integration channels: Slack, Microsoft Teams, Google Chat

This ensures immediate visibility across operational and analytics teams.

Handling Spikes

If a sudden change is known to be valid, such as a backfill or batch reprocessing, you can mark the anomaly as “Not an incident” from the Incident view.

This action:

  • Feeds corrective feedback to the AI model
  • Helps the system adjust expectations faster
  • Avoids repeated false positives in the future

There is no need to manually reset the training process: Rudol retrains continuously as it accumulates more data.