
Filter and Sample

Some tables contain so many rows that validating them in full takes a considerable amount of time. To avoid this, you can run the validations on a portion of the table that is a representative sample of the whole data population. Rudol lets you take a subset of the table's rows and apply the validations to it. There are three mutually exclusive ways to take this subset:

  • Filter by dimension
  • Take a sample
  • Optimize for append-only tables

To use one of these options, check the Run on a portion of the table checkbox when creating a new Validation Group.


The three options listed above then appear.

Filter by dimension

This option restricts the table to the rows that contain a given value in a column, so the validations run only on the rows that meet this condition. It is also useful when you need to define different validations for different subsets of rows in the same table.

For example, suppose we have the following table:

ID  ANIMAL  AGE
1   DOG     3
2   CAT     3
3   COW     4

To validate only the rows for animals that are 3 years old, choose the AGE column in Reference Column and enter 3 in Expected Value.


The validations then run only on the resulting subset:

ID  ANIMAL  AGE
1   DOG     3
2   CAT     3
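
The filter in this example behaves like an equality check on a single column. The sketch below reproduces it in plain Python for illustration only. It is not the tool's implementation, and the function name filter_by_dimension and its parameter names are hypothetical, chosen to mirror the Reference Column and Expected Value fields.

```python
from typing import Any

Row = dict[str, Any]

# The example table from the text above.
TABLE: list[Row] = [
    {"ID": 1, "ANIMAL": "DOG", "AGE": 3},
    {"ID": 2, "ANIMAL": "CAT", "AGE": 3},
    {"ID": 3, "ANIMAL": "COW", "AGE": 4},
]


def filter_by_dimension(
    rows: list[Row], reference_column: str, expected_value: Any
) -> list[Row]:
    """Keep only the rows whose reference column equals the expected value.

    Hypothetical helper: it assumes the filter is a plain equality match.
    """
    return [row for row in rows if row.get(reference_column) == expected_value]


# Reference Column = AGE, Expected Value = 3
subset = filter_by_dimension(TABLE, "AGE", 3)
for row in subset:
    print(row["ID"], row["ANIMAL"], row["AGE"])
# prints:
# 1 DOG 3
# 2 CAT 3
```

Calling the same helper with a different expected value, for example filter_by_dimension(TABLE, "AGE", 4), yields a different subset of the same table. This corresponds to defining separate validations for separate subsets of rows.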