Filter and Sample
Some tables contain so many rows that running validations on them takes a considerable amount of time. To save that time, you can run the validations on a portion of the table, such as a representative sample of the entire data population. rudol lets you select a subset of rows from the table and apply the validations to that subset. There are three mutually exclusive ways to select the subset:
Filter by dimension
Take a sample
Optimize for append-only tables
To use one of these options, check the Run on a portion of the table checkbox when creating a new Validation Group. The three options listed above then appear.
Filter by dimension
This option keeps only the table rows that contain a given value in a chosen column, so the validations run only on the rows that meet this condition. It is also helpful when you need to define different validations for different subsets of rows in the same table.
For example, suppose we have the following table:
ID | ANIMAL | AGE |
---|---|---|
1 | DOG | 3 |
2 | CAT | 3 |
3 | COW | 4 |
If we only need to validate the rows for animals that are 3 years old, we choose the AGE column in Reference Column and enter 3 in Expected Value. The validations then run on the following subset:
ID | ANIMAL | AGE |
---|---|---|
1 | DOG | 3 |
2 | CAT | 3 |
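
The effect of this filter can be sketched in a few lines of Python. This is only an illustration of the row selection described above, not rudol's implementation or API: the function name `filter_by_dimension` and its parameters are hypothetical, and the sketch assumes the filter is a simple equality check between the Reference Column and the Expected Value.

```python
# Hypothetical illustration of "Filter by dimension"; not rudol's actual code.
rows = [
    {"ID": 1, "ANIMAL": "DOG", "AGE": 3},
    {"ID": 2, "ANIMAL": "CAT", "AGE": 3},
    {"ID": 3, "ANIMAL": "COW", "AGE": 4},
]


def filter_by_dimension(rows, reference_column, expected_value):
    """Keep only the rows whose reference column equals the expected value."""
    return [row for row in rows if row[reference_column] == expected_value]


# Reference Column = AGE, Expected Value = 3 (assumed equality match)
subset = filter_by_dimension(rows, "AGE", 3)

for row in subset:
    print(row["ID"], row["ANIMAL"], row["AGE"])
# prints:
# 1 DOG 3
# 2 CAT 3
```

Only the rows in `subset` would be passed on to the validations; the row with ID 3 is skipped because its AGE is 4.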