Set of values

Validations to check that column values belong to a defined domain. They differ in what they operate on (each row individually, the set of distinct values, or the column's most common value) and in the direction of the containment check.

Validation	What it checks	Operates on
Column values must be in set	Every row has a valid value	Each row individually
Column distinct values must be in set	No unexpected category exists	Unique values only
Column distinct values must contain set	All expected categories are present	Unique values only
Column distinct values must be equal to set	Categories match exactly — no more, no less	Unique values only
Column most common value must be in set	The most frequent value is one of the expected ones	Aggregated

Column values must be in set

Every row must contain a value from the given set. Fails if any individual row has a value outside the set.

Use this when you want to enforce a domain at the row level — for example, a status column that should only contain active, inactive, or pending.

Parameters

Name	Type	Required	Description
Column	Column	✅	The column to validate.
Value set	Array	✅	Allowed values, comma-separated. For example: `active,inactive,pending`

Example

order_id	status
1	active
2	inactive
3	archived

	Example 1	Example 2
Column	`status`	`status`
Value set	`active,inactive,pending`	`active,inactive,pending,archived`
Result	❌ Fails	✅ Passes
Reason	`archived` in row 3 is not in the allowed set	All values in the column are in the allowed set
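The row-level semantics can be sketched in Python. This is a minimal sketch that assumes the column is available as a plain list and that the comma-separated value set has already been split into a list. The function `values_in_set` is hypothetical and only illustrates the logic; it is not the product's implementation. It reproduces the two examples above.

```python
def values_in_set(column, value_set):
    """Hypothetical row-level check.

    Returns (passed, failing_rows), where failing_rows holds the 1-based
    positions of rows whose value is not in value_set.
    """
    allowed = set(value_set)
    failing_rows = [i for i, value in enumerate(column, start=1) if value not in allowed]
    # The check passes only when no row falls outside the allowed set.
    return not failing_rows, failing_rows


# The `status` column from the example table above.
status = ["active", "inactive", "archived"]

print(values_in_set(status, ["active", "inactive", "pending"]))              # (False, [3])
print(values_in_set(status, ["active", "inactive", "pending", "archived"]))  # (True, [])
```

The number of failing rows is `len(failing_rows)`.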

Column distinct values must be in set

The set of distinct values found in the column must be contained in the given set. Fails if any category appears in the data that is not in the allowed set — regardless of how many rows have that value.

Use this when you want to detect category drift — for example, a new payment method appearing in your data that your pipeline does not know how to handle.

Difference from "Column values must be in set"

"Column values must be in set" checks every individual row and reports how many rows fail. This validation checks only which distinct values exist in the column, so it does not report a row count.
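To illustrate the difference, the following minimal Python sketch applies both views to the same hypothetical `method` column, in which `crypto` appears in three rows. Both function names are hypothetical. The row-level view yields a count of failing rows, while the distinct-value view yields only the unexpected values, however many rows contain them.

```python
def failing_row_count(column, value_set):
    """Hypothetical row-level view: how many rows hold a value outside value_set."""
    allowed = set(value_set)
    return sum(1 for value in column if value not in allowed)


def unexpected_distinct_values(column, value_set):
    """Hypothetical distinct-value view: which unique values fall outside value_set."""
    return sorted(set(column) - set(value_set))


# Hypothetical data: `crypto` appears in three of the four rows.
method = ["credit_card", "crypto", "crypto", "crypto"]
allowed = ["credit_card", "bank_transfer", "cash"]

print(failing_row_count(method, allowed))           # 3
print(unexpected_distinct_values(method, allowed))  # ['crypto']
```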

Parameters

Name	Type	Required	Description
Column	Column	✅	The column to validate.
Value set	Array	✅	Allowed distinct values, comma-separated. For example: `credit_card,bank_transfer,cash`

Example

payment_id	method
1	credit_card
2	bank_transfer
3	credit_card
4	crypto

	Example 1	Example 2
Column	`method`	`method`
Value set	`credit_card,bank_transfer,cash`	`credit_card,bank_transfer,cash,crypto`
Result	❌ Fails	✅ Passes
Reason	`crypto` is not in the allowed set	All distinct values in the column are in the allowed set
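A minimal Python sketch of this check, assuming the column is a plain list and the value set has already been parsed into a list. The function `distinct_values_in_set` is hypothetical; it reproduces the two examples above and reports which unexpected categories caused a failure.

```python
def distinct_values_in_set(column, value_set):
    """Hypothetical check: every distinct column value must be in value_set.

    Returns (passed, unexpected), where unexpected is the sorted list of
    distinct values that are not in value_set.
    """
    unexpected = sorted(set(column) - set(value_set))
    return not unexpected, unexpected


# The `method` column from the example table above.
method = ["credit_card", "bank_transfer", "credit_card", "crypto"]

print(distinct_values_in_set(method, ["credit_card", "bank_transfer", "cash"]))
# (False, ['crypto'])
print(distinct_values_in_set(method, ["credit_card", "bank_transfer", "cash", "crypto"]))
# (True, [])
```

The pass condition is equivalent to the subset test `set(column) <= set(value_set)`. Computing the difference instead also tells you which categories are unexpected.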

Column distinct values must contain set

The given set must be fully contained in the column's distinct values. Fails if any value from the expected set is missing from the column.

Use this when you need to guarantee that all expected categories are present in your data — for example, ensuring a report covers all regions or all product lines.

Difference from "Column distinct values must be in set"

This validation checks the opposite direction: the given set must be contained in the column's distinct values, and the column may contain additional values. "Column distinct values must be in set" checks the reverse: that the column's distinct values are contained within the given set.

Parameters

Name	Type	Required	Description
Column	Column	✅	The column to validate.
Value set	Array	✅	Expected values that must be present, comma-separated. For example: `north,south,east,west`

Example

sale_id	region
1	north
2	north
3	east

	Example 1	Example 2
Column	`region`	`region`
Value set	`north,south,east,west`	`north,east`
Result	❌ Fails	✅ Passes
Reason	`south` and `west` are missing from the column	All values in the set are present in the column
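A minimal Python sketch of this check, assuming the column is a plain list and the value set has already been parsed into a list. The function `distinct_values_contain_set` is hypothetical; it reproduces the two examples above and reports the missing values.

```python
def distinct_values_contain_set(column, value_set):
    """Hypothetical check: every value in value_set must appear in the column.

    Returns (passed, missing), where missing is the sorted list of expected
    values that never occur in the column. Extra column values are ignored.
    """
    missing = sorted(set(value_set) - set(column))
    return not missing, missing


# The `region` column from the example table above.
region = ["north", "north", "east"]

print(distinct_values_contain_set(region, ["north", "south", "east", "west"]))
# (False, ['south', 'west'])
print(distinct_values_contain_set(region, ["north", "east"]))
# (True, [])
```

The pass condition is `set(value_set) <= set(column)`, which is the reverse of the subset test used by "Column distinct values must be in set".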

Column distinct values must be equal to set

The set of distinct values in the column must exactly match the given set — no extra values, no missing values.

Use this when you need strict control over the full domain of a column — for example, an enum-like field where you know every possible value and want to catch both additions and removals.

Parameters

Name	Type	Required	Description
Column	Column	✅	The column to validate.
Value set	Array	✅	The exact expected set of values, comma-separated. For example: `low,medium,high`

Example

ticket_id	priority
1	low
2	high
3	medium
4	critical

	Example 1	Example 2
Column	`priority`	`priority`
Value set	`low,medium,high`	`low,medium,high,critical`
Result	❌ Fails	✅ Passes
Reason	`critical` is an extra value not in the expected set	Distinct values in the column match the set exactly
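A minimal Python sketch of the exact-match check, assuming the column is a plain list and the value set has already been parsed into a list. The function `distinct_values_equal_set` is hypothetical; it reproduces the two examples above and reports both kinds of mismatch separately.

```python
def distinct_values_equal_set(column, value_set):
    """Hypothetical check: distinct column values must equal value_set exactly.

    Returns (passed, extra, missing): extra values appear in the column but not
    in value_set; missing values appear in value_set but not in the column.
    """
    distinct, expected = set(column), set(value_set)
    extra = sorted(distinct - expected)
    missing = sorted(expected - distinct)
    return not extra and not missing, extra, missing


# The `priority` column from the example table above.
priority = ["low", "high", "medium", "critical"]

print(distinct_values_equal_set(priority, ["low", "medium", "high"]))
# (False, ['critical'], [])
print(distinct_values_equal_set(priority, ["low", "medium", "high", "critical"]))
# (True, [], [])
```

Passing is equivalent to `set(column) == set(value_set)`, that is, both containment checks succeeding at once.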

Column most common value must be in set

The most frequent value in the column must be one of the values in the given set.

Use this when you want to catch shifts in distribution, for example, to ensure that the dominant payment method or product category remains one of the expected ones. An unexpected change in the most common value can indicate a data pipeline issue.

Parameters

Name	Type	Required	Description
Column	Column	✅	The column to validate.
Value set	Array	✅	Acceptable values for the most common value, comma-separated. For example: `credit_card,bank_transfer`

Example

payment_id	method
1	credit_card
2	credit_card
3	bank_transfer
4	cash
5	cash
6	cash

	Example 1	Example 2
Column	`method`	`method`
Value set	`credit_card,bank_transfer`	`credit_card,bank_transfer,cash`
Result	❌ Fails	✅ Passes
Reason	The most common value is `cash` (3 occurrences), which is not in the allowed set	The most common value `cash` is in the allowed set
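A minimal Python sketch using `collections.Counter`, assuming a non-empty column given as a plain list and a value set that has already been parsed into a list. The function `most_common_value_in_set` is hypothetical and reproduces the two examples above. How the product breaks ties between equally frequent values is not documented here; in this sketch, `Counter.most_common` returns tied values in the order they were first encountered.

```python
from collections import Counter


def most_common_value_in_set(column, value_set):
    """Hypothetical check: the most frequent column value must be in value_set.

    Assumes a non-empty column. Returns (passed, top_value, count).
    """
    top_value, count = Counter(column).most_common(1)[0]
    return top_value in set(value_set), top_value, count


# The `method` column from the example table above.
method = ["credit_card", "credit_card", "bank_transfer", "cash", "cash", "cash"]

print(most_common_value_in_set(method, ["credit_card", "bank_transfer"]))
# (False, 'cash', 3)
print(most_common_value_in_set(method, ["credit_card", "bank_transfer", "cash"]))
# (True, 'cash', 3)
```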
