
Set of values

These validations check that column values belong to a defined domain. They differ in whether they operate on each row, on the set of distinct values, or on an aggregate, and in the direction of the containment check.

| Validation | What it checks | Operates on |
| --- | --- | --- |
| Column values must be in set | Every row has a valid value | Each row individually |
| Column distinct values must be in set | No unexpected category exists | Unique values only |
| Column distinct values must contain set | All expected categories are present | Unique values only |
| Column distinct values must be equal to set | Categories match exactly: no more, no less | Unique values only |
| Column most common value must be in set | The most frequent value is one of the expected ones | Aggregated |

Column values must be in set

Every row must contain a value from the given set. Fails if any individual row has a value outside the set.

Use this when you want to enforce a domain at the row level — for example, a status column that should only contain active, inactive, or pending.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Column | Column | | The column to validate. |
| Value set | Array | | Allowed values, comma-separated. For example: active,inactive,pending |

Example

| order_id | status |
| --- | --- |
| 1 | active |
| 2 | inactive |
| 3 | archived |

| | Example 1 | Example 2 |
| --- | --- | --- |
| Column | status | status |
| Value set | active,inactive,pending | active,inactive,pending,archived |
| Result | ❌ Fails | ✅ Passes |
| Reason | archived in row 3 is not in the allowed set | All values in the column are in the allowed set |
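The row-level check above can be sketched in plain Python. This is only an illustration of the logic, not the product's implementation; the function name `values_in_set` and the 1-based row numbering are assumptions made for the example.

```python
def values_in_set(values, allowed):
    """Return (row_number, value) pairs for rows whose value is outside `allowed`.

    Hypothetical helper: row numbers start at 1 to match the example table.
    """
    allowed = set(allowed)
    return [(row, value) for row, value in enumerate(values, start=1)
            if value not in allowed]


# The status column from the example table.
status = ["active", "inactive", "archived"]

# Example 1: "archived" is not allowed, so one row fails.
failures_1 = values_in_set(status, ["active", "inactive", "pending"])

# Example 2: every row holds an allowed value, so nothing fails.
failures_2 = values_in_set(status, ["active", "inactive", "pending", "archived"])

print(failures_1)  # [(3, 'archived')]
print(failures_2)  # []
```

The validation passes when the list of failing rows is empty; its length is the failing row count.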

Column distinct values must be in set

The set of distinct values found in the column must be contained in the given set. Fails if any category appears in the data that is not in the allowed set — regardless of how many rows have that value.

Use this when you want to detect category drift — for example, a new payment method appearing in your data that your pipeline does not know how to handle.

Difference from "Column values must be in set"

"Column values must be in set" checks every individual row and reports how many rows fail. This validation checks only which unique values exist in the column and does not report a row count.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Column | Column | | The column to validate. |
| Value set | Array | | Allowed distinct values, comma-separated. For example: credit_card,bank_transfer,cash |

Example

| payment_id | method |
| --- | --- |
| 1 | credit_card |
| 2 | bank_transfer |
| 3 | credit_card |
| 4 | crypto |

| | Example 1 | Example 2 |
| --- | --- | --- |
| Column | method | method |
| Value set | credit_card,bank_transfer,cash | credit_card,bank_transfer,cash,crypto |
| Result | ❌ Fails | ✅ Passes |
| Reason | crypto is not in the allowed set | All distinct values in the column are in the allowed set |
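In set terms, this validation asks whether the column's distinct values form a subset of the allowed set. A minimal sketch in plain Python follows; the helper name `unexpected_categories` is hypothetical and this is not the product's own code.

```python
def unexpected_categories(values, allowed):
    """Return the distinct values that are not in `allowed`, sorted for stable output."""
    return sorted(set(values) - set(allowed))


# The method column from the example table.
method = ["credit_card", "bank_transfer", "credit_card", "crypto"]

# Example 1: "crypto" is an unexpected category.
unexpected_1 = unexpected_categories(method, ["credit_card", "bank_transfer", "cash"])

# Example 2: every distinct value is allowed.
unexpected_2 = unexpected_categories(
    method, ["credit_card", "bank_transfer", "cash", "crypto"]
)

print(unexpected_1)  # ['crypto']
print(unexpected_2)  # []
```

Because the column is reduced to a set first, the result is the same whether an unexpected value appears in one row or in thousands.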

Column distinct values must contain set

The given set must be fully contained in the column's distinct values. Fails if any value from the expected set is missing from the column.

Use this when you need to guarantee that all expected categories are present in your data — for example, ensuring a report covers all regions or all product lines.

Difference from "Column distinct values must be in set"

This validation checks the opposite direction: the column must contain every value in the given set, but it may also contain additional values. "Column distinct values must be in set" checks the reverse: every distinct value in the column must belong to the given set.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Column | Column | | The column to validate. |
| Value set | Array | | Expected values that must be present, comma-separated. For example: north,south,east,west |

Example

| sale_id | region |
| --- | --- |
| 1 | north |
| 2 | north |
| 3 | east |

| | Example 1 | Example 2 |
| --- | --- | --- |
| Column | region | region |
| Value set | north,south,east,west | north,east |
| Result | ❌ Fails | ✅ Passes |
| Reason | south and west are missing from the column | All values in the set are present in the column |
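Here the containment runs the other way: the expected set must be a subset of the column's distinct values. The sketch below illustrates that in plain Python; `missing_categories` is a hypothetical helper name, not part of the product.

```python
def missing_categories(values, expected):
    """Return the expected values that never appear in the column, sorted."""
    return sorted(set(expected) - set(values))


# The region column from the example table.
region = ["north", "north", "east"]

# Example 1: two expected regions never appear.
missing_1 = missing_categories(region, ["north", "south", "east", "west"])

# Example 2: both expected regions are present.
missing_2 = missing_categories(region, ["north", "east"])

print(missing_1)  # ['south', 'west']
print(missing_2)  # []
```

Extra values in the column do not affect the result; only absent expected values cause a failure.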

Column distinct values must be equal to set

The set of distinct values in the column must exactly match the given set — no extra values, no missing values.

Use this when you need strict control over the full domain of a column — for example, an enum-like field where you know every possible value and want to catch both additions and removals.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Column | Column | | The column to validate. |
| Value set | Array | | The exact expected set of values, comma-separated. For example: low,medium,high |

Example

| ticket_id | priority |
| --- | --- |
| 1 | low |
| 2 | high |
| 3 | medium |
| 4 | critical |

| | Example 1 | Example 2 |
| --- | --- | --- |
| Column | priority | priority |
| Value set | low,medium,high | low,medium,high,critical |
| Result | ❌ Fails | ✅ Passes |
| Reason | critical is an extra value not in the expected set | Distinct values in the column match the set exactly |
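Set equality combines the two previous directions: there must be no extra values and no missing values. A minimal sketch, assuming a hypothetical helper `compare_to_set` that reports both differences so a failure can be explained:

```python
def compare_to_set(values, expected):
    """Return (extra, missing): values only in the column, and values only in `expected`."""
    distinct = set(values)
    expected = set(expected)
    return sorted(distinct - expected), sorted(expected - distinct)


# The priority column from the example table.
priority = ["low", "high", "medium", "critical"]

# Example 1: "critical" is extra, nothing is missing.
extra_1, missing_1 = compare_to_set(priority, ["low", "medium", "high"])

# Example 2: the distinct values match the set exactly.
extra_2, missing_2 = compare_to_set(priority, ["low", "medium", "high", "critical"])

print(extra_1, missing_1)  # ['critical'] []
print(extra_2, missing_2)  # [] []
```

The check passes only when both lists are empty, which is equivalent to `set(values) == set(expected)`.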

Column most common value must be in set

The most frequent value in the column must be one of the values in the given set.

Use this when you want to catch anomalies in distribution. For example, you can check that the dominant payment method or product category is always one of the expected ones; an unexpected shift in the dominant value could indicate a data pipeline issue.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Column | Column | | The column to validate. |
| Value set | Array | | Acceptable values for the most common value, comma-separated. For example: credit_card,bank_transfer |

Example

| payment_id | method |
| --- | --- |
| 1 | credit_card |
| 2 | credit_card |
| 3 | bank_transfer |
| 4 | cash |
| 5 | cash |
| 6 | cash |

| | Example 1 | Example 2 |
| --- | --- | --- |
| Column | method | method |
| Value set | credit_card,bank_transfer | credit_card,bank_transfer,cash |
| Result | ❌ Fails | ✅ Passes |
| Reason | The most common value is cash (3 occurrences), which is not in the allowed set | The most common value cash is in the allowed set |
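The aggregated check can be sketched with `collections.Counter` from the Python standard library. The helper name `most_common_in_set` is hypothetical. The page does not say how ties are handled, so the rule used here (every tied value must be allowed) is purely an assumption of this sketch, as is the requirement that the column is non-empty.

```python
from collections import Counter


def most_common_in_set(values, allowed):
    """Return (passes, modes), where modes are the value(s) with the highest count.

    Assumes `values` is non-empty.
    """
    allowed = set(allowed)
    counts = Counter(values)
    top = max(counts.values())
    modes = sorted(value for value, count in counts.items() if count == top)
    # Assumption: on a tie, every tied value must be in the allowed set.
    return all(mode in allowed for mode in modes), modes


# The method column from the example table: cash appears 3 times.
method = ["credit_card", "credit_card", "bank_transfer", "cash", "cash", "cash"]

result_1 = most_common_in_set(method, ["credit_card", "bank_transfer"])
result_2 = most_common_in_set(method, ["credit_card", "bank_transfer", "cash"])

print(result_1)  # (False, ['cash'])
print(result_2)  # (True, ['cash'])
```

Only the dominant value matters here: the column may contain other values outside the set without failing the check.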