Sets and FrozenSets

Validation for set types in pydantic-core is handled by the SetSchema and FrozenSetSchema structures. These schemas ensure that input data is a collection of unique items and optionally validate each item against a sub-schema.

Core Schema Structures

The schemas are defined as TypedDict structures in pydantic_core/core_schema.py. The mutable and immutable variants accept the same keys; only the type literal differs.

SetSchema

The SetSchema is used for standard Python set objects.

class SetSchema(TypedDict, total=False):
    type: Required[Literal['set']]
    items_schema: CoreSchema
    min_length: int
    max_length: int
    fail_fast: bool
    strict: bool
    ref: str
    metadata: dict[str, Any]
    serialization: SerSchema

FrozenSetSchema

The FrozenSetSchema is used for frozenset objects, providing immutability for the resulting collection.

class FrozenSetSchema(TypedDict, total=False):
    type: Required[Literal['frozenset']]
    items_schema: CoreSchema
    min_length: int
    max_length: int
    fail_fast: bool
    strict: bool
    ref: str
    metadata: dict[str, Any]
    serialization: SerSchema
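
Because both structures are TypedDicts, a schema is an ordinary dictionary at runtime. The sketch below assumes that the helper functions in pydantic_core.core_schema leave out keys that were not supplied, so their output compares equal to a hand-written dict; treat that equality as an illustration rather than a guarantee for every version.

```python
from pydantic_core import SchemaValidator, core_schema

# Hand-written dictionary following the SetSchema TypedDict
raw_schema = {
    'type': 'set',
    'items_schema': {'type': 'int'},
    'min_length': 1,
}

# The same schema built with the helper functions
built_schema = core_schema.set_schema(core_schema.int_schema(), min_length=1)

# Assumption: helpers omit keys that were not supplied
same = raw_schema == built_schema

# Either form can be passed to SchemaValidator
result = SchemaValidator(raw_schema).validate_python([1, 2, 2])
```

The helpers are usually preferable because they give editor completion and type checking for the available keys.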

Creating Set Schemas

The pydantic_core.core_schema module provides helper functions set_schema() and frozenset_schema() to construct these definitions.

from pydantic_core import SchemaValidator, core_schema

# A schema for a set of integers with length constraints
schema = core_schema.set_schema(
    items_schema=core_schema.int_schema(),
    min_length=1,
    max_length=5,
)

v = SchemaValidator(schema)
# Validates and coerces items
assert v.validate_python({1, '2', 3}) == {1, 2, 3}
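
The frozenset helper can be used in the same way. This is a minimal sketch; the variable names are illustrative and the string item schema is just one possible choice.

```python
from pydantic_core import SchemaValidator, core_schema

frozen_validator = SchemaValidator(
    core_schema.frozenset_schema(
        items_schema=core_schema.str_schema(),
        max_length=3,
    )
)

# Lax mode: a list is accepted and duplicates collapse
tags = frozen_validator.validate_python(['a', 'b', 'a'])
```

Here tags is a frozenset with two elements, so it can itself be used as a dictionary key or a member of another set.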

Validation Behavior

Lax vs Strict Validation

The behavior of set validation depends significantly on whether strict mode is enabled.

  • Lax Mode (Default): Accepts various iterable types including list, tuple, deque, and generators. These are automatically converted into a set or frozenset.
  • Strict Mode: Only accepts the exact type. For set_schema, the input must be a set. For frozenset_schema, it must be a frozenset.

As seen in pydantic-core/tests/validators/test_set.py:

# In strict mode, a list will fail validation for a set schema
v = SchemaValidator(core_schema.set_schema(strict=True))
# This raises a ValidationError with type 'set_type'
# v.validate_python([1, 2, 3])
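
The two modes can be compared side by side. The following sketch is not taken from the test suite; it assumes the lax-mode conversions listed above and captures the strict-mode error type instead of letting the exception propagate.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

lax = SchemaValidator(core_schema.set_schema())
strict = SchemaValidator(core_schema.set_schema(strict=True))

# Lax mode converts other iterables into a set
from_list = lax.validate_python([1, 2, 2])
from_tuple = lax.validate_python((1, 2))
from_generator = lax.validate_python(x for x in (1, 2))

# Strict mode passes a real set through and rejects a list
from_set = strict.validate_python({1, 2})
try:
    strict.validate_python([1, 2])
    strict_error = None
except ValidationError as exc:
    strict_error = exc.errors()[0]['type']
```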

Item Validation and Fail Fast

The items_schema defines how each element in the set is validated. If fail_fast is set to True, validation stops immediately upon encountering the first item that does not match the items_schema. Otherwise, all items are validated, and all errors are collected.
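
The difference shows up in the number of reported errors. This sketch assumes the installed pydantic-core version accepts fail_fast on set_schema(); count_errors is a hypothetical helper written for this example.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema


def count_errors(fail_fast):
    """Validate a list with two bad items and return how many errors are reported."""
    validator = SchemaValidator(
        core_schema.set_schema(core_schema.int_schema(), fail_fast=fail_fast)
    )
    try:
        validator.validate_python([1, 'not-num', 'again'])
    except ValidationError as exc:
        return exc.error_count()
    return 0


errors_collect_all = count_errors(False)
errors_fail_fast = count_errors(True)
```

With fail_fast disabled both invalid items are reported; with it enabled only the first one is, which can save work on large inputs at the cost of less complete feedback.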

Constraints and Gotchas

Hashability Requirements

Because the output of these validators is a Python set or frozenset, all items must be hashable. If an input contains unhashable items (like a dict or a list), validation will fail with a set_item_not_hashable error.

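Example adapted from pydantic-core/tests/validators/test_set.py:

import pytest
from pydantic_core import SchemaValidator, ValidationError, core_schema

def test_list_with_unhashable_items():
    v = SchemaValidator(core_schema.set_schema())

    # A list of dicts fails because dicts are not hashable
    with pytest.raises(ValidationError) as exc_info:
        v.validate_python([{'a': 'b'}])
    # Error: {'type': 'set_item_not_hashable', 'loc': (0,), ...}
    assert exc_info.value.errors()[0]['type'] == 'set_item_not_hashable'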

Length Validation after Deduplication

In pydantic-core, min_length and max_length are checked against the number of unique items in the resulting set, not the length of the raw input. Duplicate items in the input are therefore collapsed before the count is compared with either bound.

v = SchemaValidator(core_schema.set_schema(min_length=3))

# Input has 3 elements, but only 2 are unique
# This will fail because the resulting set {1, 2} has length 2
# v.validate_python([1, 1, 2])
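
A runnable version of this check, extended to max_length, might look as follows. It is a sketch under the behavior described above; the validator names are illustrative.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

at_least_three = SchemaValidator(core_schema.set_schema(min_length=3))
at_most_two = SchemaValidator(core_schema.set_schema(max_length=2))

# Three input elements, two unique values: fails the min_length check
try:
    at_least_three.validate_python([1, 1, 2])
    short_error = None
except ValidationError as exc:
    short_error = exc.errors()[0]['type']

# Four input elements, two unique values: satisfies max_length=2
deduplicated = at_most_two.validate_python([1, 1, 2, 2])
```

The same rule therefore cuts both ways: duplicates can make an input too short, and they can also let a long input pass an upper bound.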

Serialization

Sets and frozensets are serialized as arrays in JSON, as JSON does not have a native set type.

When using SchemaSerializer, to_python() returns Python objects and to_json() returns JSON as bytes. In JSON mode, elements appear in the set's iteration order, which Python does not guarantee, so code and tests should not rely on a particular order.

Example from pydantic-core/tests/serializers/test_set_frozenset.py:

from pydantic_core import SchemaSerializer, core_schema

s = SchemaSerializer(core_schema.frozenset_schema(core_schema.any_schema()))
fs = frozenset(['a', 'b', 'c'])

# To Python: returns a frozenset
assert s.to_python(fs) == {'a', 'b', 'c'}

# JSON mode: returns a list; sort it because element order may vary
assert sorted(s.to_python(fs, mode='json')) == ['a', 'b', 'c']
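
Serializing straight to JSON works similarly. The sketch below uses to_json() and decodes the result with the standard json module so the comparison does not depend on element order; the integer item schema is chosen only for illustration.

```python
import json

from pydantic_core import SchemaSerializer, core_schema

serializer = SchemaSerializer(core_schema.set_schema(core_schema.int_schema()))

# to_json returns bytes containing a JSON array
payload = serializer.to_json({3, 1, 2})

# Sort after decoding so the comparison does not depend on set order
decoded = sorted(json.loads(payload))
```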