Sets and FrozenSets
Validation for set types in pydantic-core is handled by the SetSchema and FrozenSetSchema structures. These schemas ensure that input data is a collection of unique items and optionally validate each item against a sub-schema.
Core Schema Structures
The validation logic is defined using TypedDict structures in pydantic_core/core_schema.py. Both mutable and immutable sets share a similar configuration interface.
SetSchema
The SetSchema is used for standard Python set objects.
class SetSchema(TypedDict, total=False):
type: Required[Literal['set']]
items_schema: CoreSchema
min_length: int
max_length: int
fail_fast: bool
strict: bool
ref: str
metadata: dict[str, Any]
serialization: SerSchema
FrozenSetSchema
The FrozenSetSchema is used for frozenset objects, providing immutability for the resulting collection.
class FrozenSetSchema(TypedDict, total=False):
type: Required[Literal['frozenset']]
items_schema: CoreSchema
min_length: int
max_length: int
fail_fast: bool
strict: bool
ref: str
metadata: dict[str, Any]
serialization: SerSchema
Creating Set Schemas
The pydantic_core.core_schema module provides helper functions set_schema() and frozenset_schema() to construct these definitions.
from pydantic_core import SchemaValidator, core_schema
# A schema for a set of integers with length constraints
schema = core_schema.set_schema(
items_schema=core_schema.int_schema(),
min_length=1,
max_length=5
)
v = SchemaValidator(schema)
# Validates and coerces items
assert v.validate_python({1, '2', 3}) == {1, 2, 3}
Validation Behavior
Lax vs Strict Validation
The behavior of set validation depends significantly on whether strict mode is enabled.
- Lax Mode (Default): Accepts various iterable types including
list,tuple,deque, and generators. These are automatically converted into asetorfrozenset. - Strict Mode: Only accepts the exact type. For
set_schema, the input must be aset. Forfrozenset_schema, it must be afrozenset.
As seen in pydantic-core/tests/validators/test_set.py:
# In strict mode, a list will fail validation for a set schema
v = SchemaValidator(core_schema.set_schema(strict=True))
# This raises a ValidationError with type 'set_type'
# v.validate_python([1, 2, 3])
Item Validation and Fail Fast
The items_schema defines how each element in the set is validated. If fail_fast is set to True, validation stops immediately upon encountering the first item that does not match the items_schema. Otherwise, all items are validated, and all errors are collected.
Constraints and Gotchas
Hashability Requirements
Because the output of these validators is a Python set or frozenset, all items must be hashable. If an input contains unhashable items (like a dict or a list), validation will fail with a set_item_not_hashable error.
Example from pydantic-core/tests/validators/test_set.py:
def test_list_with_unhashable_items():
v = SchemaValidator(core_schema.set_schema())
# A list of dicts will fail because dicts are not hashable
# Error: {'type': 'set_item_not_hashable', 'loc': (0,), ...}
v.validate_python([{'a': 'b'}])
Length Validation after Deduplication
A critical detail in pydantic-core is that min_length and max_length constraints are evaluated after the set is constructed. This means that duplicate items in the input are collapsed before the length is checked.
v = SchemaValidator(core_schema.set_schema(min_length=3))
# Input has 3 elements, but only 2 are unique
# This will fail because the resulting set {1, 2} has length 2
# v.validate_python([1, 1, 2])
Serialization
Sets and frozensets are serialized as arrays in JSON, as JSON does not have a native set type.
When using SchemaSerializer, sets can be serialized back to Python objects or to JSON strings. In JSON mode, the order of elements is typically based on the iteration order of the set, which is non-deterministic.
Example from pydantic-core/tests/serializers/test_set_frozenset.py:
from pydantic_core import SchemaSerializer, core_schema
v = SchemaSerializer(core_schema.frozenset_schema(core_schema.any_schema()))
fs = frozenset(['a', 'b', 'c'])
# To Python: returns a frozenset
assert v.to_python(fs) == {'a', 'b', 'c'}
# To JSON: returns a JSON array (list)
# assert v.to_python(fs, mode='json') == ['a', 'b', 'c'] (order may vary)