Recursive Schemas and References
Recursive schemas in this codebase are implemented using a container-reference pattern. This approach allows for the definition of complex, self-referencing, or mutually recursive data structures (like trees or linked lists) while maintaining a flat and manageable schema definition.
The core mechanism involves two primary schema types: DefinitionsSchema, which acts as a scope for shared definitions, and DefinitionReferenceSchema, which points to those definitions.
The Container-Reference Pattern
To create a recursive structure, you must wrap your primary schema in a DefinitionsSchema. This container holds:
- The Entry Point: The main schema that SchemaValidator or SchemaSerializer will use.
- The Definitions: A list of schemas, each assigned a unique ref string.
Within these definitions (or the entry point), you use DefinitionReferenceSchema to refer back to any schema in the list by its ref.
Core Schema Types
The implementation relies on the following structures in pydantic_core.core_schema:
DefinitionsSchema: A TypedDict that defines the scope.
class DefinitionsSchema(TypedDict, total=False):
    type: Required[Literal['definitions']]
    schema: Required[CoreSchema]
    definitions: Required[list[CoreSchema]]
    # ... other fields
DefinitionReferenceSchema: A TypedDict that acts as a pointer.
class DefinitionReferenceSchema(TypedDict, total=False):
    type: Required[Literal['definition-ref']]
    schema_ref: Required[str]
    # ... other fields
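Because both types are TypedDicts, the helper functions in core_schema return plain dictionaries. The following is a small sketch of what those dictionaries look like; the ref name 'Leaf' is a made-up identifier used only for illustration.

```python
from pydantic_core import core_schema

# A reference is a dict that carries the ref string of its target.
ref = core_schema.definition_reference_schema('Leaf')

# The container pairs an entry-point schema with the list of definitions.
# 'Leaf' is a hypothetical ref chosen for this example.
container = core_schema.definitions_schema(
    ref,
    [core_schema.int_schema(ref='Leaf')],
)

print(ref)
print(container['type'], [d['ref'] for d in container['definitions']])
```

Seeing the raw dictionaries can help when debugging a generated schema, since the ref on a definition and the schema_ref on a reference must match exactly.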
Implementing Self-Referencing Structures
A common use case is a tree-like structure where a node contains another node (or a list of nodes) of the same type. In pydantic-core/tests/validators/test_definitions_recursive.py, this is implemented by defining a typed-dict schema with a ref and then referencing that ref within its own fields.
from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(
    core_schema.definitions_schema(
        # The entry point: start by referencing 'Branch'
        core_schema.definition_reference_schema('Branch'),
        [
            # The definition of 'Branch'
            core_schema.typed_dict_schema(
                {
                    'width': core_schema.typed_dict_field(core_schema.int_schema()),
                    'sub_branch': core_schema.typed_dict_field(
                        core_schema.with_default_schema(
                            core_schema.union_schema(
                                [
                                    core_schema.none_schema(),
                                    # Recursive reference back to 'Branch'
                                    core_schema.definition_reference_schema('Branch'),
                                ]
                            ),
                            default=None,
                        )
                    ),
                },
                ref='Branch',  # Assigning the ref identifier
            )
        ],
    )
)

# Validation handles the nested structure
result = v.validate_python({'width': 123, 'sub_branch': {'width': 321}})
assert result == {'width': 123, 'sub_branch': {'width': 321, 'sub_branch': None}}
Mutually Recursive Definitions
The definitions list in DefinitionsSchema can contain multiple schemas that reference each other. This is useful for intertwined data models, such as a Foo that contains a Bar, and a Bar that contains a Foo.
As seen in pydantic-core/tests/validators/test_definitions_recursive.py:
core_schema.definitions_schema(
    core_schema.definition_reference_schema('Foo'),
    [
        core_schema.typed_dict_schema(
            {
                'height': core_schema.typed_dict_field(core_schema.int_schema()),
                'bar': core_schema.typed_dict_field(core_schema.definition_reference_schema('Bar')),
            },
            ref='Foo',
        ),
        core_schema.typed_dict_schema(
            {
                'width': core_schema.typed_dict_field(core_schema.int_schema()),
                'foo': core_schema.typed_dict_field(
                    core_schema.nullable_schema(core_schema.definition_reference_schema('Foo'))
                ),
            },
            ref='Bar',
        ),
    ],
)
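To see the mutually recursive schema in use, it can be passed to a SchemaValidator. The sketch below rebuilds the same Foo/Bar schema and validates a payload that alternates between the two definitions; the sample payloads are illustrative and not taken from the test file.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

v = SchemaValidator(
    core_schema.definitions_schema(
        core_schema.definition_reference_schema('Foo'),
        [
            core_schema.typed_dict_schema(
                {
                    'height': core_schema.typed_dict_field(core_schema.int_schema()),
                    'bar': core_schema.typed_dict_field(core_schema.definition_reference_schema('Bar')),
                },
                ref='Foo',
            ),
            core_schema.typed_dict_schema(
                {
                    'width': core_schema.typed_dict_field(core_schema.int_schema()),
                    'foo': core_schema.typed_dict_field(
                        core_schema.nullable_schema(core_schema.definition_reference_schema('Foo'))
                    ),
                },
                ref='Bar',
            ),
        ],
    )
)

# Foo -> Bar -> Foo -> Bar, terminated by foo=None.
data = {'height': 1, 'bar': {'width': 2, 'foo': {'height': 3, 'bar': {'width': 4, 'foo': None}}}}
result = v.validate_python(data)

# 'foo' is nullable but still a required key, so omitting it is an error.
error_types = []
try:
    v.validate_python({'height': 1, 'bar': {'width': 2}})
except ValidationError as exc:
    error_types = [e['type'] for e in exc.errors()]
print(error_types)
```

Note that nullable_schema only permits None as a value; it does not make the key optional. The recursion therefore has to be terminated explicitly with 'foo': None unless the field is given a default.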
Recursive Serialization
The SchemaSerializer also respects these definitions. You can even apply specific serialization logic to a reference. For example, you might want to serialize a nested recursive object as a string instead of a dictionary.
In pydantic-core/tests/serializers/test_definitions_recursive.py, custom serialization is applied to a recursive reference:
from pydantic_core import SchemaSerializer, core_schema

s = SchemaSerializer(
    core_schema.definitions_schema(
        core_schema.definition_reference_schema('Branch'),
        [
            core_schema.typed_dict_schema(
                {
                    'name': core_schema.typed_dict_field(core_schema.str_schema()),
                    'sub_branch': core_schema.typed_dict_field(
                        core_schema.nullable_schema(
                            core_schema.definition_reference_schema(
                                'Branch',
                                # Custom serialization for the recursive step
                                serialization=core_schema.to_string_ser_schema(when_used='always'),
                            )
                        )
                    ),
                },
                ref='Branch',
            )
        ],
    )
)

data = {'name': 'root', 'sub_branch': {'name': 'branch', 'sub_branch': None}}

# The sub_branch is serialized to a string because of the custom serialization on the ref
assert s.to_python(data) == {
    'name': 'root',
    'sub_branch': "{'name': 'branch', 'sub_branch': None}",
}
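For comparison, here is a sketch of the same Branch schema without any custom serialization on the reference. Under that assumption the nested object should stay a dictionary in both Python and JSON output; the variable names (plain, as_python, as_json) are illustrative.

```python
import json

from pydantic_core import SchemaSerializer, core_schema

# Same recursive schema as above, but the reference carries no
# `serialization` override.
plain = SchemaSerializer(
    core_schema.definitions_schema(
        core_schema.definition_reference_schema('Branch'),
        [
            core_schema.typed_dict_schema(
                {
                    'name': core_schema.typed_dict_field(core_schema.str_schema()),
                    'sub_branch': core_schema.typed_dict_field(
                        core_schema.nullable_schema(core_schema.definition_reference_schema('Branch'))
                    ),
                },
                ref='Branch',
            )
        ],
    )
)

data = {'name': 'root', 'sub_branch': {'name': 'branch', 'sub_branch': None}}

as_python = plain.to_python(data)
# to_json returns bytes; decode it to compare structures, not formatting.
as_json = json.loads(plain.to_json(data))
print(as_python)
print(as_json)
```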
Safety and Constraints
Cyclic Data and recursion_loop
While schemas can be recursive, the input data itself must not contain cycles (e.g., a list that contains itself). If the validator detects that it is processing the same object instance again within a single branch of the recursion, it raises a ValidationError with the type recursion_loop. The serializer performs a similar check, but because it does not produce validation errors, it reports the cycle as a ValueError (circular reference detected) instead.
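A minimal sketch of the validation case, using a self-referencing list schema. The ref name 'the-list' is an arbitrary identifier, and the sketch inspects only the error type, not the full error payload.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

# A list whose items are lists of the same kind.
v = SchemaValidator(
    core_schema.definitions_schema(
        core_schema.definition_reference_schema('the-list'),
        [core_schema.list_schema(core_schema.definition_reference_schema('the-list'), ref='the-list')],
    )
)

# Finite nesting validates normally.
nested_ok = v.validate_python([[], [[]]])

# A list that contains itself is cyclic data, not just a recursive schema.
cyclic = []
cyclic.append(cyclic)

error_types = []
try:
    v.validate_python(cyclic)
except ValidationError as exc:
    error_types = [e['type'] for e in exc.errors()]
print(error_types)
```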
Missing References and SchemaError
The codebase enforces strict resolution of references. If a DefinitionReferenceSchema points to a schema_ref that is not defined within the current DefinitionsSchema scope, a SchemaError is raised during the initialization of SchemaValidator or SchemaSerializer.
Example of a failing schema from pydantic-core/tests/validators/test_definitions_recursive.py:
# This will raise SchemaError: definition `Branch` was never filled
SchemaValidator(
    schema=core_schema.list_schema(
        core_schema.definition_reference_schema('Branch')
    )
)
To fix this, the list_schema must be wrapped in a DefinitionsSchema that provides the Branch definition.
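One way the corrected schema might look is sketched below. The shape of the Branch definition (a typed dict with a single width field) is an assumption made for illustration; the failing example does not say what Branch should contain.

```python
from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(
    core_schema.definitions_schema(
        # Entry point: the original list schema, unchanged.
        core_schema.list_schema(core_schema.definition_reference_schema('Branch')),
        [
            # Hypothetical definition that satisfies the 'Branch' reference.
            core_schema.typed_dict_schema(
                {'width': core_schema.typed_dict_field(core_schema.int_schema())},
                ref='Branch',
            )
        ],
    )
)

result = v.validate_python([{'width': 1}, {'width': 2}])
print(result)
```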