Recursive and Polymorphic Serialization
This codebase handles complex data structures through recursive and polymorphic serialization. Both patterns are built from schemas in pydantic_core.core_schema that support self-referencing definitions and discriminated unions.
Recursive Definitions
Recursive structures, such as trees or linked lists, are managed using a combination of DefinitionsSchema and DefinitionReferenceSchema.
The Definitions Container
The DefinitionsSchema acts as a container that holds a primary schema and a list of shared definitions. Each definition in the list is assigned a unique ref string, which can then be referenced elsewhere in the schema.
from pydantic_core import core_schema

# A recursive 'Branch' structure
schema = core_schema.definitions_schema(
    # The entry point of the schema
    core_schema.definition_reference_schema('Branch'),
    [
        # The actual definition of 'Branch'
        core_schema.typed_dict_schema(
            {
                'name': core_schema.typed_dict_field(core_schema.str_schema()),
                'sub_branch': core_schema.typed_dict_field(
                    core_schema.nullable_schema(
                        core_schema.definition_reference_schema('Branch')
                    )
                ),
            },
            ref='Branch',
        )
    ],
)
In this example, found in pydantic-core/tests/validators/test_definitions_recursive.py, the inner definition_reference_schema points back to the Branch definition, so a Branch can contain another Branch to arbitrary depth. The nullable_schema wrapper lets a None value end the chain.
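As a rough illustration of how such a schema might be used, the sketch below rebuilds the same Branch schema and passes a two-level structure through SchemaValidator and SchemaSerializer. The variable names and the sample data are assumptions made for this example and are not taken from the test file.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, core_schema

# Same recursive 'Branch' schema as above, repeated so the sketch is self-contained.
branch_schema = core_schema.definitions_schema(
    core_schema.definition_reference_schema('Branch'),
    [
        core_schema.typed_dict_schema(
            {
                'name': core_schema.typed_dict_field(core_schema.str_schema()),
                'sub_branch': core_schema.typed_dict_field(
                    core_schema.nullable_schema(
                        core_schema.definition_reference_schema('Branch')
                    )
                ),
            },
            ref='Branch',
        )
    ],
)

validator = SchemaValidator(branch_schema)
serializer = SchemaSerializer(branch_schema)

# Hypothetical sample data: a root branch holding one leaf.
nested = {'name': 'root', 'sub_branch': {'name': 'leaf', 'sub_branch': None}}

validated = validator.validate_python(nested)
dumped = serializer.to_python(validated)  # plain Python objects
as_json = serializer.to_json(validated)   # JSON as bytes
```

The same validator and serializer handle deeper chains without any change to the schema, because each level resolves the 'Branch' reference again.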
Reference Resolution
The DefinitionReferenceSchema uses the schema_ref field to look up the corresponding definition by its ref. This mechanism ensures that the serialization logic can resolve the correct schema even when the structure is self-referential.
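The link between a reference and its definition can be seen in the plain dictionaries that the core_schema helpers return. The following is a small sketch; the one-field definition is a simplified stand-in chosen for this example.

```python
from pydantic_core import core_schema

# A simplified definition registered under the ref 'Branch'.
definition = core_schema.typed_dict_schema(
    {'name': core_schema.typed_dict_field(core_schema.str_schema())},
    ref='Branch',
)

# A reference stores only the name of its target in `schema_ref`.
reference = core_schema.definition_reference_schema('Branch')

# The lookup succeeds when `schema_ref` equals the `ref` of a shared definition.
names_match = reference['schema_ref'] == definition['ref']
```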
Polymorphic Serialization
Polymorphism allows a single field to hold different types of data. This codebase implements this primarily through TaggedUnionSchema and UnionSchema.
Discriminated Unions (Tagged Unions)
TaggedUnionSchema is the preferred method for high-performance polymorphic serialization. It uses a "discriminator" field to determine which schema choice to apply.
The discriminator can be:
- A simple string (field name).
- A list of strings/ints (a path to a nested field), or a list of such paths.
- A callable function that returns the discriminator value.
# Example of a tagged union using a field name as a discriminator.
# apple_schema and banana_schema are core schemas defined elsewhere.
schema = core_schema.tagged_union_schema(
    choices={
        'apple': apple_schema,
        'banana': banana_schema,
    },
    discriminator='type',
)
When serializing, the system looks at the value of the type field to decide whether to use the apple_schema or banana_schema.
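A fuller version of this pattern might look like the sketch below. The field layouts of apple_schema and banana_schema (crunchy, length_cm) are hypothetical, invented here only so the example runs on its own.

```python
from pydantic_core import (
    SchemaSerializer,
    SchemaValidator,
    ValidationError,
    core_schema,
)

# Hypothetical choice schemas; each carries a literal 'type' tag.
apple_schema = core_schema.typed_dict_schema(
    {
        'type': core_schema.typed_dict_field(core_schema.literal_schema(['apple'])),
        'crunchy': core_schema.typed_dict_field(core_schema.bool_schema()),
    }
)
banana_schema = core_schema.typed_dict_schema(
    {
        'type': core_schema.typed_dict_field(core_schema.literal_schema(['banana'])),
        'length_cm': core_schema.typed_dict_field(core_schema.int_schema()),
    }
)

fruit_schema = core_schema.tagged_union_schema(
    choices={'apple': apple_schema, 'banana': banana_schema},
    discriminator='type',
)
validator = SchemaValidator(fruit_schema)
serializer = SchemaSerializer(fruit_schema)

# The 'type' value selects banana_schema directly; apple_schema is never tried.
banana = validator.validate_python({'type': 'banana', 'length_cm': 18})
dumped = serializer.to_python(banana)

# A tag that matches no choice is rejected up front.
error_types = []
try:
    validator.validate_python({'type': 'cherry'})
except ValidationError as exc:
    error_types = [error['type'] for error in exc.errors()]
```

Because the tag picks a single choice, a failure reports errors for that choice only, or a tag error when no choice matches.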
Non-Discriminated Unions
For cases where no clear discriminator exists, UnionSchema is used. It supports two modes:
- smart: Attempts to find the best match by checking multiple choices.
- left_to_right: Checks choices in the order they are defined and uses the first one that validates.
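The difference between the two modes shows up when an input could satisfy more than one choice. A minimal sketch, assuming a union of int and str and a numeric string as input:

```python
from pydantic_core import SchemaValidator, core_schema

choices = [core_schema.int_schema(), core_schema.str_schema()]

smart = SchemaValidator(core_schema.union_schema(choices, mode='smart'))
ordered = SchemaValidator(core_schema.union_schema(choices, mode='left_to_right'))

# smart prefers the choice that matches the input exactly, so the string stays a string.
smart_result = smart.validate_python('123')

# left_to_right stops at the first choice that validates; int accepts '123' by coercion.
ordered_result = ordered.validate_python('123')
```

With left_to_right, the order of the choices is part of the contract, so more specific choices generally belong first.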
Model Integration and Configuration
Advanced serialization often involves ModelSchema, which wraps Python classes. To support polymorphism at the model level, the codebase uses the polymorphic_serialization configuration.
Polymorphic Serialization Toggle
The polymorphic_serialization setting (found in CoreConfig and runtime arguments) determines whether models and dataclasses should be serialized based on their actual runtime type rather than their defined schema type.
- Default (False): Serialization follows the static schema definition.
- Enabled (True): Serialization dynamically adapts to the instance's actual class, which is essential for serializing subclasses correctly in a union.
This can be set globally in the model configuration or passed during the call to serialization methods like model_dump.
Handling Recursion and Cycles
To prevent infinite loops and stack overflows during serialization, the codebase implements strict safety checks.
Circular Reference Detection
If a serialization process encounters the same object ID multiple times in a way that suggests an infinite loop, it raises a ValueError. Common error messages include:
- ValueError: Circular reference detected (id repeated): Triggered when an object is seen again in the same branch of the serialization tree.
- ValueError: Circular reference detected (depth exceeded): Triggered when the nesting level exceeds the internal safety limit.
These protections are implemented in the underlying Rust core (e.g., pydantic-core/src/serializers/extra.rs) and are exposed to Python as standard ValueError exceptions.
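The check can be observed by serializing a dictionary that contains itself. The sketch below reuses the recursive Branch schema from earlier in this document; the cyclic sample value is an assumption made for illustration.

```python
from pydantic_core import SchemaSerializer, core_schema

branch_schema = core_schema.definitions_schema(
    core_schema.definition_reference_schema('Branch'),
    [
        core_schema.typed_dict_schema(
            {
                'name': core_schema.typed_dict_field(core_schema.str_schema()),
                'sub_branch': core_schema.typed_dict_field(
                    core_schema.nullable_schema(
                        core_schema.definition_reference_schema('Branch')
                    )
                ),
            },
            ref='Branch',
        )
    ],
)
serializer = SchemaSerializer(branch_schema)

# Build a value whose 'sub_branch' is the value itself.
cyclic = {'name': 'root'}
cyclic['sub_branch'] = cyclic

message = None
try:
    serializer.to_python(cyclic)
except ValueError as exc:
    # The serializer stops and reports the cycle as a ValueError.
    message = str(exc)
```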
Recursion Loops in Validation
During validation (before serialization), cyclic input gets similar treatment. If the same input object is reached again while a definition reference is being resolved, validation reports a recursion_loop error instead of recursing until the process crashes. Self-referential inputs therefore fail with an ordinary validation error.
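A self-referential input makes this behaviour visible. The following sketch again assumes the recursive Branch schema used earlier in this document and a hand-built cyclic dictionary.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

branch_schema = core_schema.definitions_schema(
    core_schema.definition_reference_schema('Branch'),
    [
        core_schema.typed_dict_schema(
            {
                'name': core_schema.typed_dict_field(core_schema.str_schema()),
                'sub_branch': core_schema.typed_dict_field(
                    core_schema.nullable_schema(
                        core_schema.definition_reference_schema('Branch')
                    )
                ),
            },
            ref='Branch',
        )
    ],
)
validator = SchemaValidator(branch_schema)

# Input whose 'sub_branch' points back at the input itself.
cyclic = {'name': 'root', 'sub_branch': None}
cyclic['sub_branch'] = cyclic

error_types = []
try:
    validator.validate_python(cyclic)
except ValidationError as exc:
    # Collect the machine-readable error types from the validation error.
    error_types = [error['type'] for error in exc.errors()]
```

Checking the error type rather than the message text keeps calling code independent of the exact wording of the error.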