Serialization Fundamentals
Serialization in pydantic-core is a schema-driven process where every CoreSchema can optionally define how its data should be converted into a serializable format. This logic is encapsulated in the serialization field of a schema, which accepts a SerSchema definition.
The Serialization Process
When a SchemaSerializer is initialized with a CoreSchema, it builds a tree of serializers. During serialization (via to_python or to_json), the engine traverses this tree. If a schema contains a serialization key, the engine uses that specific logic instead of the default behavior for that type.
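As a minimal sketch of this override behavior, the snippet below builds two serializers for the same int schema, one without and one with a serialization key. The variable names are illustrative only, and the to-string override is chosen here simply because its effect is easy to see.

```python
from pydantic_core import SchemaSerializer, core_schema

# No 'serialization' key: the default int serializer is used.
default_ser = SchemaSerializer(core_schema.int_schema())

# With a 'serialization' key: the to-string logic replaces the default
# (with its default when_used, only in JSON mode and for non-None values).
override_ser = SchemaSerializer(
    core_schema.int_schema(serialization=core_schema.to_string_ser_schema())
)

default_json = default_ser.to_json(42)        # plain JSON number
override_json = override_ser.to_json(42)      # JSON string
override_python = override_ser.to_python(42)  # unchanged in python mode
```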
AnySchema as a Flexible Base
The AnySchema (created via core_schema.any_schema) is the most versatile schema for serialization. It matches any Python value and is frequently used to attach custom serialization logic to types that don't fit into standard categories.
from pydantic_core import core_schema, SchemaSerializer
from datetime import date
# Using any_schema to force a specific serialization type
s = SchemaSerializer(core_schema.any_schema(serialization={'type': 'date'}))
# In 'python' mode, it returns the object as-is
assert s.to_python(date(2022, 12, 3)) == date(2022, 12, 3)
# In 'json' mode, it uses the 'date' serializer logic
assert s.to_python(date(2022, 12, 3), mode='json') == '2022-12-03'
If the value being serialized does not match the expected type defined in the serialization field, pydantic-core issues a UserWarning and falls back to a default representation rather than failing.
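The fallback can be observed by capturing warnings around the call. This is a sketch that assumes the same 'date' serializer as above and passes it a bytes value; the exact warning text varies between pydantic-core versions, so only the warning category is checked.

```python
import warnings

from pydantic_core import SchemaSerializer, core_schema

s = SchemaSerializer(core_schema.any_schema(serialization={'type': 'date'}))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # make sure the warning is not filtered out
    # bytes is not a date, so the serializer warns and falls back
    fallback = s.to_python(b'bang', mode='json')

got_user_warning = any(issubclass(w.category, UserWarning) for w in caught)
```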
Contextual Serialization with SerializationInfo
For complex serialization logic, pydantic-core provides the SerializationInfo protocol. This object is passed to custom serializer functions (like those defined in PlainSerializerFunctionSerSchema) when the info_arg parameter is set to True.
SerializationInfo provides access to the runtime state of the serialization process:
- mode: Indicates whether the output is intended for 'python' or 'json'.
- context: A dictionary of user-provided data passed to the serializer.
- include / exclude: Sets of fields to include or exclude during this specific run.
- Flags: Boolean flags like by_alias, exclude_none, and exclude_unset that control how fields and values are handled.
Example of using SerializationInfo in a custom serializer (from pydantic-core/tests/serializers/test_functions.py):
def custom_serializer(value, info: core_schema.SerializationInfo):
    return f'{value} mode={info.mode} context={info.context}'

s = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.plain_serializer_function_ser_schema(
            custom_serializer,
            info_arg=True,
            return_schema=core_schema.str_schema()
        )
    )
)
# The info object reflects the arguments passed to to_python
result = s.to_python("data", context={'user': 'admin'})
assert "mode=python" in result
assert "context={'user': 'admin'}" in result
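Beyond formatting a string, a common use of info.mode is to branch on the output target. The sketch below is illustrative rather than taken from the test suite: the function name ser_tags and the set-of-tags input are hypothetical, and no return_schema is given, so the returned value is serialized by inference.

```python
from pydantic_core import SchemaSerializer, core_schema

def ser_tags(value, info: core_schema.SerializationInfo):
    # JSON has no set type, so emit a sorted list for deterministic output
    if info.mode == 'json':
        return sorted(value)
    # In python mode, hand the value back unchanged
    return value

tag_serializer = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.plain_serializer_function_ser_schema(
            ser_tags,
            info_arg=True,
        )
    )
)

python_out = tag_serializer.to_python({'b', 'a'})
json_mode_out = tag_serializer.to_python({'b', 'a'}, mode='json')
json_bytes = tag_serializer.to_json({'b', 'a'})
```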
Basic Serialization Types
SimpleSerSchema
The SimpleSerSchema is used for straightforward type overrides. It is typically generated using the simple_ser_schema helper and names a target type drawn from ExpectedSerializationTypes (such as 'int', 'str', 'dict', or 'datetime').
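A short sketch of the helper in use, assuming a 'datetime' target attached to an any_schema. The variable names are illustrative.

```python
from datetime import datetime

from pydantic_core import SchemaSerializer, core_schema

# The helper just builds the small dict that the 'serialization' key expects
ser_schema = core_schema.simple_ser_schema('datetime')

s = SchemaSerializer(core_schema.any_schema(serialization=ser_schema))

moment = datetime(2022, 12, 3, 12, 30)
as_python = s.to_python(moment)             # datetime object, unchanged
as_json = s.to_python(moment, mode='json')  # ISO 8601 string
```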
ToStringSerSchema
The ToStringSerSchema instructs the serializer to use the object's __str__ method. This is particularly useful for complex objects that have a standard string representation, such as URLs or Colors.
In pydantic/networks.py, this is used to ensure Url objects are serialized as strings:
# From pydantic/networks.py
__pydantic_serializer__ = SchemaSerializer(
    core_schema.any_schema(serialization=core_schema.to_string_ser_schema())
)
The to_string_ser_schema helper accepts a when_used argument (of type WhenUsed), which defaults to 'json-unless-none'. This means the string conversion only happens when serializing to JSON and only if the value is not None.
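The effect of when_used can be sketched by comparing the default against 'always'. The integer input is only a stand-in for any object with a __str__ method.

```python
from pydantic_core import SchemaSerializer, core_schema

# Default: when_used='json-unless-none'
default_s = SchemaSerializer(
    core_schema.any_schema(serialization=core_schema.to_string_ser_schema())
)

# 'always': convert to a string in python mode as well
always_s = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.to_string_ser_schema(when_used='always')
    )
)

default_python = default_s.to_python(123)                  # left as an int
default_json = default_s.to_python(123, mode='json')       # converted to str
default_none_json = default_s.to_python(None, mode='json') # None is skipped
always_python = always_s.to_python(123)                    # converted to str
```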
Configuration and Behavior
The fundamental serialization behavior can be influenced by configuration options (a CoreConfig) supplied when the schema or its serializer is created:
- ser_json_temporal: Controls how temporal types (datetime, date, time, timedelta) are serialized to JSON (defaulting to 'iso8601').
- ser_json_bytes: Controls how bytes are serialized to JSON (defaulting to 'utf8').
These settings interact with the SimpleSerSchema and AnySchema to ensure consistent output across the entire schema tree. For instance, if an AnySchema encounters a bytes object and no custom serializer is provided, it will respect the ser_json_bytes configuration.
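As a hedged illustration of the configuration effect, the sketch below compares the default bytes output with ser_json_bytes='base64'. It passes the configuration as the second argument to SchemaSerializer. Only ser_json_bytes is shown, since the set of available temporal options depends on the installed pydantic-core version.

```python
from pydantic_core import SchemaSerializer, core_schema

# Default configuration: bytes are decoded as UTF-8 for JSON output
default_s = SchemaSerializer(core_schema.bytes_schema())

# Same schema, but with bytes emitted as base64 in JSON
base64_s = SchemaSerializer(
    core_schema.bytes_schema(),
    core_schema.CoreConfig(ser_json_bytes='base64'),
)

utf8_json = default_s.to_json(b'foobar')
base64_json = base64_s.to_json(b'foobar')
# The setting only affects JSON output; python mode keeps the bytes object
base64_python = base64_s.to_python(b'foobar')
```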