
Overview of the Serialization System

The serialization system in this codebase is schema-driven: the same CoreSchema used to validate data also defines how that data is transformed back into Python objects or JSON (to_json() returns JSON-encoded bytes rather than a str). This transformation is handled by the SchemaSerializer, which traverses the data according to the schema and applies any custom serialization logic defined within it.
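As a minimal sketch of this flow, the snippet below builds a SchemaSerializer from a built-in schema and serializes the same value in each output mode. The choice of datetime_schema() and the variable names are illustrative assumptions, not taken from the codebase being described.

```python
from datetime import datetime

from pydantic_core import SchemaSerializer, core_schema

# Illustrative schema choice: any built-in core schema would work here.
schema = core_schema.datetime_schema()
serializer = SchemaSerializer(schema)

moment = datetime(2022, 12, 2, 12, 13, 14)

# Python mode keeps the rich Python object.
as_python = serializer.to_python(moment)

# JSON mode of to_python() yields JSON-compatible Python values.
as_json_mode = serializer.to_python(moment, mode='json')

# to_json() yields encoded JSON bytes.
as_json_bytes = serializer.to_json(moment)
```

The same schema drives all three calls; only the requested output mode changes.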

The Role of CoreSchema in Serialization

Most schema types in pydantic_core.core_schema include a serialization field. This field accepts a SerSchema, which allows developers to override the default behavior for a specific part of the data tree.

For example, the AnySchema is a catch-all validator that can be paired with a specific serialization strategy:

from pydantic_core import core_schema

# A schema that accepts anything but serializes it using a specific logic
schema = core_schema.any_schema(
    serialization=core_schema.simple_ser_schema('str')
)

In this case, while the validator accepts any input, the serializer will attempt to treat the value as a string during the serialization phase.

Simple Serialization Overrides

SimpleSerSchema is a minimal configuration that switches the serialization logic to that of a different built-in type. It is usually created with the simple_ser_schema() helper function.

class SimpleSerSchema(TypedDict, total=False):
    type: Required[ExpectedSerializationTypes]

This is particularly useful when a complex object should be represented as a simple type (like an integer or string) in the final output without requiring a full custom function.
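Because SimpleSerSchema is a TypedDict, the helper produces a plain dictionary at runtime. The sketch below shows what that looks like once attached to a schema; the variable names are illustrative.

```python
from pydantic_core import core_schema

# simple_ser_schema() builds a small dict with only a 'type' key.
ser = core_schema.simple_ser_schema('int')

# Attaching it to a schema stores it under the 'serialization' key.
schema = core_schema.any_schema(serialization=ser)
```

Inspecting these dictionaries can be a convenient way to check which serialization override a generated schema actually carries.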

Functional Serializers and Runtime Metadata

For more complex scenarios, the system supports functional serializers: Plain Serializers and Wrap Serializers. These functions often require access to the current state of the serialization process, which is provided via the SerializationInfo object.

SerializationInfo

SerializationInfo is a protocol that provides metadata about the ongoing serialization call. It is passed to custom serializer functions when info_arg=True is configured in the SerSchema.

Key properties of SerializationInfo include:

  • mode: Indicates if the output is 'python' (e.g., for to_python()) or 'json' (e.g., for to_json()).
  • include / exclude: The sets of fields to include or exclude, typically passed from the top-level call.
  • context: A user-supplied value (typically a dictionary) for passing arbitrary data through the serialization process; it is None when no context is given.
  • by_alias: Whether to use field aliases instead of internal names.
  • exclude_none: Whether to omit None values from the output.
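The properties listed above can be read inside any serializer function that opts in with info_arg=True. The following is a minimal sketch, assuming a pydantic-core version whose to_python() accepts a context keyword argument; the function name tag_with_mode and the 'prefix' context key are hypothetical.

```python
from pydantic_core import SchemaSerializer, core_schema


def tag_with_mode(value, info):
    # 'prefix' is a hypothetical key; context is None when not supplied.
    prefix = (info.context or {}).get('prefix', '')
    # info.mode is 'python' or 'json' depending on the top-level call.
    return f'{prefix}{value}:{info.mode}'


serializer = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.plain_serializer_function_ser_schema(
            tag_with_mode, info_arg=True
        )
    )
)

python_out = serializer.to_python('x')
json_out = serializer.to_json('x')
context_out = serializer.to_python('x', context={'prefix': '>'})
```

Reading context defensively, as above, keeps the function usable when callers do not pass one.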

Plain Serializers

A plain serializer completely replaces the default serialization logic for a type. In pydantic-core/tests/serializers/test_functions.py, a plain serializer is used to append metadata to a string:

from pydantic_core import SchemaSerializer, core_schema

def append_args(value, info: core_schema.SerializationInfo):
    # The returned string replaces whatever the default serializer would emit
    return f'{value} info={info}'

s = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.plain_serializer_function_ser_schema(
            append_args, info_arg=True, return_schema=core_schema.str_schema()
        )
    )
)
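To make the "replaces the default logic" point concrete, here is a smaller sketch that does not use info at all: an integer is always emitted as a hexadecimal string. The function name to_hex is a hypothetical example, not part of the test suite mentioned above.

```python
from pydantic_core import SchemaSerializer, core_schema


def to_hex(value):
    # With info_arg left unset, the function receives only the value.
    return hex(value)


hex_serializer = SchemaSerializer(
    core_schema.int_schema(
        serialization=core_schema.plain_serializer_function_ser_schema(
            to_hex, return_schema=core_schema.str_schema()
        )
    )
)

hex_python = hex_serializer.to_python(255)
hex_json = hex_serializer.to_json(255)
```

The default integer serializer is never consulted here; the function's return value is serialized according to return_schema.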

Wrap Serializers

A wrap serializer allows you to run logic before or after the default serialization, or even skip the default logic entirely. It receives a serializer callable that can be used to invoke the standard serialization for that schema.

This is useful for handling types like deque, where you might want to convert the collection to a list for JSON but keep it as a deque for Python serialization:

from collections import deque

from pydantic_core import SchemaSerializer, core_schema

def serialize_deque(value, serializer, info: core_schema.SerializationInfo):
    # Use the provided 'serializer' to process each item; the index is passed as the key
    items = [serializer(item, index) for index, item in enumerate(value)]
    # JSON mode needs a list; Python mode keeps the deque type
    return items if info.mode_is_json() else deque(items)

s = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.wrap_serializer_function_ser_schema(
            serialize_deque, info_arg=True, schema=core_schema.any_schema()
        )
    )
)
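The "run logic after the default serialization" case can also be sketched in a few lines. In this hypothetical example the handler performs the standard integer serialization and the wrapper post-processes the result; the name double_after is illustrative.

```python
from pydantic_core import SchemaSerializer, core_schema


def double_after(value, handler):
    # handler(value) invokes the standard serialization for the inner schema.
    default_result = handler(value)
    # Post-process the default result before returning it.
    return default_result * 2


wrap_serializer = SchemaSerializer(
    core_schema.int_schema(
        serialization=core_schema.wrap_serializer_function_ser_schema(
            double_after, schema=core_schema.int_schema()
        )
    )
)

wrapped_python = wrap_serializer.to_python(21)
wrapped_json = wrap_serializer.to_json(21)
```

Skipping the call to handler entirely would make the wrapper behave like a plain serializer.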

Global Serialization Configuration

The behavior of the serialization system can be tuned globally using CoreConfig. These settings affect how specific types are handled across the entire schema:

  • Temporal Types: ser_json_timedelta and ser_json_temporal (covering datetime, date, time) default to 'iso8601'.
  • Bytes: ser_json_bytes defaults to 'utf8'.
  • Special Floats: ser_json_inf_nan determines how infinity and NaN are handled in JSON (defaulting to 'null').

These configurations ensure consistency across the serialization of standard Python types into JSON-compatible formats.
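As a brief sketch of how such a setting changes output, the snippet below serializes the same bytes value with the default configuration and with ser_json_bytes set to 'base64'. Passing the config as the second argument to SchemaSerializer is how this example wires it up; the variable names are illustrative.

```python
from pydantic_core import SchemaSerializer, core_schema

payload = b'foobar'

# Default configuration: bytes are emitted as UTF-8 text in JSON.
default_serializer = SchemaSerializer(core_schema.bytes_schema())
utf8_json = default_serializer.to_json(payload)

# Overridden configuration: bytes are emitted as base64 text in JSON.
base64_serializer = SchemaSerializer(
    core_schema.bytes_schema(),
    core_schema.CoreConfig(ser_json_bytes='base64'),
)
base64_json = base64_serializer.to_json(payload)

# Python-mode output is unaffected by the JSON-specific setting.
python_value = base64_serializer.to_python(payload)
```

Only the JSON representation changes; the ser_json_* options do not alter Python-mode output.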