Collection and Model Serialization Logic
Serialization in pydantic-core is not merely a conversion to JSON; it is a structured transformation process defined within the CoreSchema. While validation ensures data integrity on input, serialization schemas provide fine-grained control over how Python objects are represented on output. This logic is primarily managed through specialized schemas for collections, models, and string transformations.
Collection Filtering
Filtering which elements of a collection are serialized is a common requirement, often handled via include and exclude parameters. In pydantic-core, this logic is encapsulated in IncExDictSerSchema and IncExSeqSerSchema.
Dictionary Filtering
The IncExDictSerSchema (created via filter_dict_schema) filters dictionary entries by key. Its include and exclude fields use the IncExDict type, a set whose members may be strings or integers.
from pydantic_core import SchemaSerializer, core_schema
# Using filter_dict_schema to include specific keys
s = SchemaSerializer(
    core_schema.dict_schema(
        serialization=core_schema.filter_dict_schema(include={'a', 'c'})
    )
)
assert s.to_python({'a': 1, 'b': 2, 'c': 3}) == {'a': 1, 'c': 3}
This design separates the validation of the dictionary (which might allow any keys) from the serialization policy. The IncExDict type alias in pydantic_core/core_schema.py is defined as:
IncExDict: TypeAlias = set[int | str]
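Exclusion works the same way as inclusion. The following is a minimal sketch, assuming the exclude parameter of filter_dict_schema mirrors include; the variable names and sample data are illustrative only.

```python
from pydantic_core import SchemaSerializer, core_schema

# Drop the listed keys at serialization time; all other entries pass through.
exclude_serializer = SchemaSerializer(
    core_schema.dict_schema(
        core_schema.str_schema(),
        core_schema.any_schema(),
        serialization=core_schema.filter_dict_schema(exclude={'password'}),
    )
)

user = {'name': 'ada', 'password': 'hunter2', 'role': 'admin'}
filtered = exclude_serializer.to_python(user)    # {'name': 'ada', 'role': 'admin'}
filtered_json = exclude_serializer.to_json(user)  # bytes without the 'password' entry
```

Because the filter lives in the schema, callers do not need to pass exclude on every serialization call.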
Sequence Filtering
For sequences like lists and tuples, filtering is based on integer indices using IncExSeqSerSchema (via filter_seq_schema).
# Using filter_seq_schema to exclude specific indices
s = SchemaSerializer(
    core_schema.list_schema(
        core_schema.any_schema(),
        serialization=core_schema.filter_seq_schema(exclude={1, 3, 5})
    )
)
assert s.to_python([0, 1, 2, 3, 4, 5, 6]) == [0, 2, 4, 6]
The distinction between dictionary and sequence filtering is enforced at the schema level: dictionaries use keys (which can be strings), while sequences strictly use integer indices.
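As a complement to the exclude example, here is a small sketch of index-based inclusion with filter_seq_schema. The variable names and sample list are illustrative.

```python
from pydantic_core import SchemaSerializer, core_schema

# Keep only the elements at positions 0 and 2; every other index is dropped.
include_serializer = SchemaSerializer(
    core_schema.list_schema(
        core_schema.any_schema(),
        serialization=core_schema.filter_seq_schema(include={0, 2}),
    )
)

letters = ['a', 'b', 'c', 'd']
kept = include_serializer.to_python(letters)    # ['a', 'c']
kept_json = include_serializer.to_json(letters)  # b'["a","c"]'
```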
Model Serialization Logic
When serializing complex objects like class instances or dataclasses, pydantic-core uses ModelSerSchema. This schema acts as a wrapper that connects a Python class to its internal field serialization logic.
The ModelSerSchema requires two primary components:
- cls: The expected Python class. This is used to verify the object type during serialization.
- schema: The internal CoreSchema (usually a ModelFieldsSchema) used to extract and serialize the object's attributes.
# Example of a model serialization schema for a dataclass.
# MyClass is assumed to be defined elsewhere; {...} stands for elided field definitions.
s = SchemaSerializer(
    core_schema.model_schema(
        MyClass,
        core_schema.model_fields_schema({...}),
        serialization=core_schema.model_ser_schema(
            MyClass,
            core_schema.model_fields_schema({
                'foo': core_schema.model_field(core_schema.int_schema()),
            }),
        ),
    )
)
A key design choice in ModelSerSchema is the inclusion of the cls reference. If the object being serialized is not an instance of cls, pydantic-core issues a UserWarning but still attempts to serialize the object rather than raising an error. This trades strictness for robustness: a type mismatch is reported to the developer without blocking serialization.
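The warn-but-continue policy can be illustrated in plain Python. This is a conceptual sketch only, not the Rust implementation: serialize_with_class_check, Expected, and Other are hypothetical names, and the warning text is made up for the example.

```python
import warnings


class Expected:
    def __init__(self, foo):
        self.foo = foo


class Other:
    def __init__(self, foo):
        self.foo = foo


def serialize_with_class_check(obj, cls):
    """Hypothetical helper: warn on a class mismatch, then serialize anyway."""
    if not isinstance(obj, cls):
        warnings.warn(
            f'Expected `{cls.__name__}` but got `{type(obj).__name__}`',
            UserWarning,
        )
    # Continue regardless of the check, mirroring the non-blocking policy.
    return dict(vars(obj))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    matched = serialize_with_class_check(Expected(1), Expected)   # no warning
    mismatched = serialize_with_class_check(Other(2), Expected)   # one UserWarning
```

Both calls return a dictionary; only the second records a warning.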
String and Format Transformations
For types that do not have a native representation in JSON (like UUID, datetime, or custom objects), pydantic-core provides FormatSerSchema and ToStringSerSchema. These allow values to be transformed into strings during the serialization process.
Custom Formatting
FormatSerSchema leverages Python's format() protocol. It is particularly useful for numeric types where specific precision is required in the output.
# Formatting a float to 4 decimal places in JSON
s = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.format_ser_schema('0.4f')
    )
)
assert s.to_json(42.12345) == b'"42.1234"'
String Conversion
ToStringSerSchema simply calls str() on the value. It suits types whose str() form is already the desired JSON representation, such as UUID or Url objects.
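A minimal sketch of string conversion, assuming to_string_ser_schema is attached to an any_schema and left at its default when_used setting. The UUID value is arbitrary.

```python
import uuid

from pydantic_core import SchemaSerializer, core_schema

to_str_serializer = SchemaSerializer(
    core_schema.any_schema(serialization=core_schema.to_string_ser_schema())
)

value = uuid.UUID('12345678-1234-5678-1234-567812345678')

# JSON output carries the str() form of the value as a JSON string.
as_json = to_str_serializer.to_json(value)

# Python-mode output is left untouched under the default when_used setting.
as_python = to_str_serializer.to_python(value)
```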
Conditional Usage
Both schemas utilize the when_used parameter (of type WhenUsed), which controls the circumstances under which the transformation is applied. The default for these schemas is 'json-unless-none', meaning:
- The transformation applies when serializing to JSON.
- The transformation is skipped if the value is None.
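Other possible values for when_used are 'always' (apply in both Python and JSON serialization), 'unless-none' (apply in both modes unless the value is None), and 'json' (apply in JSON serialization only, with no exemption for None).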
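The effect of when_used can be seen by comparing two serializers that differ only in that parameter. This is a sketch using format_ser_schema with an illustrative '0.2f' format string; the variable names are not part of the library.

```python
from pydantic_core import SchemaSerializer, core_schema

# Default when_used ('json-unless-none'): format only in JSON mode, never for None.
default_fmt = SchemaSerializer(
    core_schema.any_schema(serialization=core_schema.format_ser_schema('0.2f'))
)

# when_used='always': format in Python mode as well.
always_fmt = SchemaSerializer(
    core_schema.any_schema(
        serialization=core_schema.format_ser_schema('0.2f', when_used='always')
    )
)

python_default = default_fmt.to_python(3.14159)  # float is passed through unchanged
json_default = default_fmt.to_json(3.14159)      # formatted string in JSON
json_none = default_fmt.to_json(None)            # None stays null
python_always = always_fmt.to_python(3.14159)    # formatted string in Python mode too
```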
Design Tradeoffs and Constraints
The implementation of these serialization schemas reflects several core design principles of pydantic-core:
- Separation of Concerns: By defining serialization logic separately from validation logic (within the same CoreSchema), the library allows for asymmetric data handling. For example, a field might be validated as an integer but serialized as a formatted string.
- Performance via Rust Integration: While these schemas are defined as TypedDict classes in Python for type hinting and schema construction, they are consumed by the underlying Rust engine. Filtering collections and formatting strings at this level is significantly faster than performing the same operations in pure Python.
- Explicit Defaults: Many serialization schemas, such as FormatSerSchema, have defaults like 'json-unless-none'. This is a deliberate choice to ensure that None values remain null in JSON rather than being converted to the string "None", which is almost never the desired behavior in web APIs.
- Index-Based Sequence Filtering: Filtering sequences by index (set[int]) is highly efficient but requires the caller to know the exact positions. This is a lower-level approach compared to value-based filtering, prioritizing speed and predictability in the core engine.
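The asymmetric handling described under Separation of Concerns can be sketched with a single schema shared by a validator and a serializer. The '05d' format string and the input value are illustrative assumptions.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, core_schema

# One schema: validate as an integer, serialize to JSON as a zero-padded string.
schema = core_schema.int_schema(
    serialization=core_schema.format_ser_schema('05d')
)

validator = SchemaValidator(schema)
serializer = SchemaSerializer(schema)

validated = validator.validate_python('42')  # lax mode coerces the string to an int
serialized = serializer.to_json(validated)   # JSON string produced by format()
```

The validator ignores the serialization entry and the serializer ignores the validation constraints, so each side can change without affecting the other.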