Filtering Collections during Serialization
Filtering collections during serialization allows for precise control over which elements of a list, tuple, or dictionary are included in the final output. In this codebase, this is achieved through specialized serialization schemas that can be embedded within core collection schemas.
Core Filtering Schemas
The filtering logic is encapsulated in two primary TypedDict structures defined in pydantic-core/python/pydantic_core/core_schema.py:
- IncExSeqSerSchema: Used for sequence-like collections (lists, tuples, sets). It operates on integer indices.
- IncExDictSerSchema: Used for dictionary-like collections. It operates on keys, which can be strings or integers.
These schemas are typically constructed using helper functions: filter_seq_schema and filter_dict_schema.
Sequence Filtering Logic
For sequences, filtering is based on the position of the element. The IncExSeqSerSchema defines include and exclude sets of integers.
# From pydantic-core/python/pydantic_core/core_schema.py
class IncExSeqSerSchema(TypedDict, total=False):
    type: Required[Literal['include-exclude-sequence']]
    include: set[int]
    exclude: set[int]
When both include and exclude are provided at the schema level, the exclude set takes precedence. If an index is present in both sets, it will be excluded.
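The schema-level rule can be modelled in a few lines of plain Python. This is only a sketch of the semantics described above, not the serializer's actual implementation; `filter_by_index` is a hypothetical helper name introduced for illustration.

```python
from typing import Any, Optional


def filter_by_index(
    items: list[Any],
    include: Optional[set[int]] = None,
    exclude: Optional[set[int]] = None,
) -> list[Any]:
    """Model of schema-level sequence filtering (hypothetical helper)."""
    kept = []
    for index, item in enumerate(items):
        if include is not None and index not in include:
            continue  # an include set acts as an allow-list of indices
        if exclude is not None and index in exclude:
            continue  # exclude wins when an index appears in both sets
        kept.append(item)
    return kept


letters = list("abcdefgh")
result = filter_by_index(letters, include={1, 3, 5}, exclude={5, 6})
print(result)  # ['b', 'd']
```

Index 5 is named in both sets, so it is dropped; index 6 is excluded but was never in the allow-list anyway.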
Dictionary Filtering Logic
For dictionaries, filtering is based on keys. The IncExDictSerSchema uses the IncExDict type alias, which is a set[int | str].
# From pydantic-core/python/pydantic_core/core_schema.py
IncExDict: TypeAlias = set[int | str]


class IncExDictSerSchema(TypedDict, total=False):
    type: Required[Literal['include-exclude-dict']]
    include: IncExDict
    exclude: IncExDict
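Key-based filtering can be modelled the same way. The sketch below assumes the exclude-wins rule described for sequences also applies to dictionaries at the schema level. `filter_by_key` and the `Key` alias are hypothetical names, not part of pydantic-core.

```python
from typing import Any, Optional, Union

Key = Union[int, str]  # mirrors the int | str members of IncExDict


def filter_by_key(
    data: dict[Key, Any],
    include: Optional[set[Key]] = None,
    exclude: Optional[set[Key]] = None,
) -> dict[Key, Any]:
    """Model of schema-level dict filtering (hypothetical helper)."""
    return {
        key: value
        for key, value in data.items()
        if (include is None or key in include)
        and (exclude is None or key not in exclude)
    }


record = {"name": "widget", "secret": "s3cr3t", 1: "first", 2: "second"}
filtered = filter_by_key(record, include={"name", "secret", 1}, exclude={"secret"})
print(filtered)  # {'name': 'widget', 1: 'first'}
```

String and integer keys can be mixed in one filter set because both are valid members of `set[int | str]`.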
Integration with Collection Schemas
Filtering is not a standalone process but is integrated into the serialization field of collection schemas like ListSchema or DictSchema.
For example, a list schema with filtering is defined as follows:
from pydantic_core import core_schema, SchemaSerializer

v = SchemaSerializer(
    core_schema.list_schema(
        core_schema.any_schema(),
        serialization=core_schema.filter_seq_schema(include={1, 3, 5}, exclude={5, 6})
    )
)
# Only indices 1 and 3 are serialized; 5 is excluded despite being in 'include'
assert v.to_python([0, 1, 2, 3, 4, 5, 6, 7]) == [1, 3]
Similarly, for dictionaries:
s = SchemaSerializer(
    core_schema.dict_schema(
        serialization=core_schema.filter_dict_schema(include={'a', 'c'})
    )
)
assert s.to_python({'a': 1, 'b': 2, 'c': 3, 'd': 4}) == {'a': 1, 'c': 3}
Interaction Between Schema and Runtime Filters
One of the most critical design aspects is how schema-defined filters interact with runtime filters passed to to_python or to_json.
Union of Like Filters
If both the schema and the runtime call provide the same type of filter (e.g., both provide include), they are combined using a union.
# Example based on pydantic-core/tests/serializers/test_dict.py
s = SchemaSerializer(
    core_schema.dict_schema(
        serialization=core_schema.filter_dict_schema(include={'a', 'c'})
    )
)
# Schema include {'a', 'c'} UNION runtime include {'d'} = {'a', 'c', 'd'}
assert s.to_python({'a': 1, 'b': 2, 'c': 3, 'd': 4}, include={'d'}) == {'a': 1, 'c': 3, 'd': 4}
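The union rule itself is simple enough to sketch without the serializer. The helper below, `merge_like_filters`, is a hypothetical name used only for illustration. It treats `None` as "filter not provided", which is an assumption of this sketch.

```python
from typing import Optional


def merge_like_filters(
    schema_set: Optional[set], runtime_set: Optional[set]
) -> Optional[set]:
    """Union two filters of the same kind; None means 'not provided'."""
    if schema_set is None:
        return runtime_set
    if runtime_set is None:
        return schema_set
    return schema_set | runtime_set


data = {"a": 1, "b": 2, "c": 3, "d": 4}
include = merge_like_filters({"a", "c"}, {"d"})
result = {key: value for key, value in data.items() if key in include}
print(result)  # {'a': 1, 'c': 3, 'd': 4}
```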
Runtime Precedence
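A significant design choice in this implementation is that a runtime include argument overrides a schema-level exclude setting. This lets a caller force an element into the output at runtime even if the core schema marks it for exclusion. The runtime include still acts as an allow-list: in the example below, index 3 is not excluded by the schema, yet it is dropped because no include set names it.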
# Example from pydantic-core/tests/serializers/test_list_tuple.py
v = SchemaSerializer(
    core_schema.list_schema(
        core_schema.any_schema(),
        serialization=core_schema.filter_seq_schema(exclude={0, 1})
    )
)
# The schema excludes indices 0 and 1, but runtime `include` forces index 1 back in;
# indices 0 and 3 are dropped because the runtime `include` does not name them
assert v.to_python([0, 1, 2, 3], include={1, 2}) == [1, 2]
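One way to reason about these rules together is a small decision function. The sketch below is a model inferred from the examples in this section, not the engine's code. `keep` is a hypothetical name, and runtime `exclude` and nested filters are deliberately left out.

```python
from typing import Hashable, Optional


def keep(
    key: Hashable,
    schema_include: Optional[set] = None,
    schema_exclude: Optional[set] = None,
    runtime_include: Optional[set] = None,
) -> bool:
    """Decide whether an index or key survives filtering (simplified model)."""
    if runtime_include is not None:
        if key in runtime_include:
            return True  # runtime include beats schema exclude
        if schema_include is None or key not in schema_include:
            return False  # named by neither include set
    if schema_include is not None and key not in schema_include:
        return False
    if schema_exclude is not None and key in schema_exclude:
        return False  # schema exclude beats schema include
    return True


items = [0, 1, 2, 3]
forced = [x for i, x in enumerate(items)
          if keep(i, schema_exclude={0, 1}, runtime_include={1, 2})]
default = [x for i, x in enumerate(items) if keep(i, schema_exclude={0, 1})]
print(forced)   # [1, 2]
print(default)  # [2, 3]
```

The same function reproduces the dictionary example above: with a schema include of `{'a', 'c'}` and a runtime include of `{'d'}`, the keys `'a'`, `'c'`, and `'d'` are kept.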
Design Tradeoffs and Constraints
Performance vs. Flexibility
By embedding filtering logic directly into the SchemaSerializer, the project avoids the overhead of a post-serialization transformation step. The filtering happens "on the fly" as the collection is being iterated. However, this increases the complexity of the serialization engine, as it must track indices and keys against multiple sets of filters (both schema-level and runtime-level).
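The single-pass idea can be illustrated with a generator that checks the filter before serializing each element. This is a sketch of the general technique only. `serialize_filtered` and `to_upper` are hypothetical names, and the real engine's internals may be organized quite differently.

```python
from typing import Callable, Iterable, Iterator


def serialize_filtered(
    items: Iterable[str],
    serialize_item: Callable[[str], str],
    exclude: set[int],
) -> Iterator[str]:
    """Filter by index while iterating, so skipped items are never serialized."""
    for index, item in enumerate(items):
        if index in exclude:
            continue
        yield serialize_item(item)


serialized_inputs: list[str] = []


def to_upper(value: str) -> str:
    serialized_inputs.append(value)  # record which items were actually processed
    return value.upper()


output = list(serialize_filtered(["a", "b", "c", "d"], to_upper, exclude={0, 1}))
print(output)             # ['C', 'D']
print(serialized_inputs)  # ['c', 'd']
```

A post-serialization filter would have called `to_upper` on all four items before discarding two of the results.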
Type Constraints
The implementation enforces strict type constraints on filters:
- Sequences: Must use set[int]. Index lookups are efficient, but the caller must know the exact positions of the items.
- Dictionaries: Must use set[int | str]. This covers string keys, as used in JSON objects, and integer keys.
Complexity of Precedence
The decision to have runtime include override schema exclude provides maximum flexibility for the end-user but can lead to surprising results if the schema designer intended an exclusion to be absolute (e.g., for security or privacy reasons). In this architecture, the runtime caller is treated as the ultimate authority on what should be visible in the output.