Serialization System
The serialization system in this codebase converts Python objects into JSON-compatible output or into plain Python representations. It is built on pydantic-core and uses a SchemaSerializer to handle complex data structures efficiently.
Core Serialization Methods
The primary entry point for serialization is the BaseModel class, which provides two main methods: model_dump and model_dump_json.
Model Dump
The model_dump method, defined in pydantic/main.py, generates a dictionary representation of the model. It supports two primary modes, selected with the mode argument:
python mode (the default): The output may contain non-JSON-serializable Python objects (e.g., datetime objects, sets).
json mode: The output contains only JSON-compatible types (e.g., strings for dates, lists for sets).
from pydantic import BaseModel

class User(BaseModel):
    name: str
    id: int

user = User(name='John', id=123)
# Returns {'name': 'John', 'id': 123}
print(user.model_dump())
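As a minimal sketch of the difference between the two modes, the example below dumps the same model with the default mode and with mode='json'. The Event model and its fields are hypothetical and exist only for illustration.

```python
from datetime import date

from pydantic import BaseModel


class Event(BaseModel):  # hypothetical model, for illustration only
    day: date
    tags: set[str]


event = Event(day=date(2024, 1, 1), tags={'a'})

# Default ('python') mode keeps rich Python objects such as date and set.
python_out = event.model_dump()
# 'json' mode converts them to JSON-compatible types (str and list).
json_out = event.model_dump(mode='json')

print(python_out)  # {'day': datetime.date(2024, 1, 1), 'tags': {'a'}}
print(json_out)    # {'day': '2024-01-01', 'tags': ['a']}
```

Both calls return a dict. Only the types of the values differ.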
Model Dump JSON
The model_dump_json method (also in pydantic/main.py) generates a JSON string representation. It is more efficient than calling json.dumps(model.model_dump()) because it performs serialization directly to JSON via pydantic-core.
# Returns '{"name":"John","id":123}'
print(user.model_dump_json())
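Besides speed, the two approaches can differ in what they accept. The sketch below, which uses a hypothetical Meeting model, shows the standard-library encoder rejecting a python-mode dump that contains a datetime, while model_dump_json handles it directly.

```python
import json
from datetime import datetime

from pydantic import BaseModel


class Meeting(BaseModel):  # hypothetical model, for illustration only
    title: str
    starts_at: datetime


meeting = Meeting(title='Sync', starts_at=datetime(2024, 1, 1, 9, 30))

# The python-mode dict still holds a datetime, which json.dumps cannot encode.
try:
    json.dumps(meeting.model_dump())
    stdlib_ok = True
except TypeError:
    stdlib_ok = False

# model_dump_json serializes straight to a JSON string.
json_text = meeting.model_dump_json()

print(stdlib_ok)  # False
print(json_text)  # {"title":"Sync","starts_at":"2024-01-01T09:30:00"}
```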
Common Arguments
Both methods support several arguments to control the output:
include / exclude: Specify which fields to include or exclude.
by_alias: Whether to use field aliases as keys.
exclude_unset: Exclude fields that were not explicitly set during instantiation.
exclude_defaults: Exclude fields that are set to their default values.
exclude_none: Exclude fields with a value of None.
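The arguments listed above can be compared side by side. This is a small sketch using a hypothetical Profile model; the field names and the userName alias are assumptions made for the example.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Profile(BaseModel):  # hypothetical model, for illustration only
    user_name: str = Field(alias='userName')
    age: int = 0
    nickname: Optional[str] = None


# 'age' is passed explicitly (with its default value); 'nickname' is left unset.
profile = Profile(userName='John', age=0)

full = profile.model_dump()
aliased = profile.model_dump(by_alias=True)
only_set = profile.model_dump(exclude_unset=True)
non_default = profile.model_dump(exclude_defaults=True)
no_none = profile.model_dump(exclude_none=True)
just_age = profile.model_dump(include={'age'})
without_age = profile.model_dump(exclude={'age'})

print(full)         # {'user_name': 'John', 'age': 0, 'nickname': None}
print(aliased)      # {'userName': 'John', 'age': 0, 'nickname': None}
print(only_set)     # {'user_name': 'John', 'age': 0}
print(non_default)  # {'user_name': 'John'}
print(no_none)      # {'user_name': 'John', 'age': 0}
print(just_age)     # {'age': 0}
print(without_age)  # {'user_name': 'John', 'nickname': None}
```

Note that exclude_unset keeps age because it was passed explicitly, whereas exclude_defaults drops it because its value equals the default.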
Serialization for Non-Model Types
For types that do not inherit from BaseModel, such as standard library dataclasses, typed dicts, or primitive types, the TypeAdapter class in pydantic/type_adapter.py provides equivalent functionality.
from dataclasses import dataclass

from pydantic import TypeAdapter

@dataclass
class Item:
    name: str
    value: int

adapter = TypeAdapter(list[Item])
items = [Item(name='apple', value=1), Item(name='orange', value=2)]
# Equivalent to model_dump
python_data = adapter.dump_python(items)
# Equivalent to model_dump_json, but returns bytes rather than str
json_data = adapter.dump_json(items)
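The same adapter methods work for plain built-in types. A minimal sketch with a list of dates follows; the variable names are illustrative.

```python
from datetime import date

from pydantic import TypeAdapter

adapter = TypeAdapter(list[date])
days = [date(2024, 1, 1), date(2024, 1, 2)]

# Python mode leaves the date objects untouched.
as_python = adapter.dump_python(days)
# dump_python also accepts mode='json' to get JSON-compatible values.
as_json_mode = adapter.dump_python(days, mode='json')
# dump_json returns bytes; decode it when a str is needed.
as_bytes = adapter.dump_json(days)

print(as_json_mode)       # ['2024-01-01', '2024-01-02']
print(as_bytes)           # b'["2024-01-01","2024-01-02"]'
print(as_bytes.decode())  # ["2024-01-01","2024-01-02"]
```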
Customizing Serialization
Pydantic provides several ways to customize how data is serialized, ranging from individual fields to the entire model.
Field Serializers
The @field_serializer decorator in pydantic/functional_serializers.py allows for custom logic on specific fields. It supports two modes:
plain: The function replaces the default serialization logic.
wrap: The function receives a handler that it can optionally call to run the default logic.
from pydantic import BaseModel, field_serializer

class StudentModel(BaseModel):
    name: str = 'Jane'
    courses: set[str]

    # Applied only when serializing to JSON; python-mode output keeps the set.
    @field_serializer('courses', when_used='json')
    def serialize_courses_in_order(self, courses: set[str]):
        return sorted(courses)
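The example above uses the default plain mode. For comparison, here is a sketch of wrap mode, in which the function calls the handler to run the default logic and then post-processes the result. The Price model and the rounding rule are hypothetical.

```python
from pydantic import BaseModel, SerializerFunctionWrapHandler, field_serializer


class Price(BaseModel):  # hypothetical model, for illustration only
    amount: float

    @field_serializer('amount', mode='wrap')
    def round_amount(self, value: float, handler: SerializerFunctionWrapHandler):
        # Run the default float serialization first, then round the result.
        return round(handler(value), 2)


price = Price(amount=1.23456)
dumped = price.model_dump()
print(dumped)  # {'amount': 1.23}
```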
Model Serializers
The @model_serializer decorator allows you to transform the representation of the entire model. This is useful when the serialized form needs to differ significantly from the model's internal structure.
from typing import Literal

from pydantic import BaseModel, model_serializer

class TemperatureModel(BaseModel):
    unit: Literal['C', 'F']
    value: int

    @model_serializer()
    def serialize_model(self):
        if self.unit == 'F':
            # Convert Fahrenheit to Celsius during serialization
            return {'unit': 'C', 'value': int((self.value - 32) / 1.8)}
        return {'unit': self.unit, 'value': self.value}
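@model_serializer also accepts mode='wrap', which lets the function start from the default output instead of rebuilding it. The sketch below assumes a hypothetical Account model and adds one derived key to the default dictionary.

```python
from pydantic import BaseModel, SerializerFunctionWrapHandler, model_serializer


class Account(BaseModel):  # hypothetical model, for illustration only
    owner: str
    balance: int

    @model_serializer(mode='wrap')
    def add_summary(self, handler: SerializerFunctionWrapHandler):
        # The handler produces the default dict for this model.
        data = handler(self)
        # Extend it with a derived key that is not a model field.
        data['summary'] = f'{self.owner}: {self.balance}'
        return data


account = Account(owner='ann', balance=5)
dumped = account.model_dump()
print(dumped)  # {'owner': 'ann', 'balance': 5, 'summary': 'ann: 5'}
```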
Reusable Serializers with Annotated
For reusable serialization logic across different models, you can use PlainSerializer and WrapSerializer with Annotated.
from typing import Annotated

from pydantic import BaseModel, PlainSerializer

# Define a reusable type that serializes a list into a space-separated string
CustomStr = Annotated[
    list[str],
    PlainSerializer(lambda x: ' '.join(x), return_type=str),
]

class Document(BaseModel):
    tags: CustomStr

doc = Document(tags=['pydantic', 'serialization', 'guide'])
# {'tags': 'pydantic serialization guide'}
print(doc.model_dump())
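WrapSerializer follows the same pattern but passes a handler for the default logic to the function. A minimal sketch follows; DoubledInt, double_after_default, and Counter are hypothetical names chosen for the example.

```python
from typing import Annotated, Any

from pydantic import BaseModel, SerializerFunctionWrapHandler, WrapSerializer


def double_after_default(value: Any, handler: SerializerFunctionWrapHandler) -> int:
    # Let the default int serialization run, then transform its result.
    return handler(value) * 2


# Reusable annotated type (hypothetical): any field using it is doubled on output.
DoubledInt = Annotated[int, WrapSerializer(double_after_default)]


class Counter(BaseModel):
    count: DoubledInt


counter = Counter(count=21)
dumped = counter.model_dump()
dumped_json = counter.model_dump_json()
print(dumped)       # {'count': 42}
print(dumped_json)  # {"count":42}
```

The stored value is unchanged (counter.count is still 21). Only the serialized output is affected.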
Advanced Features
Duck-Typing Serialization
By default, Pydantic serializes objects based on their annotated type. If a field is annotated as a base class but contains a subclass instance, only the base class fields are serialized. SerializeAsAny, found in pydantic/functional_serializers.py, forces Pydantic to serialize the actual runtime type.
from pydantic import BaseModel, SerializeAsAny

class Base(BaseModel):
    x: int

class Sub(Base):
    y: int

class Container(BaseModel):
    # Without SerializeAsAny, 'y' would be lost during serialization
    item: SerializeAsAny[Base]

container = Container(item=Sub(x=1, y=2))
# {'item': {'x': 1, 'y': 2}}
print(container.model_dump())
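For contrast, the sketch below shows the default behavior without the annotation, together with the serialize_as_any argument that recent Pydantic releases accept on model_dump (assumed here to be version 2.7 or later). PlainContainer is a hypothetical name.

```python
from pydantic import BaseModel


class Base(BaseModel):
    x: int


class Sub(Base):
    y: int


class PlainContainer(BaseModel):  # hypothetical: same shape, no SerializeAsAny
    item: Base


plain = PlainContainer(item=Sub(x=1, y=2))

# Default: serialized according to the annotated type, so 'y' is dropped.
by_annotation = plain.model_dump()
# Runtime opt-in (assumes Pydantic >= 2.7): serialize the actual instance type.
by_runtime_type = plain.model_dump(serialize_as_any=True)

print(by_annotation)    # {'item': {'x': 1}}
print(by_runtime_type)  # {'item': {'x': 1, 'y': 2}}
```

The annotation fixes the behavior per field, while the argument applies it to a single dump call.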
Global Configuration
Serialization behavior can be tuned globally for a model using ConfigDict in pydantic/config.py. Key settings include:
ser_json_temporal: Controls the format of datetime, date, time, and timedelta. Options: 'iso8601', 'seconds', 'milliseconds'.
ser_json_bytes: Controls how bytes are encoded in JSON. Options: 'utf8', 'base64', 'hex'.
ser_json_inf_nan: Controls how infinity and NaN values are handled. Options: 'null', 'constants', 'strings'.
from datetime import datetime

from pydantic import BaseModel, ConfigDict

class Log(BaseModel):
    model_config = ConfigDict(ser_json_temporal='seconds')
    timestamp: datetime

log = Log(timestamp=datetime(2024, 1, 1))
# {"timestamp": 1704067200.0}
print(log.model_dump_json())
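The other two settings can be sketched the same way. The example below assumes a hypothetical Reading model and combines ser_json_bytes='hex' with ser_json_inf_nan='null'. The python-mode dump is included to show that the original values remain available there.

```python
import json

from pydantic import BaseModel, ConfigDict


class Reading(BaseModel):  # hypothetical model, for illustration only
    model_config = ConfigDict(ser_json_bytes='hex', ser_json_inf_nan='null')
    payload: bytes
    ratio: float


reading = Reading(payload=b'hi', ratio=float('inf'))

# JSON output: bytes become a hex string, infinity becomes null.
json_text = reading.model_dump_json()
# Python-mode output keeps the raw bytes and the float infinity.
python_out = reading.model_dump()

print(json_text)  # {"payload":"6869","ratio":null}
```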