Serialization System

The serialization system in this codebase converts Python objects into either JSON-compatible output or plain Python representations. It is built on top of pydantic-core and uses a SchemaSerializer to handle complex data structures efficiently.

Core Serialization Methods

The primary entry point for serialization is the BaseModel class, which provides two main methods: model_dump and model_dump_json.

Model Dump

The model_dump method, defined in pydantic/main.py, generates a dictionary representation of the model. It supports two primary modes:

  • python mode (the default): The output may contain non-JSON-serializable Python objects (e.g., datetime objects, sets).
  • json mode: The output contains only JSON-compatible types (e.g., strings for dates, lists for sets).

from pydantic import BaseModel

class User(BaseModel):
    name: str
    id: int

user = User(name='John', id=123)
# Returns {'name': 'John', 'id': 123}
print(user.model_dump())
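
To make the difference between the two modes concrete, here is a minimal sketch. The Event model and its field names are hypothetical, chosen only because a date and a set serialize differently in each mode.

```python
from datetime import date
from pydantic import BaseModel

# Hypothetical model used only to contrast the two modes
class Event(BaseModel):
    day: date
    tags: set[str]

event = Event(day=date(2024, 1, 1), tags={'a'})

# python mode (default): native Python objects are preserved
python_out = event.model_dump()
# json mode: the date becomes a string and the set becomes a list
json_out = event.model_dump(mode='json')

print(python_out)
print(json_out)
```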

Model Dump JSON

The model_dump_json method (also in pydantic/main.py) generates a JSON string representation. It is more efficient than calling json.dumps(model.model_dump()) because it performs serialization directly to JSON via pydantic-core.

# Returns '{"name":"John","id":123}'
print(user.model_dump_json())
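
As a quick consistency check, parsing the JSON string should give the same data as a json-mode model_dump. The sketch below redefines the User model so that it runs on its own; the variable names are illustrative.

```python
import json
from pydantic import BaseModel

class User(BaseModel):
    name: str
    id: int

user = User(name='John', id=123)

# model_dump_json returns a str containing JSON
json_text = user.model_dump_json()

# Parsing that string yields the same data as model_dump(mode='json')
round_tripped = json.loads(json_text)
matches = round_tripped == user.model_dump(mode='json')
print(matches)
```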

Common Arguments

Both methods support several arguments to control the output:

  • include/exclude: Specify which fields to include or exclude.
  • by_alias: Whether to use field aliases as keys.
  • exclude_unset: Exclude fields that were not explicitly set during instantiation.
  • exclude_defaults: Exclude fields that are set to their default values.
  • exclude_none: Exclude fields with a value of None.
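
The following sketch shows how each argument changes the output of model_dump. The Account model, its alias, and its defaults are assumptions made up for this illustration.

```python
from typing import Optional
from pydantic import BaseModel, Field

# Hypothetical model: one aliased field, one default, one optional field
class Account(BaseModel):
    user_name: str = Field(alias='userName')
    age: int = 30
    nickname: Optional[str] = None

# 'age' is left unset; 'nickname' is explicitly set to None
account = Account(userName='john', nickname=None)

full = account.model_dump()
aliased = account.model_dump(by_alias=True)
only_set = account.model_dump(exclude_unset=True)
non_default = account.model_dump(exclude_defaults=True)
non_none = account.model_dump(exclude_none=True)
included = account.model_dump(include={'user_name'})
excluded = account.model_dump(exclude={'age'})

for result in (full, aliased, only_set, non_default, non_none, included, excluded):
    print(result)
```

Note that exclude_unset keeps nickname because it was passed explicitly, while exclude_defaults drops it because its value equals the default.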

Serialization for Non-Model Types

For types that do not inherit from BaseModel, such as standard library dataclasses, typed dicts, or primitive types, the TypeAdapter class in pydantic/type_adapter.py provides equivalent functionality.

from pydantic import TypeAdapter
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    value: int

adapter = TypeAdapter(list[Item])
items = [Item(name='apple', value=1), Item(name='orange', value=2)]

# Counterpart to model_dump: returns Python objects
python_data = adapter.dump_python(items)
# Counterpart to model_dump_json: returns bytes rather than str
json_data = adapter.dump_json(items)
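
TypeAdapter also works with plain built-in types. The sketch below uses a dict of dates as an assumed example and shows that dump_python accepts the same mode argument as model_dump.

```python
from datetime import date
from pydantic import TypeAdapter

# Adapter for a plain built-in container type, no model class involved
adapter = TypeAdapter(dict[str, date])
data = {'start': date(2024, 1, 1)}

# python mode keeps the date object
as_python = adapter.dump_python(data)
# json mode converts the date to an ISO 8601 string
as_jsonable = adapter.dump_python(data, mode='json')
# dump_json returns bytes
as_json_bytes = adapter.dump_json(data)

print(as_python, as_jsonable, as_json_bytes)
```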

Customizing Serialization

Pydantic provides several ways to customize how data is serialized, ranging from individual fields to the entire model.

Field Serializers

The @field_serializer decorator in pydantic/functional_serializers.py allows for custom logic on specific fields. It supports two modes:

  • plain: The function replaces the default serialization logic.
  • wrap: The function receives a handler that it can call to run the default logic.

from pydantic import BaseModel, field_serializer

class StudentModel(BaseModel):
    name: str = 'Jane'
    courses: set[str]

    # when_used='json' applies this serializer only in JSON mode
    @field_serializer('courses', when_used='json')
    def serialize_courses_in_order(self, courses: set[str]):
        return sorted(courses)
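
The example above uses the default plain mode. For comparison, here is a minimal sketch of wrap mode, where the function calls the handler to run the default serialization and then post-processes the result. The Price model and the rounding rule are hypothetical.

```python
from pydantic import BaseModel, SerializerFunctionWrapHandler, field_serializer

# Hypothetical model illustrating mode='wrap'
class Price(BaseModel):
    amount: float

    @field_serializer('amount', mode='wrap')
    def round_amount(self, value: float, handler: SerializerFunctionWrapHandler):
        # Run the default serialization first, then round the result
        return round(handler(value), 2)

price = Price(amount=1.23456)
dumped = price.model_dump()
print(dumped)
```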

Model Serializers

The @model_serializer decorator allows you to transform the representation of the entire model. This is useful when the serialized form needs to differ significantly from the model's internal structure.

from typing import Literal
from pydantic import BaseModel, model_serializer

class TemperatureModel(BaseModel):
    unit: Literal['C', 'F']
    value: int

    @model_serializer()
    def serialize_model(self):
        if self.unit == 'F':
            # Convert Fahrenheit to Celsius during serialization
            return {'unit': 'C', 'value': int((self.value - 32) / 1.8)}
        return {'unit': self.unit, 'value': self.value}

Reusable Serializers with Annotated

For reusable serialization logic across different models, you can use PlainSerializer and WrapSerializer with Annotated.

from typing import Annotated
from pydantic import BaseModel, PlainSerializer

# Define a reusable type that serializes a list into a space-separated string
CustomStr = Annotated[
    list[str],
    PlainSerializer(lambda x: ' '.join(x), return_type=str)
]

class Document(BaseModel):
    tags: CustomStr

doc = Document(tags=['pydantic', 'serialization', 'guide'])
# {'tags': 'pydantic serialization guide'}
print(doc.model_dump())
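
WrapSerializer can be attached in the same way. The sketch below is an assumed example: the helper double_after_default and the DoubledInt alias are hypothetical names, and the doubling is only there to make the effect of the wrapper visible.

```python
from typing import Annotated, Any
from pydantic import BaseModel, SerializerFunctionWrapHandler, WrapSerializer

# Hypothetical helper: run the default serialization, then double the result
def double_after_default(value: Any, handler: SerializerFunctionWrapHandler) -> int:
    return handler(value) * 2

# Reusable annotated type that any model can use
DoubledInt = Annotated[int, WrapSerializer(double_after_default)]

class Counter(BaseModel):
    count: DoubledInt

counter = Counter(count=21)
dumped = counter.model_dump()
print(dumped)
```

The stored value is unchanged; only the serialized output is affected.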

Advanced Features

Duck-Typing Serialization

By default, Pydantic serializes objects based on their annotated type. If a field is annotated as a base class but contains a subclass instance, only the base class fields are serialized. SerializeAsAny, found in pydantic/functional_serializers.py, forces Pydantic to serialize the actual runtime type.

from pydantic import BaseModel, SerializeAsAny

class Base(BaseModel):
    x: int

class Sub(Base):
    y: int

class Container(BaseModel):
    # Without SerializeAsAny, 'y' would be lost during serialization
    item: SerializeAsAny[Base]

container = Container(item=Sub(x=1, y=2))
# {'item': {'x': 1, 'y': 2}}
print(container.model_dump())
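
To see the default behavior side by side with SerializeAsAny, the sketch below defines two containers that differ only in the annotation. PlainContainer and DuckContainer are hypothetical names for this comparison.

```python
from pydantic import BaseModel, SerializeAsAny

class Base(BaseModel):
    x: int

class Sub(Base):
    y: int

# Annotated as Base: serialized using the Base schema
class PlainContainer(BaseModel):
    item: Base

# Annotated with SerializeAsAny: serialized using the runtime type
class DuckContainer(BaseModel):
    item: SerializeAsAny[Base]

sub = Sub(x=1, y=2)
plain_out = PlainContainer(item=sub).model_dump()
duck_out = DuckContainer(item=sub).model_dump()

print(plain_out)
print(duck_out)
```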

Global Configuration

Serialization behavior can be tuned globally for a model using ConfigDict in pydantic/config.py. Key settings include:

  • ser_json_temporal: Controls the format of datetime, date, time, and timedelta. Options: 'iso8601', 'seconds', 'milliseconds'.
  • ser_json_bytes: Controls how bytes are encoded in JSON. Options: 'utf8', 'base64', 'hex'.
  • ser_json_inf_nan: Controls how infinity and NaN values are handled. Options: 'null', 'constants', 'strings'.

from pydantic import BaseModel, ConfigDict
from datetime import datetime

class Log(BaseModel):
    model_config = ConfigDict(ser_json_temporal='seconds')
    timestamp: datetime

log = Log(timestamp=datetime(2024, 1, 1))
# {"timestamp":1704067200.0}
print(log.model_dump_json())
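
The other two settings can be sketched in the same way. The Payload model below is hypothetical. The example parses the JSON back with the standard library rather than asserting an exact output string, since 'constants' emits the non-standard Infinity token that json.loads happens to accept.

```python
import json
import math
from pydantic import BaseModel, ConfigDict

# Hypothetical model combining two JSON-specific settings
class Payload(BaseModel):
    model_config = ConfigDict(ser_json_bytes='base64', ser_json_inf_nan='constants')
    data: bytes
    ratio: float

payload = Payload(data=b'hi', ratio=float('inf'))

# These settings only affect JSON output
json_text = payload.model_dump_json()
parsed = json.loads(json_text)

# python mode still returns the raw bytes and float
python_out = payload.model_dump()

print(json_text)
print(python_out)
```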