
Standard Library Dataclasses

Pydantic Core provides a dedicated set of schemas for Python's standard library dataclasses. Unlike ModelSchema, which is optimized for Pydantic's own BaseModel, these schemas respect the semantics of native dataclasses, including __post_init__, InitVar (init-only) fields, and keyword-only fields.

The implementation follows a three-layered architecture defined in pydantic_core.core_schema:

  1. DataclassSchema: The top-level container that represents the dataclass itself.
  2. DataclassArgsSchema: The validation layer that processes input data into arguments for the dataclass constructor.
  3. DataclassField: The granular definition of individual fields within the dataclass.

Core Architecture

The separation between the class container (DataclassSchema) and the argument validator (DataclassArgsSchema) allows Pydantic to handle complex initialization logic, such as fields that exist only for validation but are not stored on the final instance.

DataclassSchema

The DataclassSchema is the entry point for validating a dataclass. It holds a reference to the actual Python class (cls) and the schema used to validate its arguments.

from pydantic_core import core_schema

# Simplified structure of DataclassSchema.
# MyDataclass and args_schema are placeholders for a real dataclass
# and its argument schema.
schema = core_schema.DataclassSchema(
    type='dataclass',
    cls=MyDataclass,
    schema=args_schema,  # Usually a DataclassArgsSchema
    fields=['field_a', 'field_b'],
    post_init=True,  # Whether to call __post_init__
)

DataclassArgsSchema

This schema defines how to validate the input (typically a dictionary) into the arguments required by the dataclass's __init__ method. It manages field-level validation, computed fields, and extra data behavior.
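As a minimal sketch of what this layer produces, the argument schema can be validated on its own, without a DataclassSchema wrapper. The `Point` name and its two fields are made up for illustration. The result is assumed to be a pair: a dictionary of validated field values and a second element reserved for init-only values, which is `None` when none are collected.

```python
from pydantic_core import SchemaValidator, core_schema

# Hypothetical two-field argument schema, validated on its own
# (no DataclassSchema wrapper, so no dataclass instance is built).
args_schema = core_schema.dataclass_args_schema(
    'Point',
    [
        core_schema.dataclass_field(name='x', schema=core_schema.int_schema()),
        core_schema.dataclass_field(name='y', schema=core_schema.int_schema()),
    ],
)

v = SchemaValidator(args_schema)

# Lax mode coerces the numeric string '2' to an int.
fields_dict, init_only = v.validate_python({'x': 1, 'y': '2'})
print(fields_dict)
print(init_only)
```

The enclosing DataclassSchema is what turns this pair into an actual instance of the target class.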

DataclassField

Each field in a dataclass is described by a DataclassField. This includes its validation schema, whether it is keyword-only, and whether it takes part in initialization.
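A field definition is plain data. The sketch below builds one with the `dataclass_field` helper and inspects the resulting dictionary. The field name and alias are hypothetical, and the assumption that options left unset are omitted from the dictionary reflects how the helper is understood to behave rather than a documented guarantee.

```python
from pydantic_core import core_schema

# Hypothetical keyword-only field with an input alias.
field = core_schema.dataclass_field(
    name='created_at',
    schema=core_schema.datetime_schema(),
    kw_only=True,
    validation_alias='createdAt',
)

# The helper returns an ordinary dict that a DataclassArgsSchema can hold.
print(field['type'])
print(sorted(field.keys()))
```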

Basic Implementation

The following example demonstrates how these components work together to validate a standard Python dataclass.

import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class User:
    id: int
    name: str

# Define the fields
fields = [
    core_schema.dataclass_field(name='id', schema=core_schema.int_schema()),
    core_schema.dataclass_field(name='name', schema=core_schema.str_schema()),
]

# Define the argument schema
args_schema = core_schema.dataclass_args_schema(
    dataclass_name='User',
    fields=fields,
)

# Define the top-level dataclass schema
schema = core_schema.dataclass_schema(
    cls=User,
    schema=args_schema,
    fields=['id', 'name'],
)

v = SchemaValidator(schema)
user = v.validate_python({'id': 1, 'name': 'Alice'})
assert isinstance(user, User)
assert user.name == 'Alice'
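The same kind of validator also reports failures and accepts JSON input. The following sketch rebuilds an equivalent `User` schema so that it runs on its own, and collects the structured errors raised for a bad `id` value.

```python
import dataclasses
from pydantic_core import SchemaValidator, ValidationError, core_schema

@dataclasses.dataclass
class User:
    id: int
    name: str

schema = core_schema.dataclass_schema(
    User,
    core_schema.dataclass_args_schema(
        'User',
        [
            core_schema.dataclass_field(name='id', schema=core_schema.int_schema()),
            core_schema.dataclass_field(name='name', schema=core_schema.str_schema()),
        ],
    ),
    ['id', 'name'],
)
v = SchemaValidator(schema)

# Invalid input raises ValidationError with one entry per failing field.
errors = []
try:
    v.validate_python({'id': 'not-a-number', 'name': 'Alice'})
except ValidationError as exc:
    errors = exc.errors()

for err in errors:
    print(err['loc'], err['type'], err['msg'])

# JSON input goes through the same schema.
user_from_json = v.validate_json('{"id": 1, "name": "Alice"}')
print(user_from_json)
```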

Advanced Features

Post-Initialization and Init-Only Fields

Standard library dataclasses often use __post_init__ and dataclasses.InitVar. Pydantic Core supports this via the post_init flag in DataclassSchema and the collect_init_only flag in DataclassArgsSchema.

When collect_init_only is enabled, the values of fields marked init_only=True are collected separately from the regular fields and passed to the __post_init__ method once the regular fields have been set on the instance.

import dataclasses
from pydantic_core import core_schema

@dataclasses.dataclass
class Profile:
    username: str
    age: dataclasses.InitVar[int]

    def __post_init__(self, age: int):
        self.is_adult = age >= 18

schema = core_schema.dataclass_schema(
    Profile,
    core_schema.dataclass_args_schema(
        'Profile',
        [
            core_schema.dataclass_field(name='username', schema=core_schema.str_schema()),
            core_schema.dataclass_field(
                name='age',
                schema=core_schema.int_schema(),
                init_only=True,
            ),
        ],
        collect_init_only=True,
    ),
    fields=['username'],
    post_init=True,
)
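To see the effect at validation time, here is a sketch using a separate, hypothetical `Rectangle` dataclass. It assumes the argument validator returns the regular fields and the init-only values as two separate elements, and that the init-only value reaches `__post_init__` without being stored on the instance.

```python
import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class Rectangle:
    width: int
    height: int
    scale: dataclasses.InitVar[int]

    def __post_init__(self, scale: int):
        # The init-only value is used here and then discarded.
        self.area = self.width * self.height * scale

args_schema = core_schema.dataclass_args_schema(
    'Rectangle',
    [
        core_schema.dataclass_field(name='width', schema=core_schema.int_schema()),
        core_schema.dataclass_field(name='height', schema=core_schema.int_schema()),
        core_schema.dataclass_field(
            name='scale', schema=core_schema.int_schema(), init_only=True
        ),
    ],
    collect_init_only=True,
)

data = {'width': 2, 'height': 3, 'scale': 10}

# Argument layer alone: regular fields and init-only values come back separately.
regular_fields, init_only_values = SchemaValidator(args_schema).validate_python(data)
print(regular_fields, init_only_values)

# Full dataclass layer: the instance is built and __post_init__ receives scale.
schema = core_schema.dataclass_schema(
    Rectangle, args_schema, ['width', 'height'], post_init=True
)
rect = SchemaValidator(schema).validate_python(data)
print(rect.area, vars(rect))
```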

Re-validation Behavior

The revalidate_instances property in DataclassSchema controls how the validator behaves when it receives an object that is already an instance of the target dataclass:

  • 'never': (Default) The instance is returned as-is without further validation.
  • 'always': The instance's fields are re-validated against the schema.
  • 'subclass-instances': Only instances of subclasses are re-validated to ensure they conform to the base class schema.
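The difference between 'never' and 'always' can be sketched as follows. `build_validator` is a hypothetical helper introduced only for this example. Since standard library dataclasses do not check types on construction, the instance below carries a string `id` that only re-validation coerces.

```python
import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class User:
    id: int
    name: str

def build_validator(revalidate: str) -> SchemaValidator:
    """Hypothetical helper: same schema, different revalidate_instances mode."""
    return SchemaValidator(
        core_schema.dataclass_schema(
            User,
            core_schema.dataclass_args_schema(
                'User',
                [
                    core_schema.dataclass_field(name='id', schema=core_schema.int_schema()),
                    core_schema.dataclass_field(name='name', schema=core_schema.str_schema()),
                ],
            ),
            ['id', 'name'],
            revalidate_instances=revalidate,
        )
    )

# The stdlib dataclass accepts a str for id without complaint.
unchecked = User(id='42', name='Bob')

# 'never': the existing instance is handed back untouched.
same = build_validator('never').validate_python(unchecked)

# 'always': the fields are validated again, so '42' is coerced to 42.
fresh = build_validator('always').validate_python(unchecked)

print(same is unchecked, repr(same.id), repr(fresh.id))
```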

Field-Level Control

DataclassField provides several configuration options that mirror Pydantic's Field behavior but are applied at the core level:

  • Aliases: Use validation_alias to map input keys to field names and serialization_alias for output.
  • Frozen Fields: Set the frozen flag on an individual field to reject assignment to that field during validated assignment, even when the dataclass itself is not frozen.
  • Serialization: The serialization_exclude flag determines if a field should be omitted when the dataclass is serialized.
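The options above can be combined in one schema. The sketch below uses a hypothetical `Account` dataclass: `user_id` is read from the `userId` input key and frozen, while `password` is excluded from serialized output. It assumes the frozen flag is enforced through `validate_assignment` and reported with the `frozen_field` error type.

```python
import dataclasses
from pydantic_core import (
    SchemaSerializer,
    SchemaValidator,
    ValidationError,
    core_schema,
)

@dataclasses.dataclass
class Account:
    user_id: int
    email: str
    password: str

schema = core_schema.dataclass_schema(
    Account,
    core_schema.dataclass_args_schema(
        'Account',
        [
            core_schema.dataclass_field(
                name='user_id',
                schema=core_schema.int_schema(),
                validation_alias='userId',  # input key differs from field name
                frozen=True,                # no validated re-assignment
            ),
            core_schema.dataclass_field(name='email', schema=core_schema.str_schema()),
            core_schema.dataclass_field(
                name='password',
                schema=core_schema.str_schema(),
                serialization_exclude=True,  # never dumped
            ),
        ],
    ),
    ['user_id', 'email', 'password'],
)

v = SchemaValidator(schema)
account = v.validate_python(
    {'userId': 7, 'email': 'a@example.com', 'password': 'secret'}
)

# A non-frozen field can be re-assigned through the validator.
v.validate_assignment(account, 'email', 'b@example.com')

# Assigning to the frozen field is rejected.
frozen_error = None
try:
    v.validate_assignment(account, 'user_id', 8)
except ValidationError as exc:
    frozen_error = exc.errors()[0]['type']

dumped = SchemaSerializer(schema).to_python(account)
print(frozen_error, dumped)
```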

Integration with Pydantic

While these schemas can be used directly with SchemaValidator, they are primarily generated by Pydantic's internal schema generation logic (found in pydantic/_internal/_generate_schema.py). When you use @pydantic.dataclasses.dataclass or pass a standard dataclass to a TypeAdapter, Pydantic automatically constructs this hierarchy of DataclassSchema, DataclassArgsSchema, and DataclassField.
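For comparison, here is a short sketch of the high-level route, assuming the `pydantic` package is installed alongside `pydantic_core`. The generated schema is inspected through the adapter's `core_schema` attribute; its exact shape may vary between Pydantic versions, so no particular output is claimed here.

```python
import dataclasses
from pydantic import TypeAdapter

@dataclasses.dataclass
class User:
    id: int
    name: str

# Pydantic builds the core schema for the stdlib dataclass automatically.
adapter = TypeAdapter(User)
print(adapter.core_schema['type'])

# Validation behaves like the hand-written schema: '1' is coerced to 1.
user = adapter.validate_python({'id': '1', 'name': 'Alice'})
print(user)
```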