Structured Data Models

Structured data models in this codebase are defined using core_schema factory functions. These schemas allow the SchemaValidator to validate complex Python structures including TypedDicts, Pydantic-style Models, and standard Python Dataclasses.

Core Concepts

The pydantic_core.core_schema module provides the bridge between Python's high-level types and the underlying Rust validation engine. Models and dataclasses follow a similar layered pattern: a top-level schema (e.g., model_schema) wraps a field-level schema (e.g., model_fields_schema), which in turn contains individual field definitions (e.g., model_field). TypedDict schemas are flatter: typed_dict_schema holds its typed_dict_field definitions directly.

TypedDicts

The typed_dict_schema validates plain Python dictionaries against a fixed set of keys. Unlike a generic dictionary schema, it supports per-field validation, aliases, and strictness.

from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(
    core_schema.typed_dict_schema(
        fields={
            'field_a': core_schema.typed_dict_field(schema=core_schema.str_schema()),
            'field_b': core_schema.typed_dict_field(
                schema=core_schema.with_default_schema(schema=core_schema.int_schema(), default=666)
            ),
        }
    )
)

# Validates and applies defaults
assert v.validate_python({'field_a': b'abc'}) == {'field_a': 'abc', 'field_b': 666}

Key features of typed_dict_schema:

  • Total vs Partial: The total parameter (defaulting to True) determines if all fields are required by default.
  • Extra Behavior: Controlled via extra_behavior, which can be 'ignore', 'allow', or 'forbid'.
  • Aliases: Fields can define a validation_alias for mapping input keys to internal field names.
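
The first two features can be illustrated with a minimal sketch. The field names (name, age, nickname) are hypothetical, and the example assumes that a field can opt back into being required via required=True when total=False.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

# total=False makes fields optional unless a field sets required=True.
# extra_behavior='forbid' rejects keys that are not declared in `fields`.
partial = SchemaValidator(
    core_schema.typed_dict_schema(
        fields={
            'name': core_schema.typed_dict_field(schema=core_schema.str_schema(), required=True),
            'age': core_schema.typed_dict_field(schema=core_schema.int_schema()),
        },
        total=False,
        extra_behavior='forbid',
    )
)

# 'age' is missing but optional, so it is simply left out of the result.
result = partial.validate_python({'name': 'Ada'})

# 'nickname' is not a declared field, so validation fails.
try:
    partial.validate_python({'name': 'Ada', 'nickname': 'A'})
    extra_rejected = False
except ValidationError:
    extra_rejected = True
```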

Models

The model_schema is designed for Pydantic-style classes. It handles the instantiation of the class and populates specific internal attributes used for tracking state.

To work correctly with model_schema, a class typically defines __slots__ for:

  • __dict__: For standard attribute storage.
  • __pydantic_fields_set__: A set of field names that were explicitly provided during validation.
  • __pydantic_extra__: A dictionary for extra fields if extra_behavior='allow'.
  • __pydantic_private__: For private attributes.

from pydantic_core import SchemaValidator, core_schema

class MyModel:
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'

v = SchemaValidator(
    core_schema.model_schema(
        MyModel,
        core_schema.model_fields_schema(
            {
                'field_a': core_schema.model_field(core_schema.str_schema()),
                'field_b': core_schema.model_field(core_schema.int_schema()),
            }
        ),
    )
)

m = v.validate_python({'field_a': 'test', 'field_b': 12})
assert isinstance(m, MyModel)
assert m.field_a == 'test'
assert m.__pydantic_fields_set__ == {'field_a', 'field_b'}

Revalidation Behavior

The revalidate_instances parameter in model_schema controls how existing instances of the class are handled when passed to the validator:

  • 'never': The instance is returned as-is without validation.
  • 'always': The instance is always re-validated.
  • 'subclass-instances': Only instances of subclasses are re-validated.
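
As a rough sketch of the difference between 'never' and 'always', the example below builds two validators for the same class. The Point class and the build helper are hypothetical names used only for illustration.

```python
from pydantic_core import SchemaValidator, core_schema


class Point:
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'


def build(mode):
    # Hypothetical helper: same schema, different revalidate_instances mode.
    return SchemaValidator(
        core_schema.model_schema(
            Point,
            core_schema.model_fields_schema({'x': core_schema.model_field(core_schema.int_schema())}),
            revalidate_instances=mode,
        )
    )


p = build('never').validate_python({'x': 1})

# 'never': an existing instance is passed through untouched.
same = build('never').validate_python(p)

# 'always': the instance's data is validated again and a new instance is built.
copy = build('always').validate_python(p)
```

With 'never' the validator hands back the very same object, while 'always' produces a separate, freshly validated instance.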

Dataclasses

Dataclasses use dataclass_schema in conjunction with dataclass_args_schema. This structure is unique because it must handle both positional and keyword arguments, as well as InitVar (init-only) fields.

import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class FooDataclass:
    a: str
    b: bool

schema = core_schema.dataclass_schema(
    FooDataclass,
    core_schema.dataclass_args_schema(
        'FooDataclass',
        [
            core_schema.dataclass_field(name='a', schema=core_schema.str_schema()),
            core_schema.dataclass_field(name='b', schema=core_schema.bool_schema()),
        ],
    ),
    ['a', 'b'],  # List of field names
)

v = SchemaValidator(schema)
foo = v.validate_python({'a': 'hello', 'b': True})
assert isinstance(foo, FooDataclass)
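
Positional input can be sketched with ArgsKwargs, which bundles a tuple of positional arguments with an optional dict of keyword arguments. This example assumes that fields are keyword-only unless kw_only=False is set on dataclass_field; the Pair class is a hypothetical stand-in.

```python
import dataclasses

from pydantic_core import ArgsKwargs, SchemaValidator, core_schema


@dataclasses.dataclass
class Pair:
    a: str
    b: bool


v = SchemaValidator(
    core_schema.dataclass_schema(
        Pair,
        core_schema.dataclass_args_schema(
            'Pair',
            [
                # kw_only=False lets a field be filled positionally as well as by keyword.
                core_schema.dataclass_field(name='a', schema=core_schema.str_schema(), kw_only=False),
                core_schema.dataclass_field(name='b', schema=core_schema.bool_schema(), kw_only=False),
            ],
        ),
        ['a', 'b'],
    )
)

# Both fields given positionally.
positional = v.validate_python(ArgsKwargs(('hello', True)))

# One positional argument plus one keyword argument.
mixed = v.validate_python(ArgsKwargs(('hello',), {'b': True}))
```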

Init-Only Fields

Fields that are only used during initialization (like dataclasses.InitVar) are handled by setting init_only=True in dataclass_field and collect_init_only=True in dataclass_args_schema. These fields are validated and, when post_init=True is set on dataclass_schema, passed as arguments to __post_init__; they are not stored on the resulting instance.
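
A minimal sketch of an init-only field follows. The Account class, its fields, and the seen_passwords list are hypothetical; the example assumes that collected init-only values reach __post_init__ as positional arguments when post_init=True is set.

```python
import dataclasses

from pydantic_core import SchemaValidator, core_schema

seen_passwords = []  # records what __post_init__ received, for illustration only


@dataclasses.dataclass
class Account:
    user: str
    password: dataclasses.InitVar[str]

    def __post_init__(self, password):
        # The init-only value arrives here but is never stored on the instance.
        seen_passwords.append(password)


v = SchemaValidator(
    core_schema.dataclass_schema(
        Account,
        core_schema.dataclass_args_schema(
            'Account',
            [
                core_schema.dataclass_field(name='user', schema=core_schema.str_schema()),
                core_schema.dataclass_field(
                    name='password', schema=core_schema.str_schema(), init_only=True
                ),
            ],
            collect_init_only=True,
        ),
        ['user'],  # only stored fields are listed here
        post_init=True,
    )
)

account = v.validate_python({'user': 'ada', 'password': 'secret'})
```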

Advanced Configuration

Extra Fields

Both Models and TypedDicts support handling extra fields via extra_behavior. If set to 'allow', an extras_schema can be provided to validate the values of these extra fields.

# Example of allowing extra fields in a Model
core_schema.model_fields_schema(
    fields={'known_field': core_schema.model_field(core_schema.str_schema())},
    extra_behavior='allow',
)
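
To show extras_schema in action, here is a small sketch in which every extra value must be an integer. The Settings class and the retries key are hypothetical.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema


class Settings:
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'


v = SchemaValidator(
    core_schema.model_schema(
        Settings,
        core_schema.model_fields_schema(
            fields={'known_field': core_schema.model_field(core_schema.str_schema())},
            extra_behavior='allow',
            # Every extra value is validated (and coerced in lax mode) as an int.
            extras_schema=core_schema.int_schema(),
        ),
    )
)

m = v.validate_python({'known_field': 'x', 'retries': '3'})

# An extra value that cannot be parsed as an int is rejected.
try:
    v.validate_python({'known_field': 'x', 'retries': 'many'})
    bad_extra_rejected = False
except ValidationError:
    bad_extra_rejected = True
```

Extra values end up in __pydantic_extra__ rather than alongside the declared fields.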

Validation Aliases

Fields can be populated from different input keys using validation_alias. This accepts a single string, a list of strings and integers (a path into nested input), or a list of such paths that are tried as alternatives.

# Field 'a' can be populated from input key 'Apple'
core_schema.typed_dict_field(
    schema=core_schema.str_schema(),
    validation_alias='Apple',
)

# Field 'b' can be populated from a nested path ['Banana', 1]
core_schema.typed_dict_field(
    schema=core_schema.int_schema(),
    validation_alias=['Banana', 1],
)
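
Put together in a validator, the alias forms might be used as in the sketch below. Field 'c' and its Cherry and fruit keys are hypothetical additions that illustrate alternative paths.

```python
from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(
    core_schema.typed_dict_schema(
        fields={
            # Simple alias: read from the input key 'Apple'.
            'a': core_schema.typed_dict_field(
                schema=core_schema.str_schema(), validation_alias='Apple'
            ),
            # Path alias: read from input['Banana'][1].
            'b': core_schema.typed_dict_field(
                schema=core_schema.int_schema(), validation_alias=['Banana', 1]
            ),
            # Alternative paths: input['Cherry'] or input['fruit']['cherry'].
            'c': core_schema.typed_dict_field(
                schema=core_schema.float_schema(),
                validation_alias=[['Cherry'], ['fruit', 'cherry']],
            ),
        }
    )
)

data = v.validate_python(
    {'Apple': 'red', 'Banana': [10, 20], 'fruit': {'cherry': 1.5}}
)
```

The output uses the field names, not the aliases, as keys.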

Post-Init Processing

Both model_schema and dataclass_schema support a post_init parameter. For dataclasses, setting post_init=True will trigger the standard __post_init__ method. For models, you can specify the name of a method (e.g., post_init='model_post_init') to be called after the instance is created and populated.
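
For the model case, a minimal sketch is shown here. The Rect class and its fields are hypothetical, and the example assumes the named method is called with a single argument, the validation context (None when no context is supplied).

```python
from pydantic_core import SchemaValidator, core_schema


class Rect:
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'

    def model_post_init(self, context):
        # Runs after the fields have been validated and set on the instance.
        self.area = self.width * self.height


v = SchemaValidator(
    core_schema.model_schema(
        Rect,
        core_schema.model_fields_schema(
            {
                'width': core_schema.model_field(core_schema.int_schema()),
                'height': core_schema.model_field(core_schema.int_schema()),
            }
        ),
        post_init='model_post_init',
    )
)

rect = v.validate_python({'width': 2, 'height': 3})
```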