
Pydantic Models and Fields

Pydantic models in this codebase are implemented using a hierarchical schema architecture. At the top level, a model is represented by a ModelSchema, which wraps a Python class and an internal schema (typically a ModelFieldsSchema) that defines how the model's fields are validated and serialized.

Model Architecture Overview

The core of a Pydantic model is defined by three primary schema types in pydantic_core.core_schema:

  1. ModelSchema: The entry point for model validation. It binds a specific Python class (cls) to a validation schema.
  2. ModelFieldsSchema: The engine that processes input data into a format suitable for the model. It manages the collection of fields, handles extra attributes, and supports ORM-style loading.
  3. ModelField: Defines the validation and serialization logic for an individual attribute within the model.

This separation allows the codebase to decouple the structure of the Python class from the logic used to populate it.
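A quick way to see the nesting is to build the three schemas by hand and inspect what they return. This is a minimal sketch. The User class and the name field are hypothetical, and it relies on the core_schema helpers returning plain dicts tagged with a 'type' key, as they do in current pydantic-core releases.

```python
from pydantic_core import core_schema


class User:
    """Hypothetical target class; only its identity matters to the schema."""


# ModelField: the rules for a single attribute.
name_field = core_schema.model_field(core_schema.str_schema())

# ModelFieldsSchema: the collection of fields plus options such as extra_behavior.
fields_schema = core_schema.model_fields_schema({'name': name_field})

# ModelSchema: binds the Python class to the fields schema.
user_schema = core_schema.model_schema(User, fields_schema)

print(user_schema['type'])  # model
print(user_schema['schema']['type'])  # model-fields
print(user_schema['schema']['fields']['name']['type'])  # model-field
```

Because each layer is just a dict, the class binding (the outer 'model' dict) stays separate from the field logic (the inner 'model-fields' dict).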

The Validation Flow

When a SchemaValidator is created with a ModelSchema, the validation process follows a specific sequence:

  1. The input (usually a dict) is passed to the internal schema (usually a ModelFieldsSchema).
  2. The ModelFieldsSchema validates the fields and returns a tuple containing:
    • A dictionary of validated field values.
    • A dictionary of "extra" fields (if configured).
    • A set of field names that were explicitly present in the input (fields_set).
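  3. The ModelSchema then creates an instance of the target cls (by default without calling its __init__) and assigns these values to the instance's __dict__, __pydantic_extra__, and __pydantic_fields_set__ attributes.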
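The intermediate tuple from step 2 can be observed directly by validating against a bare ModelFieldsSchema, with no ModelSchema around it. This sketch assumes the default extra_behavior (not 'allow'). The default value on field_b is added purely for illustration, so that fields_set visibly differs from the dict keys.

```python
from pydantic_core import SchemaValidator, core_schema

# A bare ModelFieldsSchema: validation stops at the intermediate tuple.
fields_validator = SchemaValidator(
    core_schema.model_fields_schema(
        {
            'field_a': core_schema.model_field(core_schema.str_schema()),
            # with_default_schema supplies 0 when 'field_b' is absent from the input.
            'field_b': core_schema.model_field(
                core_schema.with_default_schema(core_schema.int_schema(), default=0)
            ),
        }
    )
)

model_dict, model_extra, fields_set = fields_validator.validate_python({'field_a': 'test'})
print(model_dict)  # {'field_a': 'test', 'field_b': 0}
print(model_extra)  # None (extra_behavior is not 'allow')
print(fields_set)  # {'field_a'}
```

Only field_a appears in fields_set because field_b was filled from its default rather than read from the input.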

Example: Basic Model Definition

In pydantic-core/tests/validators/test_model.py, a standard model is constructed by nesting these schemas:

```python
from pydantic_core import SchemaValidator, core_schema

class MyModel:
    # Slots for the attributes the validator sets on each new instance
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'
    field_a: str
    field_b: int

v = SchemaValidator(
    core_schema.model_schema(
        MyModel,
        core_schema.model_fields_schema(
            {
                'field_a': core_schema.model_field(core_schema.str_schema()),
                'field_b': core_schema.model_field(core_schema.int_schema()),
            }
        ),
    )
)

m = v.validate_python({'field_a': 'test', 'field_b': 12})
assert isinstance(m, MyModel)
assert m.field_a == 'test'
```
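Continuing with the same MyModel definition, the sketch below shows where the pieces of the intermediate tuple end up on the instance, and how a missing required field is reported. It assumes default settings: no extra fields allowed, lax mode, and no custom __init__.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema


class MyModel:
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'
    field_a: str
    field_b: int


v = SchemaValidator(
    core_schema.model_schema(
        MyModel,
        core_schema.model_fields_schema(
            {
                'field_a': core_schema.model_field(core_schema.str_schema()),
                'field_b': core_schema.model_field(core_schema.int_schema()),
            }
        ),
    )
)

# JSON input goes through the same schema; the tuple parts become attributes.
m = v.validate_json('{"field_a": "test", "field_b": 12}')
print(m.__dict__)  # {'field_a': 'test', 'field_b': 12}
print(m.__pydantic_fields_set__ == {'field_a', 'field_b'})  # True
print(m.__pydantic_extra__)  # None

# A missing required field surfaces as a ValidationError.
try:
    v.validate_python({'field_a': 'test'})
except ValidationError as exc:
    first_error = exc.errors()[0]
    print(first_error['type'], first_error['loc'])  # missing ('field_b',)
```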

Field Management with ModelFieldsSchema

The ModelFieldsSchema is responsible for the "heavy lifting" of field processing. It provides several critical configurations:

  • extra_behavior: Controls how fields not defined in the fields dictionary are handled. Options include 'allow', 'ignore', and 'forbid'.
  • from_attributes: When enabled, allows the validator to extract data from object attributes rather than just dictionary keys (useful for ORM integration).
  • computed_fields: A list of ComputedField definitions that are evaluated during serialization.

If extra_behavior is set to 'allow', the extra data is stored in the instance's __pydantic_extra__ attribute.
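The options above can be compared side by side with a one-field model. This is a sketch. The Model and Row classes and the build helper are hypothetical, and Row stands in for an ORM object that exposes its data as attributes instead of dict keys.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema


class Model:
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'


def build(extra_behavior='ignore', from_attributes=False):
    """Hypothetical helper: a one-field model validator with the given options."""
    return SchemaValidator(
        core_schema.model_schema(
            Model,
            core_schema.model_fields_schema(
                {'a': core_schema.model_field(core_schema.int_schema())},
                extra_behavior=extra_behavior,
                from_attributes=from_attributes,
            ),
        )
    )


data = {'a': 1, 'b': 'x'}

# 'allow': unknown keys are kept, but in __pydantic_extra__ rather than __dict__.
allowed = build('allow').validate_python(data)
print(allowed.__dict__, allowed.__pydantic_extra__)  # {'a': 1} {'b': 'x'}

# 'ignore': unknown keys are silently dropped.
ignored = build('ignore').validate_python(data)
print(ignored.__dict__, ignored.__pydantic_extra__)  # {'a': 1} None

# 'forbid': unknown keys are a validation error.
try:
    build('forbid').validate_python(data)
except ValidationError as exc:
    forbid_error_type = exc.errors()[0]['type']
    print(forbid_error_type)  # extra_forbidden


class Row:
    """Hypothetical ORM-style object."""

    def __init__(self, a):
        self.a = a


# from_attributes=True: field values are read with getattr instead of key lookup.
from_orm = build(from_attributes=True).validate_python(Row(5))
print(from_orm.a)  # 5
```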

Granular Control with ModelField

Each entry in the ModelFieldsSchema.fields dictionary is a ModelField. This schema allows for fine-grained control over individual attributes:

  • Aliases: Use validation_alias to look up data under a different key during validation, and serialization_alias to rename the field during output.
  • Serialization Control: The serialization_exclude and serialization_exclude_if properties allow for conditional exclusion of fields during serialization.
  • Immutability: Setting frozen: True on a ModelField indicates that the field should not be modified after initialization.
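The following sketch puts the three field-level controls on one hypothetical Pet model. Two assumptions apply. First, serialization aliases are requested explicitly with by_alias=True. Second, frozen is enforced when a value is reassigned through SchemaValidator.validate_assignment; a plain attribute assignment on the instance bypasses the validator.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, ValidationError, core_schema


class Pet:
    """Hypothetical model class used only for this example."""

    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'


pet_schema = core_schema.model_schema(
    Pet,
    core_schema.model_fields_schema(
        {
            # Read from 'petName' on input; written as 'pet_name' when serializing by alias.
            'name': core_schema.model_field(
                core_schema.str_schema(),
                validation_alias='petName',
                serialization_alias='pet_name',
            ),
            # Validated and stored, but never included in serialized output.
            'secret': core_schema.model_field(core_schema.str_schema(), serialization_exclude=True),
            # Reassignment through validate_assignment is rejected.
            'species': core_schema.model_field(core_schema.str_schema(), frozen=True),
        }
    ),
)
validator = SchemaValidator(pet_schema)
serializer = SchemaSerializer(pet_schema)

pet = validator.validate_python({'petName': 'Rex', 'secret': 'xyz', 'species': 'dog'})
print(pet.name, pet.secret)  # Rex xyz
print(serializer.to_python(pet, by_alias=True))  # {'pet_name': 'Rex', 'species': 'dog'}

try:
    validator.validate_assignment(pet, 'species', 'cat')
except ValidationError as exc:
    frozen_error_type = exc.errors()[0]['type']
    print(frozen_error_type)  # frozen_field
```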

Example of a field with a serialization alias and conditional exclusion:

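```python
# From pydantic-core/tests/serializers/test_model.py
from pydantic_core import core_schema

field = core_schema.model_field(
    core_schema.int_schema(),
    serialization_exclude_if=lambda x: x > 1,  # omit the field when its value is greater than 1
    serialization_alias='Meow',  # output key used when serializing by alias
)
```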

Specialized Serialization with ModelSerSchema

While ModelSchema and ModelFieldsSchema handle both validation and serialization by default, ModelSerSchema overrides only the serialization side. This is useful when validation is handled by some other schema (e.g., a call_schema that constructs a dataclass) but the output should still follow a standard model structure.

In pydantic-core/tests/serializers/test_model.py, model_ser_schema is used to ensure a dataclass serializes its fields correctly even when validated through a function call:

```python
schema = core_schema.call_schema(
    core_schema.arguments_schema([...]),
    DataClass,
    serialization=core_schema.model_ser_schema(
        DataClass,
        core_schema.model_fields_schema({
            'foo': core_schema.model_field(core_schema.int_schema()),
        }),
    ),
)
```
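A self-contained variant of the pattern above follows. It is a sketch, not the test itself. The single 'foo' argument is an assumption standing in for the elided argument list. Input is passed as ArgsKwargs so the arguments schema receives explicit keyword arguments.

```python
import dataclasses

from pydantic_core import ArgsKwargs, SchemaSerializer, SchemaValidator, core_schema


@dataclasses.dataclass
class DataClass:
    foo: int


schema = core_schema.call_schema(
    # Validation: check the arguments, then call DataClass(...) with them.
    core_schema.arguments_schema([core_schema.arguments_parameter('foo', core_schema.int_schema())]),
    DataClass,
    # Serialization: treat the returned object like a model with one 'foo' field.
    serialization=core_schema.model_ser_schema(
        DataClass,
        core_schema.model_fields_schema({'foo': core_schema.model_field(core_schema.int_schema())}),
    ),
)
validator = SchemaValidator(schema)
serializer = SchemaSerializer(schema)

obj = validator.validate_python(ArgsKwargs((), {'foo': '1'}))  # lax mode coerces '1' to 1
print(obj)  # DataClass(foo=1)
print(serializer.to_python(obj))  # {'foo': 1}
print(serializer.to_json(obj))  # b'{"foo":1}'
```

Validation and serialization are driven by different schemas here. The call_schema produces the object, and the model_ser_schema reads its __dict__ field by field on output.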

Instance Re-validation and Post-Init

The ModelSchema provides hooks for lifecycle management:

  • revalidate_instances: Determines whether an existing instance of the model passed to the validator is re-validated. It can be set to 'always', 'never' (the default), or 'subclass-instances'.
  • post_init: Specifies the name of a method (e.g., 'model_post_init') on the class that should be called after the instance has been initialized with validated data.

These features ensure that models remain consistent even when data is loaded from existing objects or requires complex initialization logic.
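Both hooks can be exercised together in a short sketch. The Account class, its balance field, and the build helper are hypothetical. The sketch assumes that the post_init method receives the validation context as its single positional argument (None when no context is supplied) and that revalidation re-reads the instance's __dict__.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

post_init_calls = []  # records the context argument passed to each post_init call


class Account:
    """Hypothetical model class used only for this example."""

    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'

    def model_post_init(self, context):
        post_init_calls.append(context)


def build(revalidate_instances):
    """Hypothetical helper: an Account validator with the given revalidation mode."""
    return SchemaValidator(
        core_schema.model_schema(
            Account,
            core_schema.model_fields_schema(
                {'balance': core_schema.model_field(core_schema.int_schema())}
            ),
            post_init='model_post_init',
            revalidate_instances=revalidate_instances,
        )
    )


never = build('never')
always = build('always')

acct = never.validate_python({'balance': 10})
calls_after_first = list(post_init_calls)
print(calls_after_first)  # [None] (no context was passed)

# Plain attribute assignment bypasses validation entirely.
acct.balance = 'oops'

# 'never': an existing instance is returned untouched.
returned_same = never.validate_python(acct) is acct
print(returned_same)  # True

# 'always': the instance's data is validated again, so the bad value is caught.
try:
    always.validate_python(acct)
except ValidationError as exc:
    revalidation_error_type = exc.errors()[0]['type']
    print(revalidation_error_type)  # int_parsing
```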