Architecture Overview

This section contains architecture diagrams and documentation for pydantic.

Available Diagrams

Pydantic Core Internal Architecture

The pydantic-core library is structured as a high-performance Rust core with a thin Python wrapper.

The architecture is centered around two main entry points: SchemaValidator and SchemaSerializer. These are exposed to Python via the _pydantic_core extension module.

Key components include:

  • Python Layer: The pydantic_core package provides the public API and re-exports the Rust-implemented classes. The core_schema module defines the structure of schemas that the core can process, primarily using Python TypedDicts.
  • Validation Engine: Implemented in Rust, it builds a tree of specialized validators from the supplied schema. It uses an Input abstraction (a Rust trait) to process different data formats, such as Python objects and JSON (parsed via the jiter crate), in a uniform way.
  • Serialization Engine: Also in Rust, it builds a tree of serializers. It handles converting complex Python objects back into JSON or plain Python types.
  • Shared Infrastructure: Both engines rely on a Definitions system to handle recursive models and shared references, and on an error-handling system that reports each failure with its type, location, and input as part of a ValidationError.
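The "tree of specialized validators" idea can be pictured with a small pure-Python analogy. This is only a sketch of the concept, not the Rust implementation: the class names, the ToySchemaError exception, and build_validator are hypothetical, and the schema dicts only loosely imitate the shape of a CoreSchema.

```python
from dataclasses import dataclass
from typing import Any, Union


class ToySchemaError(Exception):
    """Raised while building the tree, before any data is seen."""


@dataclass
class IntValidator:
    def validate(self, value: Any) -> int:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        raise ValueError(f"expected int, got {value!r}")


@dataclass
class StrValidator:
    def validate(self, value: Any) -> str:
        if isinstance(value, str):
            return value
        raise ValueError(f"expected str, got {value!r}")


@dataclass
class ListValidator:
    item: "ToyValidator"  # child node of the validator tree

    def validate(self, value: Any) -> list:
        if not isinstance(value, list):
            raise ValueError(f"expected list, got {value!r}")
        return [self.item.validate(v) for v in value]


ToyValidator = Union[IntValidator, StrValidator, ListValidator]


def build_validator(schema: dict) -> ToyValidator:
    """Walk the schema once and return a tree of specialized validators."""
    kind = schema.get("type")
    if kind == "int":
        return IntValidator()
    if kind == "str":
        return StrValidator()
    if kind == "list":
        return ListValidator(item=build_validator(schema["items_schema"]))
    raise ToySchemaError(f"unknown schema type: {kind!r}")


tree = build_validator({"type": "list", "items_schema": {"type": "int"}})
validated = tree.validate([1, 2, 3])
```

The point of the design is that the schema is interpreted once, at build time; each later validate call only walks the prebuilt tree.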

The diagram illustrates how data flows from the Python user through the validator/serializer entry points into the specialized Rust logic, and how the Rust core abstracts away the differences between input formats.
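As a rough illustration of that flow, the following sketch builds one schema and passes it to both entry points. It assumes the pydantic_core package is installed; the field names and sample data are made up for the example.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, core_schema

# The helper functions in core_schema return plain dicts describing the schema.
schema = core_schema.typed_dict_schema(
    {
        "name": core_schema.typed_dict_field(core_schema.str_schema()),
        "age": core_schema.typed_dict_field(core_schema.int_schema(ge=0)),
    }
)

# Both entry points are built from the same schema.
validator = SchemaValidator(schema)
serializer = SchemaSerializer(schema)

# The same validator accepts Python objects and raw JSON.
from_python = validator.validate_python({"name": "Ada", "age": 36})
from_json = validator.validate_json('{"name": "Ada", "age": 36}')

# Serialization can target plain Python types or JSON bytes.
as_python = serializer.to_python(from_python)
as_json = serializer.to_json(from_python)
```

Note that to_json returns bytes rather than str.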

Key Architectural Findings:

  • pydantic-core uses a Rust extension (_pydantic_core) to provide high-performance validation and serialization.
  • The SchemaValidator and SchemaSerializer are the primary interfaces between Python and Rust.
  • An 'Input' trait in Rust abstracts over Python objects and JSON data, allowing validators to be format-agnostic.
  • The 'core_schema' Python module defines the contract for schemas using TypedDicts, which are then parsed by Rust to build validator/serializer trees.
  • A shared 'Definitions' system manages recursion and cross-references within schemas.
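A recursive schema gives a feel for what the Definitions system handles. The sketch below assumes pydantic_core is installed; the "node" reference name and the tree-shaped data are invented for illustration.

```python
from pydantic_core import SchemaValidator, core_schema

# A tree node whose "children" field refers back to the node schema by its ref.
node_schema = core_schema.typed_dict_schema(
    {
        "value": core_schema.typed_dict_field(core_schema.int_schema()),
        "children": core_schema.typed_dict_field(
            core_schema.list_schema(core_schema.definition_reference_schema("node"))
        ),
    },
    ref="node",
)

# definitions_schema pairs an entry schema with the definitions it may reference.
schema = core_schema.definitions_schema(
    core_schema.definition_reference_schema("node"),
    [node_schema],
)

validator = SchemaValidator(schema)
tree = validator.validate_python(
    {"value": 1, "children": [{"value": "2", "children": []}]}
)
```

In the default lax mode the nested string "2" is coerced to the integer 2, which shows that the referenced definition is applied at every level of the recursion.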

Core Schema and Validator Data Model

The data model of pydantic-core revolves around translating a CoreSchema (a Python-side definition built with the pydantic_core.core_schema module) into high-performance Rust-based validators and serializers.

Key Components:

  • CoreSchema: A recursive data structure (a union of TypedDicts in Python) that defines the validation and serialization logic for a specific type. Its members carry fields such as type, ref, and metadata.
  • SchemaValidator: The primary engine for validation. It is initialized from a CoreSchema and a CoreConfig. Internally, it holds a CombinedValidator, which is an enum of specialized validator implementations (e.g., IntValidator, ModelValidator).
  • SchemaSerializer: The engine for converting Python objects into JSON or plain Python types. Like the validator, it is built from a CoreSchema and uses a CombinedSerializer enum.
  • Input: A Rust trait that abstracts over different input formats, primarily PythonInput (wrapping PyAny) and JsonInput (wrapping jiter::JsonValue).
  • ValidationError: An exception raised when validation fails. It encapsulates one or more PyLineError objects, each detailing the ErrorType, the Location (path) of the error, and the input_value that caused it.
  • SchemaError: An exception raised during the construction of a SchemaValidator or SchemaSerializer if the provided CoreSchema is invalid or inconsistent.
  • CoreConfig: A configuration object (a TypedDict on the Python side) that provides global settings such as strict mode, extra_fields_behavior, and string constraints (for example str_max_length).
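To make the role of CoreConfig concrete, here is a minimal sketch comparing the default lax behavior with a validator built using strict=True. It assumes pydantic_core is installed; the input value is arbitrary.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

schema = core_schema.int_schema()

# Same schema, two validators: default (lax) config versus strict config.
lax = SchemaValidator(schema)
strict = SchemaValidator(schema, config=core_schema.CoreConfig(strict=True))

# Lax mode coerces a numeric string to an int.
lax_result = lax.validate_python("42")

# Strict mode rejects the string instead of coercing it.
try:
    strict.validate_python("42")
    strict_error_type = None
except ValidationError as exc:
    strict_error_type = exc.errors()[0]["type"]
```

Here lax_result is the integer 42, while the strict validator reports an error of type int_type.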

The diagram illustrates how the CoreSchema acts as the source of truth for both validation and serialization, and how these processes interact with input data and error reporting.

Key Architectural Findings:

  • CoreSchema is a recursive TypedDict union that serves as the blueprint for both validation and serialization.
  • SchemaValidator and SchemaSerializer are the core Rust-implemented classes exposed to Python.
  • The Input trait allows the same validation logic to be applied to both Python objects and raw JSON data.
  • ValidationError is a structured container for PyLineError objects, each of which provides detailed context (type, location, input) for one failure.
  • SchemaError handles errors in the schema definition itself, separate from data validation errors.
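The difference between the two exception types can be seen in a short sketch. It assumes pydantic_core is installed; the "age" field and the deliberately invalid schema type are made up for the example.

```python
from pydantic_core import SchemaError, SchemaValidator, ValidationError, core_schema

validator = SchemaValidator(
    core_schema.typed_dict_schema(
        {"age": core_schema.typed_dict_field(core_schema.int_schema(ge=0))}
    )
)

# Bad data: the schema is fine, so the failure surfaces as a ValidationError.
line_errors = []
try:
    validator.validate_python({"age": "not a number"})
except ValidationError as exc:
    # errors() returns one dict per line error, with type, loc, msg, and input.
    line_errors = exc.errors()

# Bad schema: the failure surfaces while the validator is being built.
schema_error = None
try:
    SchemaValidator({"type": "no-such-type"})
except SchemaError as exc:
    schema_error = exc
```

For the first case, line_errors[0] has type "int_parsing", loc ("age",), and input "not a number"; the second case never reaches the point of validating any data.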