Architecture Overview

This section contains architecture diagrams and documentation for pydantic.

Available Diagrams

Pydantic Core System Context Diagram

The Pydantic Core system context diagram illustrates the ecosystem surrounding the core validation engine.

At the center is the Pydantic Core Library, a high-performance validation and serialization engine implemented in Rust. It is primarily consumed by the Pydantic Library, which provides a more user-friendly Pythonic API for developers.

Python Developers interact with the system by writing Python Applications that leverage Pydantic for data integrity. The library is distributed via PyPI and runs within the Python Runtime, where it executes as a native extension.

The diagram also highlights the build-time dependency on the Rust Toolchain (Maturin/Cargo), which compiles the Rust source into Python-compatible binaries. It also shows the library's support for WebAssembly (Pyodide), which allows it to run in web browsers, and its integration with Logfire for observability.

Key design decisions discovered in the code:

  • Use of maturin to build and package the Rust code as a Python extension.
  • Minimal runtime dependencies (only typing-extensions).
  • Explicit support for WASM/Pyodide as seen in the wasm-preview directory.
  • Deep integration with the high-level pydantic package as its primary validation backend.

Key Architectural Findings:

  • Pydantic Core is a Rust-based engine (using PyO3) that provides the underlying validation logic for the Pydantic ecosystem.
  • It has minimal Python dependencies, relying primarily on typing-extensions at runtime.
  • The build process is managed by maturin, which compiles Rust code into native Python extensions (.so/.pyd files).
  • The library supports WebAssembly via Pyodide, enabling high-performance validation in browser environments.
  • It serves as the foundation for the Pydantic library, which is the main entry point for most Python applications.
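The relationship between the two packages can be observed from the Python side. The following is a minimal sketch, assuming pydantic v2 is installed; the User model is hypothetical, and the two dunder attributes are the names pydantic v2 uses to store the generated core schema and the validator on a model class.

```python
from pydantic import BaseModel


class User(BaseModel):  # hypothetical model, for illustration only
    id: int


# High-level API: pydantic validates and coerces the input.
user = User.model_validate({'id': '42'})

# pydantic v2 stores the generated core schema (a plain dict) and the
# validator built from it as attributes on the model class.
core_schema_type = User.__pydantic_core_schema__['type']
validator_class = type(User.__pydantic_validator__).__name__

print(user.id, core_schema_type, validator_class)
```

The exact validator class name may differ between pydantic versions, so the sketch prints it rather than assuming a value.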

Pydantic Core Internal Architecture

The architecture of pydantic-core is centered around a high-performance Rust core that provides validation and serialization logic for Python.

The system is divided into two main layers:

  1. Python Interface: This layer defines CoreSchema in the pydantic_core.core_schema module. A CoreSchema is a structured dictionary (TypedDict) that describes how data should be validated and serialized. The core_schema module also provides helper functions to build these schemas.
  2. Rust Core: The _pydantic_core Rust extension implements the heavy lifting. It takes a CoreSchema from Python and compiles it into a tree of Rust-based validators or serializers.
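The two layers described above can be sketched as follows, assuming pydantic-core is installed. The schema is an ordinary Python dictionary produced by a core_schema helper, and the Rust extension compiles it when the SchemaValidator is constructed. The constraint ge=0 is only an illustrative choice.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

# Python interface: the helper returns a plain dict describing the schema.
schema = core_schema.int_schema(ge=0)
print(schema)

# Rust core: constructing the validator compiles the schema into a validator tree.
validator = SchemaValidator(schema)
valid_value = validator.validate_python(5)

# Input that violates the constraint raises ValidationError.
try:
    validator.validate_python(-1)
    rejected = False
except ValidationError:
    rejected = True
```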

Key components include:

  • SchemaValidator: The primary entry point for validation. It takes a schema and provides methods like validate_python and validate_json. Internally, it uses a CombinedValidator to dispatch to specific type validators (e.g., StrValidator, IntValidator).
  • SchemaSerializer: The primary entry point for serialization. It converts Python objects into JSON or other Python-compatible formats based on the schema.
  • Input Trait: A crucial abstraction in the Rust core that allows the same validation logic to be applied to different input formats (Python objects, JSON bytes, etc.) without unnecessary conversion.
  • CoreSchema: The "contract" between Python and Rust, defining the structure and constraints of the data.
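As a rough illustration of these components, the sketch below builds one schema and uses it for both validation and serialization, assuming pydantic-core is installed. The same validator accepts Python objects and raw JSON, which is the behavior the Input trait enables on the Rust side. The list-of-integers schema is an arbitrary example.

```python
from pydantic_core import (
    SchemaSerializer,
    SchemaValidator,
    ValidationError,
    core_schema,
)

schema = core_schema.list_schema(core_schema.int_schema())

# One validator, two input formats.
validator = SchemaValidator(schema)
from_python = validator.validate_python([1, '2', 3])  # lax mode coerces '2'
from_json = validator.validate_json('[1, 2, 3]')

# Errors are collected into a single ValidationError.
try:
    validator.validate_python(['not a number'])
    error_count = 0
except ValidationError as exc:
    error_count = exc.error_count()

# One serializer, two output formats.
serializer = SchemaSerializer(schema)
as_python = serializer.to_python([1, 2, 3])
as_json = serializer.to_json([1, 2, 3])  # returns bytes
```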

Key Architectural Findings:

  • The pydantic_core Python package acts as a thin wrapper around the _pydantic_core Rust extension.
  • core_schema.py defines the CoreSchema type, which is the primary configuration format passed from Python to Rust.
  • SchemaValidator and SchemaSerializer are implemented in Rust and exposed to Python via PyO3.
  • The Rust core uses an Input trait to abstract over different data sources (Python objects, JSON), enabling high-performance validation without intermediate Python object creation for JSON.
  • Validation and serialization logic is modularized into specific Rust modules (e.g., src/validators/string.rs, src/serializers/type_serializers/dict.rs).

Core Schema Data Model

The Core Schema Data Model diagram illustrates the structure of pydantic-core's schema definitions, which are used to configure both validation and serialization.

At the center is CoreSchema, defined in pydantic_core.core_schema, which is a union of various schema types. The diagram highlights several key schema types:

  • TypedDictSchema: Represents a dictionary with a fixed set of keys. It contains a mapping of field names to TypedDictField objects.
  • TypedDictField: Defines the schema for an individual field within a TypedDict, including its own CoreSchema, whether it's required, and its aliases.
  • ListSchema: Represents a sequence of items, all following a specific items_schema.
  • StringSchema: Represents string data with various constraints like length and regex patterns.
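The schema types listed above nest inside one another. The following sketch, assuming pydantic-core is installed, composes them with the core_schema helper functions; the field names title and tags and the specific constraints are hypothetical.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

# A typed-dict schema whose fields hold a string schema and a list-of-strings schema.
schema = core_schema.typed_dict_schema(
    {
        'title': core_schema.typed_dict_field(
            core_schema.str_schema(min_length=1, max_length=50)
        ),
        'tags': core_schema.typed_dict_field(
            core_schema.list_schema(core_schema.str_schema(pattern=r'^[a-z]+$')),
            required=False,  # this key may be omitted
        ),
    }
)
validator = SchemaValidator(schema)

full = validator.validate_python({'title': 'Intro', 'tags': ['rust', 'python']})
partial = validator.validate_python({'title': 'Intro'})

# Each error records the path to the offending value.
try:
    validator.validate_python({'title': '', 'tags': ['Rust']})
    error_locations = set()
except ValidationError as exc:
    error_locations = {error['loc'] for error in exc.errors()}
```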

The diagram also shows how validation and serialization are integrated:

  • Validator: Represented by various function-based schemas (e.g., BeforeValidatorFunctionSchema), which wrap a CoreSchema and apply a custom validation function.
  • Serializer: Represented by Specialized Serialization Schemas, which can be attached to almost any CoreSchema to customize how that data is serialized.
  • CoreConfig: Provides global configuration settings (like strict mode or extra_fields_behavior) that apply to the entire schema or specific sub-schemas.
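The three integration points above can be sketched together, assuming pydantic-core is installed. The functions strip_whitespace and shout are hypothetical helpers written for this example; the schema builders and CoreConfig come from pydantic_core.core_schema.

```python
from pydantic_core import (
    SchemaSerializer,
    SchemaValidator,
    ValidationError,
    core_schema,
)


def strip_whitespace(value):
    # Hypothetical "before" function: runs ahead of the wrapped string schema.
    return value.strip() if isinstance(value, str) else value


def shout(value):
    # Hypothetical custom serializer function.
    return value.upper()


# Validator: a function-before schema wrapping a string schema.
# Serializer: a serialization schema attached to the same core schema.
schema = core_schema.no_info_before_validator_function(
    strip_whitespace,
    core_schema.str_schema(min_length=1),
    serialization=core_schema.plain_serializer_function_ser_schema(shout),
)
cleaned = SchemaValidator(schema).validate_python('  hello  ')
serialized = SchemaSerializer(schema).to_python('hello')

# CoreConfig: strict mode disables the lax str -> int coercion.
lax = SchemaValidator(core_schema.int_schema())
strict = SchemaValidator(
    core_schema.int_schema(), config=core_schema.CoreConfig(strict=True)
)
lax_result = lax.validate_python('7')
try:
    strict.validate_python('7')
    strict_rejected = False
except ValidationError:
    strict_rejected = True
```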

Finally, the SchemaValidator and SchemaSerializer are the primary entry points of the library. They consume a CoreSchema and a CoreConfig to perform their respective operations on input data.

Key Architectural Findings:

  • CoreSchema is a comprehensive TypeAlias union of over 40 different schema types, each represented as a TypedDict.
  • TypedDictSchema and ListSchema are recursive, containing other CoreSchema instances via fields or items_schema.
  • Validation logic is often 'wrapped' around a schema using ValidatorFunctionSchema types (Before, After, Wrap, Plain).
  • Serialization behavior is customizable per-schema through an optional 'serialization' field of type SerSchema.
  • SchemaValidator and SchemaSerializer are the core Rust-backed classes that interpret these schema definitions.