Architecture Overview
This section contains architecture diagrams and documentation for pydantic.
Available Diagrams
Pydantic Core System Context Diagram
The Pydantic Core system context diagram illustrates the ecosystem surrounding the core validation engine.
At the center is the Pydantic Core Library, a high-performance validation and serialization engine implemented in Rust. It is primarily consumed by the Pydantic Library, which provides a more user-friendly Pythonic API for developers.
Python Developers interact with the system by writing Python Applications that leverage Pydantic for data integrity. The library is distributed via PyPI and runs within the Python Runtime, where it executes as a native extension.
The diagram also highlights the build-time dependency on the Rust Toolchain (Maturin/Cargo), which is essential for compiling the Rust source into Python-compatible binaries. Additionally, it shows the library's versatility through its support for WebAssembly (Pyodide), allowing it to run in web browsers, and its integration with Logfire for observability.
Key design decisions discovered in the code:
- Use of maturin for bridging Rust and Python.
- Minimal runtime dependencies (only typing-extensions).
- Explicit support for WASM/Pyodide, as seen in the wasm-preview directory.
- Deep integration with the high-level pydantic package as its primary validation backend.
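The dependency claim can be checked against installed package metadata. The sketch below assumes pydantic-core is installed in the current environment. The helper runtime_dependencies is hypothetical. The exact requirement strings and the version printed depend on the installed release.

```python
from __future__ import annotations

import re
from importlib.metadata import requires, version


def runtime_dependencies(dist: str = "pydantic-core") -> list[str]:
    """Return normalized names of a distribution's non-extra requirements."""
    names = []
    for req in requires(dist) or []:  # requires() returns None if nothing is declared
        if "extra ==" in req:  # skip optional extras, e.g. 'foo; extra == "test"'
            continue
        name = re.match(r"[A-Za-z0-9._-]+", req).group(0)  # leading project name
        names.append(re.sub(r"[-_.]+", "-", name).lower())  # PEP 503 normalization
    return names


print(version("pydantic-core"), runtime_dependencies())
```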
Key Architectural Findings:
- Pydantic Core is a Rust-based engine (using PyO3) that provides the underlying validation logic for the Pydantic ecosystem.
- It has minimal Python dependencies, relying primarily on typing-extensions at runtime.
- The build process is managed by maturin, which compiles Rust code into native Python extensions (.so/.pyd files).
- The library supports WebAssembly via Pyodide, enabling high-performance validation in browser environments.
- It serves as the foundation for the Pydantic library, which is the main entry point for most Python applications.
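You can check from Python whether the installed core is really a compiled extension. This minimal sketch assumes the extension is importable as pydantic_core._pydantic_core, the module that the pydantic_core package wraps. It uses only standard-library facilities. The emscripten check reflects how Pyodide reports its platform.

```python
import importlib.machinery
import sys

import pydantic_core._pydantic_core as ext  # the compiled Rust extension module

# Native extension files end with one of the platform's extension suffixes
# (e.g. ".so" on Linux/macOS and under Pyodide, ".pyd" on Windows).
is_native = ext.__file__.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))

# Pyodide (the WASM build target) reports "emscripten" as its platform.
running_in_wasm = sys.platform == "emscripten"

print(f"native extension: {is_native}, WASM runtime: {running_in_wasm}")
```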
Pydantic Core Internal Architecture
The architecture of pydantic-core is centered around a high-performance Rust core that provides validation and serialization logic for Python.
The system is divided into two main layers:
- Python Interface: This layer provides the pydantic_core.core_schema module. Its CoreSchema type is a structured dictionary (a union of TypedDicts) that describes how data should be validated and serialized. The core_schema module also provides helper functions to build these schemas.
- Rust Core: The _pydantic_core Rust extension does the heavy lifting. It takes a CoreSchema from Python and compiles it into a tree of Rust-based validators or serializers.
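A pure-Python toy model can illustrate the compile step. This is not pydantic-core's code. The classes below and build_validator are hypothetical stand-ins that borrow the Rust validator names, and the toy supports only three schema types. It shows the shape of the idea: the schema dict is walked once to build a tree of validator objects, and that tree is then reused for every value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


# Hypothetical toy nodes: each validates one value and may own child nodes.
@dataclass
class IntValidator:
    def validate(self, value: Any) -> int:
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"expected int, got {type(value).__name__}")
        return value


@dataclass
class StrValidator:
    min_length: int | None = None

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise TypeError(f"expected str, got {type(value).__name__}")
        if self.min_length is not None and len(value) < self.min_length:
            raise ValueError(f"string shorter than {self.min_length}")
        return value


@dataclass
class ListValidator:
    item_validator: Any  # child node compiled from items_schema

    def validate(self, value: Any) -> list:
        if not isinstance(value, list):
            raise TypeError(f"expected list, got {type(value).__name__}")
        return [self.item_validator.validate(item) for item in value]


def build_validator(schema: dict) -> Any:
    """Compile a CoreSchema-like dict once into a tree of validator nodes."""
    kind = schema["type"]
    if kind == "int":
        return IntValidator()
    if kind == "str":
        return StrValidator(min_length=schema.get("min_length"))
    if kind == "list":
        return ListValidator(build_validator(schema["items_schema"]))
    raise NotImplementedError(f"toy model does not support {kind!r}")


tree = build_validator({"type": "list", "items_schema": {"type": "str", "min_length": 1}})
print(tree.validate(["a", "bc"]))
```

Each node knows only its own type and its children. Adding a schema type therefore means adding one node class and one branch in build_validator. In the real engine, CombinedValidator handles dispatch across node types.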
Key components include:
- SchemaValidator: The primary entry point for validation. It takes a schema and provides methods such as validate_python and validate_json. Internally, it uses a CombinedValidator to dispatch to type-specific validators (e.g., StrValidator, IntValidator).
- SchemaSerializer: The primary entry point for serialization. Based on the schema, it converts Python objects either into JSON (to_json) or into Python objects (to_python), optionally restricted to JSON-compatible types.
- Input Trait: A crucial abstraction in the Rust core that allows the same validation logic to be applied to different input formats (Python objects, JSON bytes, etc.) without unnecessary conversion.
- CoreSchema: The "contract" between Python and Rust, defining the structure and constraints of the data.
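The two entry points can be exercised directly from Python. This sketch assumes pydantic-core 2.x is installed. The schema itself is an arbitrary example.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, ValidationError, core_schema

schema = core_schema.typed_dict_schema(
    {
        "id": core_schema.typed_dict_field(core_schema.int_schema()),
        "tags": core_schema.typed_dict_field(core_schema.list_schema(core_schema.str_schema())),
    }
)

validator = SchemaValidator(schema)  # the schema is compiled into Rust validators here
serializer = SchemaSerializer(schema)

# One schema, two input formats: Python objects and raw JSON text.
from_python = validator.validate_python({"id": "42", "tags": ["a", "b"]})  # lax mode coerces "42"
from_json = validator.validate_json('{"id": 42, "tags": ["a", "b"]}')

try:
    validator.validate_python({"id": "not a number", "tags": []})
except ValidationError as exc:
    first_error = exc.errors()[0]  # structured error: type, loc, msg, input
    print(first_error["type"], first_error["loc"])

print(serializer.to_python(from_json))
print(serializer.to_json(from_json))
```

Validation failures surface as ValidationError. Its errors() method returns structured entries rather than a single message string.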
Key Architectural Findings:
- The pydantic_core Python package acts as a thin wrapper around the _pydantic_core Rust extension.
- core_schema.py defines the CoreSchema type, which is the primary configuration format passed from Python to Rust.
- SchemaValidator and SchemaSerializer are implemented in Rust and exposed to Python via PyO3.
- The Rust core uses an Input trait to abstract over different data sources (Python objects, JSON), enabling high-performance validation without intermediate Python object creation for JSON.
- Validation and serialization logic is modularized into specific Rust modules (e.g., src/validators/string.rs, src/serializers/type_serializers/dict.rs).
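Python has no traits, but a typing.Protocol can mimic the pattern well enough to show why it matters. In this hypothetical sketch, Input, PythonInput, JsonInput, and validate_int are illustrative names, not pydantic-core APIs. The coercion rules are invented for the example. validate_int is written once against the protocol and works for either source. Unlike the Rust core, this toy still parses JSON into Python objects first, so it illustrates the code-sharing benefit but not the performance one.

```python
from __future__ import annotations

import json
from typing import Any, Optional, Protocol


class Input(Protocol):
    """Hypothetical Python analogue of the Rust Input trait."""

    def exact_int(self) -> Optional[int]: ...  # value already is an int
    def lax_int(self) -> Optional[int]: ...  # value can be coerced to an int


class PythonInput:
    def __init__(self, obj: Any) -> None:
        self.obj = obj

    def exact_int(self) -> Optional[int]:
        # type() check so that bool (an int subclass) is not accepted
        return self.obj if type(self.obj) is int else None

    def lax_int(self) -> Optional[int]:
        if isinstance(self.obj, str):
            try:
                return int(self.obj)
            except ValueError:
                return None
        return None


class JsonInput:
    def __init__(self, raw: str) -> None:
        self.value = json.loads(raw)  # parsing happens at the input boundary

    def exact_int(self) -> Optional[int]:
        return self.value if type(self.value) is int else None

    def lax_int(self) -> Optional[int]:
        # Invented rule for this toy: accept JSON numbers such as 4.0
        if isinstance(self.value, float) and self.value.is_integer():
            return int(self.value)
        return None


def validate_int(inp: Input, strict: bool = False) -> int:
    """Validation logic written once, against the protocol only."""
    value = inp.exact_int()
    if value is None and not strict:
        value = inp.lax_int()
    if value is None:
        raise ValueError("input is not a valid integer")
    return value
```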
Core Schema Data Model
The Core Schema Data Model diagram illustrates the structure of pydantic-core's schema definitions, which are used to configure both validation and serialization.
At the center is CoreSchema, defined in pydantic_core.core_schema as a union of the individual schema types. The diagram highlights several key schema types:
- TypedDictSchema: Represents a dictionary with a fixed set of keys. It contains a mapping of field names to TypedDictField objects.
- TypedDictField: Defines the schema for an individual field within a TypedDict, including its own nested CoreSchema, whether it is required, and its aliases.
- ListSchema: Represents a sequence of items, all validated against a single items_schema.
- StringSchema: Represents string data with constraints such as minimum/maximum length and a regex pattern.
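The schema types above are plain dicts produced by builder functions in pydantic_core.core_schema. This sketch assumes pydantic-core 2.x. Field names and constraints are arbitrary examples.

```python
from pydantic_core import SchemaValidator, core_schema

# StringSchema with length constraints
name_schema = core_schema.str_schema(min_length=1, max_length=50)

# TypedDictSchema whose fields nest other schemas (recursion via fields/items_schema)
schema = core_schema.typed_dict_schema(
    {
        "name": core_schema.typed_dict_field(name_schema, validation_alias="userName"),
        "nicknames": core_schema.typed_dict_field(
            core_schema.list_schema(name_schema), required=False
        ),
    }
)

# Builders only assemble dicts; nothing is compiled or validated yet.
print(name_schema)

validator = SchemaValidator(schema)
print(validator.validate_python({"userName": "Samuel"}))  # alias used on input
```

Because nested schemas are ordinary dict values, the same name_schema can serve both as a field schema and as a list's items_schema.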
The diagram also shows how validation and serialization are integrated:
- Validator: Represented by the function-based schemas (e.g., BeforeValidatorFunctionSchema), which wrap an inner CoreSchema and apply a custom validation function.
- Serializer: Represented by specialized serialization schemas, which can be attached to almost any CoreSchema to customize how that data is serialized.
- CoreConfig: Provides global configuration settings (such as strict mode or extra_fields_behavior) that apply to the entire schema or to specific sub-schemas.
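A function validator and an attached serializer can be combined on a single schema. This sketch assumes pydantic-core 2.x. strip_and_lower and the uppercasing serializer are arbitrary examples.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, core_schema


def strip_and_lower(value):
    # Before validator: runs on the raw input, before the inner str schema
    return value.strip().lower() if isinstance(value, str) else value


schema = core_schema.no_info_before_validator_function(
    strip_and_lower,
    core_schema.str_schema(max_length=10),
    # Attached serialization schema: changes output without affecting validation
    serialization=core_schema.plain_serializer_function_ser_schema(lambda v: v.upper()),
)

value = SchemaValidator(schema).validate_python("  HeLLo ")
print(value)
print(SchemaSerializer(schema).to_python(value))
```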
Finally, SchemaValidator and SchemaSerializer are the primary entry points of the library. Each consumes a CoreSchema and an optional CoreConfig and performs its respective operation on input data.
Key Architectural Findings:
- CoreSchema is a comprehensive TypeAlias union of over 40 different schema types, each represented as a TypedDict.
- TypedDictSchema and ListSchema are recursive, containing other CoreSchema instances via fields or items_schema.
- Custom validation logic is attached through ValidatorFunctionSchema types (Before, After, Wrap, Plain). Before, After, and Wrap are 'wrapped' around an inner schema, while Plain replaces inner validation entirely.
- Serialization behavior is customizable per-schema through an optional 'serialization' field of type SerSchema.
- SchemaValidator and SchemaSerializer are the core Rust-backed classes that interpret these schema definitions.
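A few lines can combine the After and Wrap variants with the recursive nesting of schemas. This sketch assumes pydantic-core 2.x. default_on_error is a hypothetical helper, and swallowing every validation error like this is for illustration only.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

# After: the function receives the already-validated int.
non_negative = core_schema.no_info_after_validator_function(abs, core_schema.int_schema())


def default_on_error(value, handler):
    # Wrap: the function decides whether and how to call the inner validator.
    try:
        return handler(value)
    except ValidationError:
        return 0


lenient_int = core_schema.no_info_wrap_validator_function(default_on_error, non_negative)

# Recursion: the wrapped schema is itself the items_schema of a ListSchema.
validator = SchemaValidator(core_schema.list_schema(lenient_int))
print(validator.validate_python(["-3", 4, "oops"]))
```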