Schema Composition & Logic

Schema composition in pydantic-core provides the logic for branching, sequencing, and mode-dependent validation. Rather than just checking a single type, these schemas allow developers to define how multiple validation paths should interact, how data should be transformed in stages, and how validation should adapt to different input formats or strictness levels.

Unions and Tagged Unions

Unions allow a value to match one of several possible schemas. pydantic-core implements two primary types of unions: the general-purpose union_schema and the optimized tagged_union_schema.

General Unions

The union_schema (defined in pydantic_core/core_schema.py) supports two modes of operation: smart and left_to_right.

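  • Smart Mode (Default): This mode attempts to find the "best" match among the choices rather than the first one that succeeds. For scalar inputs, it prefers a choice that matches the input's type exactly over one that requires coercion. For structured types such as models, dataclasses, or typed dictionaries, it prefers the match that sets the greatest number of fields. This prevents a simpler schema from "shadowing" a more specific one just because it appeared earlier in the list.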
  • Left-to-Right Mode: This mode simply returns the result of the first schema that successfully validates the input.

The following example from pydantic-core/tests/validators/test_union.py demonstrates how smart mode prefers a more complete match:

from pydantic_core import SchemaValidator, core_schema as cs

class ModelA:
    pass

class ModelB:
    pass

# ModelB declares an extra field 'c' (with a default) compared to ModelA
schema = cs.union_schema(
    choices=[
        cs.model_schema(
            cls=ModelA,
            schema=cs.model_fields_schema(
                fields={
                    'a': cs.model_field(schema=cs.int_schema()),
                    'b': cs.model_field(schema=cs.str_schema()),
                }
            ),
        ),
        cs.model_schema(
            cls=ModelB,
            schema=cs.model_fields_schema(
                fields={
                    'a': cs.model_field(schema=cs.int_schema()),
                    'b': cs.model_field(schema=cs.str_schema()),
                    # Defaults are attached via with_default_schema; model_field has no `default` argument
                    'c': cs.model_field(schema=cs.with_default_schema(cs.float_schema(), default=1.0)),
                }
            ),
        ),
    ],
    mode='smart',
)

v = SchemaValidator(schema)
# Both choices validate, but ModelB is preferred because it also sets field 'c'
m = v.validate_python({'a': 1, 'b': 'hello', 'c': 2.0})
assert isinstance(m, ModelB)
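
To contrast the two modes on a scalar input, here is a minimal sketch that builds the same pair of choices under both modes. The variable names are illustrative only, and the behavior described assumes a recent pydantic-core release in which smart mode takes type exactness into account.

```python
from pydantic_core import SchemaValidator, core_schema as cs

# Same choices, in the same order, for both validators
choices = [cs.int_schema(), cs.str_schema()]

left_to_right = SchemaValidator(cs.union_schema(choices, mode='left_to_right'))
smart = SchemaValidator(cs.union_schema(choices, mode='smart'))

# int_schema is tried first and, in lax mode, accepts a numeric string
ltr_result = left_to_right.validate_python('456')

# smart mode prefers the choice that matches the input type without coercion
smart_result = smart.validate_python('456')

print(repr(ltr_result), repr(smart_result))
```

With left_to_right, the order of the choices is part of the schema's meaning, so the most specific choice should generally be listed first.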

Tagged Unions

For performance-critical code or unions with many choices, tagged_union_schema is more efficient. Instead of trying each choice in turn, it reads a discriminator value from the input and uses it to look up the matching schema directly.

The discriminator can be:

  1. A string representing a field name.
  2. A list of strings/integers representing a path to a field (e.g., ['metadata', 'type']).
  3. A callable that takes the input and returns the tag.

This approach not only improves performance but also provides much clearer error messages when a tag is missing or invalid, as seen in pydantic-core/tests/validators/test_tagged_union.py:

from pydantic_core import SchemaValidator, core_schema as cs

apple_schema = cs.typed_dict_schema({'foo': cs.typed_dict_field(cs.str_schema())})
banana_schema = cs.typed_dict_schema(
    {
        'foo': cs.typed_dict_field(cs.str_schema()),
        'spam': cs.typed_dict_field(cs.int_schema()),
    }
)

schema = cs.tagged_union_schema(
    choices={'apple': apple_schema, 'banana': banana_schema},
    discriminator='foo',
)
v = SchemaValidator(schema)

# The value of 'foo' selects the 'apple' schema; the 'banana' schema is never tried
assert v.validate_python({'foo': 'apple'}) == {'foo': 'apple'}
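
The other two discriminator forms, and the error reported for an unknown tag, can be sketched as follows. The cat/dog schemas, the pick_tag helper, and the field names are hypothetical and exist only for this illustration.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema as cs

# Hypothetical schemas: the tag lives in a nested 'meta' dictionary
cat = cs.typed_dict_schema({
    'meta': cs.typed_dict_field(cs.dict_schema()),
    'lives': cs.typed_dict_field(cs.int_schema()),
})
dog = cs.typed_dict_schema({
    'meta': cs.typed_dict_field(cs.dict_schema()),
    'good_boy': cs.typed_dict_field(cs.bool_schema()),
})

# Path discriminator: the tag is read from input['meta']['kind']
by_path = SchemaValidator(
    cs.tagged_union_schema(choices={'cat': cat, 'dog': dog}, discriminator=['meta', 'kind'])
)
pet = by_path.validate_python({'meta': {'kind': 'cat'}, 'lives': 9})

# Callable discriminator: the function receives the raw input and returns the tag
def pick_tag(value):
    return 'str' if isinstance(value, str) else 'int'

by_callable = SchemaValidator(
    cs.tagged_union_schema(
        choices={'str': cs.str_schema(), 'int': cs.int_schema()},
        discriminator=pick_tag,
    )
)
number = by_callable.validate_python(123)

# An unknown tag produces a single, dedicated error instead of one error per choice
invalid_tag_errors = None
try:
    by_path.validate_python({'meta': {'kind': 'fish'}, 'lives': 1})
except ValidationError as exc:
    invalid_tag_errors = [e['type'] for e in exc.errors()]

# A missing tag is reported with its own error type as well
missing_tag_errors = None
try:
    by_path.validate_python({'lives': 1})
except ValidationError as exc:
    missing_tag_errors = [e['type'] for e in exc.errors()]

print(pet, number, invalid_tag_errors, missing_tag_errors)
```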

Sequential Logic with Chains

The chain_schema allows for multi-step validation where the output of one step becomes the input for the next. This is the standard pattern for combining raw type validation with custom transformations or refinements.

A common use case found in pydantic-core/tests/validators/test_chain.py is validating a string and then converting it to a Decimal:

from decimal import Decimal
from pydantic_core import SchemaValidator, core_schema as cs

validator = SchemaValidator(
    cs.chain_schema(
        steps=[
            cs.str_schema(),
            cs.with_info_plain_validator_function(lambda v, info: Decimal(v)),
        ]
    )
)

assert validator.validate_python('1.44') == Decimal('1.44')

Internally, pydantic-core optimizes these chains by flattening nested chain_schema definitions into a single sequence of steps, reducing the overhead of recursive calls during validation.
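
As a rough illustration, a chain nested inside another chain behaves like one flat sequence of steps, and a failure in an early step stops the chain before later steps run. The step functions below are made up for this sketch, and the flattening itself is internal and not observable from this code.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema as cs

# Inner chain: validate and strip a string, then upper-case it
strip_then_upper = cs.chain_schema(
    steps=[
        cs.str_schema(strip_whitespace=True),
        cs.no_info_plain_validator_function(lambda v: v.upper()),
    ]
)

# Outer chain: reuse the inner chain as its first step, then append '!'
shout = cs.chain_schema(
    steps=[
        strip_then_upper,
        cs.no_info_plain_validator_function(lambda v: v + '!'),
    ]
)

validator = SchemaValidator(shout)
result = validator.validate_python('  hello ')

# The first step rejects non-string input, so the later functions are never called
first_error_type = None
try:
    validator.validate_python(123)
except ValidationError as exc:
    first_error_type = exc.errors()[0]['type']

print(result, first_error_type)
```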

Mode-Based Validation

pydantic-core allows schemas to behave differently based on the validation context, specifically regarding strictness and the source of the data (JSON vs. Python).

Lax vs. Strict Validation

The lax_or_strict_schema enables different logic depending on whether the validator is running in strict mode. This is useful for types that should allow coercion in lax mode but require exact types in strict mode.

As shown in pydantic-core/tests/validators/test_lax_or_strict.py:

from pydantic_core import SchemaValidator, core_schema as cs

v = SchemaValidator(
    cs.lax_or_strict_schema(
        lax_schema=cs.str_schema(),
        strict_schema=cs.int_schema(),
        strict=True,  # Default to strict
    )
)

# Uses int_schema by default
assert v.validate_python(123) == 123

# Can override to lax mode at runtime to use str_schema
assert v.validate_python('aaa', strict=False) == 'aaa'
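
For comparison, here is a small sketch of the same schema without the strict argument, in which case the validator starts in lax mode and strict mode has to be requested per call. The variable names are illustrative.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema as cs

# No `strict` argument: the validator uses lax_schema unless told otherwise
v = SchemaValidator(
    cs.lax_or_strict_schema(
        lax_schema=cs.str_schema(),
        strict_schema=cs.int_schema(),
    )
)

# Lax by default, so str_schema is used
lax_result = v.validate_python('aaa')

# Requesting strict mode at call time switches to int_schema
strict_result = v.validate_python(123, strict=True)

# In strict mode the string is validated by int_schema and rejected
strict_error_type = None
try:
    v.validate_python('aaa', strict=True)
except ValidationError as exc:
    strict_error_type = exc.errors()[0]['type']

print(lax_result, strict_result, strict_error_type)
```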

JSON vs. Python Validation

The json_or_python_schema handles the discrepancy between JSON's limited type system and Python's rich object model. It allows defining one path for validate_json (where types like dates or UUIDs are strings) and another for validate_python (where they might already be native objects).

Example from pydantic-core/tests/validators/test_json_or_python.py:

from pydantic_core import SchemaValidator, core_schema as cs

class Foo(str):
    pass

s = cs.json_or_python_schema(
    json_schema=cs.no_info_after_validator_function(Foo, cs.str_schema()),
    python_schema=cs.is_instance_schema(Foo),
)
v = SchemaValidator(s)

# validate_python requires an actual instance of Foo
assert v.validate_python(Foo('abc')) == Foo('abc')

# validate_json accepts a string and converts it to Foo
assert v.validate_json('"abc"') == Foo('abc')

This separation ensures that validation is both performant (by avoiding unnecessary checks in Python mode) and robust (by handling string representations in JSON mode).
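
The same split can be sketched for dates, one of the types mentioned above. This is an illustrative example under stated assumptions, not taken from the test suite: the JSON path parses an ISO 8601 string, while the Python path only accepts an existing date instance.

```python
from datetime import date

from pydantic_core import SchemaValidator, ValidationError, core_schema as cs

schema = cs.json_or_python_schema(
    json_schema=cs.date_schema(),               # JSON input: parse a date string
    python_schema=cs.is_instance_schema(date),  # Python input: require a date object
)
v = SchemaValidator(schema)

from_json = v.validate_json('"2024-01-31"')
from_python = v.validate_python(date(2024, 1, 31))

# A plain string is rejected on the Python path because it is not a date instance
python_error_type = None
try:
    v.validate_python('2024-01-31')
except ValidationError as exc:
    python_error_type = exc.errors()[0]['type']

print(from_json, from_python, python_error_type)
```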