Union and Discriminator Logic

This codebase handles values that may match one of several types through two schema structures: UnionSchema and TaggedUnionSchema. Both let a value match one of several schemas, but they differ in how a schema is selected and in how much work that selection takes.

Standard Unions

The UnionSchema (implemented via core_schema.union_schema) is the most flexible way to handle multiple types. It attempts to validate the input against a list of potential schemas defined in the choices attribute.

Validation Modes

The behavior of a union is governed by its mode, which determines how the validator selects the winning schema:

  • Smart Mode (smart): This is the default. The validator attempts to find the "best" match. It evaluates the choices and selects the one that most closely matches the input. For example, if the input is a dictionary, it prefers a typed-dict schema that matches more fields over one that matches fewer.
  • Left-to-Right Mode (left_to_right): The validator tries the schemas in the order they appear in the choices list. The first schema that successfully validates the input is used, and the remaining choices are not tried.
from pydantic_core import SchemaValidator, core_schema

# A simple union allowing either a string or an integer
schema = core_schema.union_schema([
    core_schema.str_schema(),
    core_schema.int_schema(),
])
v = SchemaValidator(schema)

assert v.validate_python('hello') == 'hello'
assert v.validate_python(1) == 1
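
The two modes differ most visibly when an earlier choice can coerce the input but a later choice matches it exactly. The following is a pure-Python model of that selection logic, not pydantic-core's implementation. It assumes smart mode can be approximated as a strict pass followed by a lax pass, and every name in it (strict_int, lax_int, strict_str, CHOICES, validate_left_to_right, validate_smart) is hypothetical and introduced only for illustration.

```python
def strict_int(value):
    # Accept only real ints (bool is deliberately excluded).
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise ValueError('not an int')


def lax_int(value):
    # Accept ints, plus strings that parse as ints.
    if isinstance(value, bool):
        raise ValueError('not an int')
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value)  # raises ValueError on unparsable text
    raise ValueError('not an int')


def strict_str(value):
    if isinstance(value, str):
        return value
    raise ValueError('not a str')


# Each choice is a (strict, lax) pair of validator functions.
CHOICES = [(strict_int, lax_int), (strict_str, strict_str)]


def validate_left_to_right(value, choices=CHOICES):
    # The first choice whose lax validator succeeds wins.
    for _strict, lax in choices:
        try:
            return lax(value)
        except ValueError:
            continue
    raise ValueError('no choice matched')


def validate_smart(value, choices=CHOICES):
    # Pass 1: prefer a choice that matches without coercion.
    for strict, _lax in choices:
        try:
            return strict(value)
        except ValueError:
            continue
    # Pass 2: fall back to coercion, in declaration order.
    return validate_left_to_right(value, choices)
```

With the choices ordered int then str, the input '123' is coerced to an integer by the left-to-right model, while the smart model keeps it as a string because the string choice matches without coercion.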

Automatic Collapsing

The auto_collapse field (defaulting to True) is a performance optimization. If a union contains only a single choice, pydantic-core can automatically collapse the union and use the inner validator directly, reducing the overhead of the union logic.
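
The idea behind collapsing can be sketched in a few lines of plain Python. This is a conceptual model under the assumption that validators are simple callables; build_union and union_validator are hypothetical names and do not correspond to pydantic-core internals.

```python
def build_union(choices, auto_collapse=True):
    """Return a validator callable for `choices` (a list of callables)."""
    if not choices:
        raise ValueError('a union needs at least one choice')
    if auto_collapse and len(choices) == 1:
        # Single choice: hand back the inner validator itself,
        # so no union wrapper sits in the call path.
        return choices[0]

    def union_validator(value):
        # General case: try each choice in turn.
        for choice in choices:
            try:
                return choice(value)
            except ValueError:
                continue
        raise ValueError('no choice matched')

    return union_validator
```

In this model, build_union([int]) returns int itself, whereas passing auto_collapse=False returns a wrapper that loops over the one-element list on every call.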

Tagged Unions

For high-performance branching, especially when dealing with complex models like TypedDict or Dataclasses, TaggedUnionSchema is preferred. Instead of trying multiple schemas, it uses a "discriminator" to jump directly to the correct schema.

Discriminator Strategies

The discriminator defines how to extract the "tag" from the input data. This codebase supports four kinds of discriminator:

  1. Field Name: A simple string representing a key in a dictionary or an attribute on an object.
  2. Path: A list of strings or integers representing a path to a nested value (e.g., ['metadata', 'type']).
  3. Multiple Paths: A list of paths. The validator attempts to extract a tag from each path until one succeeds.
  4. Callable: A custom function that takes the input and returns the tag.
# Example of a tagged union using a field-name discriminator
from pydantic_core import SchemaValidator, core_schema

apple_schema = core_schema.typed_dict_schema({
    'foo': core_schema.typed_dict_field(core_schema.str_schema()),
    'bar': core_schema.typed_dict_field(core_schema.int_schema()),
})
banana_schema = core_schema.typed_dict_schema({
    'foo': core_schema.typed_dict_field(core_schema.str_schema()),
    'spam': core_schema.typed_dict_field(
        core_schema.list_schema(items_schema=core_schema.int_schema())
    ),
})

schema = core_schema.tagged_union_schema(
    choices={
        'apple': apple_schema,
        'banana': banana_schema,
    },
    discriminator='foo',
)

v = SchemaValidator(schema)
# The 'foo' field value 'apple' tells the validator to use apple_schema
assert v.validate_python({'foo': 'apple', 'bar': '123'}) == {'foo': 'apple', 'bar': 123}
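
The four discriminator strategies listed above can be modelled with a single dispatch function. This is a minimal pure-Python sketch, not the library's extraction code: extract_tag, _follow_path, and _MISSING are hypothetical names, and the sketch only handles dictionaries and sequences (attribute lookup on objects is left out).

```python
_MISSING = object()


def _follow_path(data, path):
    # Walk one path; items are dict keys (str) or sequence indexes (int).
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return _MISSING
    return current


def extract_tag(discriminator, data):
    """Return the tag found in `data`, or raise LookupError."""
    if callable(discriminator):
        return discriminator(data)           # strategy 4: callable
    if isinstance(discriminator, str):
        paths = [[discriminator]]            # strategy 1: field name
    elif discriminator and isinstance(discriminator[0], list):
        paths = discriminator                # strategy 3: multiple paths
    else:
        paths = [discriminator]              # strategy 2: single path
    for path in paths:
        tag = _follow_path(data, path)
        if tag is not _MISSING:
            return tag                       # first path that resolves wins
    raise LookupError('tag not found')
```

Normalising every non-callable discriminator to a list of paths keeps the lookup loop identical for the first three strategies.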

Complex Discriminator Paths

In tests/validators/test_tagged_union.py, the codebase demonstrates using multiple paths to locate a tag. This is useful when the tag might exist in different locations depending on the input structure:

# Matches 'apple' if 'food' key is 'apple' OR if 'menu[1]' is 'apple'
schema = core_schema.tagged_union_schema(
    choices={'apple': apple_schema, 'banana': banana_schema},
    discriminator=[['food'], ['menu', 1]],
)

Recursive Choice Lookups

A unique feature of TaggedUnionSchema in this project is how it handles string values in the choices dictionary. If the value associated with a tag is a string rather than a CoreSchema, the validator uses that string to perform a recursive lookup within the same choices map. This mechanism is primarily used to handle internal Rust ownership constraints while allowing flexible mapping of multiple tags to the same schema.
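
The lookup described here amounts to following string aliases until a real schema is reached. The sketch below models that in plain Python, with schemas stood in for by arbitrary non-string objects. resolve_choice is a hypothetical helper, and the cycle check is an assumption added so the sketch always terminates; the source does not say how the library treats cyclic aliases.

```python
def resolve_choice(choices, tag):
    """Follow string aliases in `choices` until a non-string value is found."""
    seen = set()
    value = choices[tag]              # KeyError if the tag is unknown
    while isinstance(value, str):
        if value in seen:
            # Assumption: treat an alias loop as a configuration error.
            raise ValueError(f'alias cycle at {value!r}')
        seen.add(value)
        value = choices[value]        # the string is itself a key in the map
    return value
```

For example, with choices such as {'apple': apple_schema, 'APPLE': 'apple'}, both tags resolve to the same schema object, so the schema is defined once and shared.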

Error Handling

Unions provide specific error types to help debug validation failures:

  • union_tag_not_found: Raised by a TaggedUnionSchema when the discriminator cannot be found in the input data (e.g., the field is missing).
  • union_tag_invalid: Raised when the extracted tag does not match any of the keys in the choices dictionary.
  • Standard Union Errors: When a standard UnionSchema fails, it typically reports the errors collected from each attempted choice; this can be replaced with a single custom error using custom_error_type.

For example, failing to find a tag in a path-based discriminator results in an error like: Unable to extract tag using discriminator 'food' | 'menu'.1 (as seen in test_discriminator_path).
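
To make the two tag errors and the message format concrete, here is a pure-Python sketch of a path-based dispatch. It is a model under stated assumptions, not the library's code: TagError, describe_discriminator, lookup, and select_choice are hypothetical names, and the wording of the union_tag_invalid message is invented for illustration. Only the union_tag_not_found text follows the format quoted above.

```python
class TagError(Exception):
    def __init__(self, error_type, message):
        super().__init__(message)
        self.error_type = error_type


def describe_discriminator(paths):
    # Strings are quoted, ints are bare; items join with '.', paths with ' | '.
    return ' | '.join(
        '.'.join(repr(i) if isinstance(i, str) else str(i) for i in path)
        for path in paths
    )


def lookup(data, path):
    # Return (found, value) for one path through dicts and sequences.
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return False, None
    return True, current


def select_choice(choices, paths, data):
    for path in paths:
        found, tag = lookup(data, path)
        if found:
            break
    else:
        # No path produced a tag.
        raise TagError(
            'union_tag_not_found',
            'Unable to extract tag using discriminator '
            + describe_discriminator(paths),
        )
    try:
        return choices[tag]
    except (KeyError, TypeError):
        # A tag was found but is not a key of `choices` (illustrative wording).
        raise TagError('union_tag_invalid', f'tag {tag!r} is not a known choice')
```

Calling select_choice with the discriminator [['food'], ['menu', 1]] on an input that has neither key raises a TagError whose message matches the format shown above.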