Generator Validation
Generator validation in pydantic-core handles lazy sequences and iterators without exhausting them up front. Items are validated one at a time as they are yielded, which keeps memory use low for large or infinite streams of data while still validating every item.
The Generator Schema
The GeneratorSchema is a TypedDict that defines how a generator or iterable should be validated and serialized. It is typically constructed using the generator_schema helper function.
from pydantic_core import core_schema

schema = core_schema.generator_schema(
    items_schema=core_schema.int_schema(),
    min_length=2,
    max_length=5,
)
The schema includes several key fields:
- items_schema: A CoreSchema used to validate every item yielded by the generator.
- min_length / max_length: Constraints on the number of items the generator must yield.
- serialization: Configuration for how the generator should be serialized (e.g., using filter_seq_schema for include/exclude logic).
Lazy Validation Mechanics
Unlike list or set validation, which processes all items immediately, generator validation is lazy. When you call validate_python on a generator, pydantic-core does not iterate over the input. Instead, it returns a ValidatorIterator object.
The ValidatorIterator
The ValidatorIterator wraps the original iterable and performs validation on-the-fly as items are requested. This means that a ValidationError is not raised when validate_python is called, but rather when next() is called on the resulting iterator if an item fails validation.
from pydantic_core import SchemaValidator, ValidationError, core_schema

def my_generator():
    yield 1
    yield "not an int"

v = SchemaValidator(core_schema.generator_schema(items_schema=core_schema.int_schema()))
validated_gen = v.validate_python(my_generator())

# The first item is valid
assert next(validated_gen) == 1

# The second item fails validation only when accessed
try:
    next(validated_gen)
except ValidationError as e:
    print(e.errors())
    # Output: [{'type': 'int_parsing', 'loc': (1,), ...}]
As seen in pydantic-core/tests/validators/test_generator.py, the ValidatorIterator maintains an index attribute that tracks the number of items yielded so far. This index is used in the loc (location) of any ValidationError that occurs during iteration.
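To make the index bookkeeping concrete, here is a minimal plain-Python model of a lazily validating iterator. It is only a sketch of the behaviour described above, not pydantic-core's actual implementation; the names LazyIntIterator and ItemError are hypothetical and exist purely for illustration.

```python
class ItemError(ValueError):
    """Hypothetical error type carrying the location of the failing item."""

    def __init__(self, loc, value):
        super().__init__(f"item at {loc} is not an int: {value!r}")
        self.loc = loc


class LazyIntIterator:
    """Toy stand-in for a lazily validating iterator (an assumption, not the real class)."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self.index = 0  # number of items pulled from the source so far

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._iterator)  # StopIteration propagates when exhausted
        position = self.index
        self.index += 1
        # Strict check for simplicity; bool is excluded because it subclasses int.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ItemError((position,), value)
        return value


lazy = LazyIntIterator([1, "not an int", 3])
assert lazy.index == 0  # nothing is consumed at construction time
assert next(lazy) == 1

try:
    next(lazy)
except ItemError as exc:
    failed_loc = exc.loc  # the position of the bad item, mirroring `loc`

assert failed_loc == (1,)
assert lazy.index == 2  # the counter advances even when an item fails
```

In this model the error surfaces only when the offending item is requested, and the position recorded in the error is the value the counter held before that item was pulled.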
Length Constraints
Length constraints (min_length and max_length) are also enforced lazily:
- max_length: If the generator yields more items than allowed, a too_long error is raised by the ValidatorIterator as soon as the limit is exceeded.
- min_length: This constraint is checked when the underlying generator is exhausted. If the total number of items yielded is less than min_length, a too_short error is raised after the final valid item.
This behavior is verified in test_generator_too_long within the test suite, where the error is raised exactly when the third item is requested from a generator with max_length=2.
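Both length constraints can be exercised with a short script. This is a sketch under the assumption that draining the iterator with list() is acceptable for the data at hand; error_type_when_drained is a hypothetical helper, not part of pydantic-core.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

bounded = SchemaValidator(
    core_schema.generator_schema(
        items_schema=core_schema.int_schema(),
        min_length=2,
        max_length=3,
    )
)


def error_type_when_drained(values):
    """Hypothetical helper: drain the lazy iterator and return the first error type, if any."""
    source = (value for value in values)  # a real generator, as in the text above
    try:
        list(bounded.validate_python(source))
    except ValidationError as exc:
        return exc.errors()[0]["type"]
    return None


results = {
    "within bounds": error_type_when_drained([1, 2]),
    "too few": error_type_when_drained([1]),           # fails once the source is exhausted
    "too many": error_type_when_drained([1, 2, 3, 4]), # fails when the fourth item is requested
}
print(results)
```

Note that no error is raised by validate_python itself in any of the three cases; the length errors appear only while the returned iterator is being consumed.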
Serialization
Generators are serialized as arrays (JSON) or lists (Python) by default. Similar to validation, serialization is lazy when using to_python; to_json, by contrast, consumes the whole generator to produce its output.
SerializationIterator
When serializing a generator to Python, pydantic-core returns a SerializationIterator. This iterator performs any necessary transformations (like converting objects to dicts) as you iterate over it.
from pydantic_core import SchemaSerializer, core_schema

def gen():
    yield 1
    yield 2

s = SchemaSerializer(core_schema.generator_schema(core_schema.int_schema()))
ser_gen = s.to_python(gen())

# Items are serialized one at a time as the iterator is advanced
assert next(ser_gen) == 1
assert ser_gen.index == 1
JSON Serialization
When serializing to JSON via to_json(), the generator is fully exhausted and represented as a standard JSON array. If the generator raises an exception during this process (e.g., a ValueError inside the generator function), the serializer will propagate that error.
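The following sketch illustrates both points: to_json drains the generator, and an exception raised inside the generator reaches the caller. The generator names are illustrative, and the example relies only on the raised exception being a ValueError (or a subclass) whose message includes the original text; the exact exception class is an assumption that may vary between versions.

```python
from pydantic_core import SchemaSerializer, core_schema

serializer = SchemaSerializer(core_schema.generator_schema(core_schema.int_schema()))


def numbers():
    yield 1
    yield 2


def failing_numbers():
    yield 1
    raise ValueError("source failed")


source = numbers()
payload = serializer.to_json(source)  # bytes holding a JSON array
remaining = list(source)              # empty: to_json already drained the generator

propagated = None
try:
    serializer.to_json(failing_numbers())
except ValueError as exc:
    propagated = str(exc)             # message includes the generator's own error text

print(payload, remaining, propagated)
```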
Design Tradeoffs
The implementation of GeneratorSchema reflects a specific set of design choices:
- Memory vs. Eagerness: By choosing lazy validation, pydantic-core prioritizes memory efficiency. You can validate a generator yielding millions of rows from a database without loading them all into memory. The tradeoff is that you cannot know if the entire sequence is valid without consuming it.
- Validator State: The ValidatorIterator is stateful. Once an item is consumed or a validation error is raised for a specific index, you cannot "restart" the validation from that point using the same iterator.
- Error Context: Because errors happen during iteration, the input reported in a ValidationError for a length violation (too_long or too_short) is the generator or iterable itself (or its repr) rather than a specific value; an item that fails validation is identified by its loc index.
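When an up-front verdict on the whole sequence is needed, one option is to drain the lazy iterator explicitly and accept the memory cost. The sketch below assumes a hypothetical helper, validate_all, which is not part of pydantic-core; it returns the items validated before the first failure together with the error, if any.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

validator = SchemaValidator(
    core_schema.generator_schema(items_schema=core_schema.int_schema())
)


def rows():
    yield 1
    yield 2
    yield "three"


def validate_all(iterable):
    """Hypothetical helper: consume the lazy iterator and stop at the first error."""
    items = []
    try:
        for item in validator.validate_python(iterable):
            items.append(item)
    except ValidationError as exc:
        return items, exc
    return items, None


items, error = validate_all(rows())
print(items)                      # items that passed before the failure
print(error.errors()[0]["loc"])   # position of the failing item
```

This trades the memory benefit of lazy validation for an immediate answer, so it suits bounded inputs better than unbounded streams.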