Your First Collection: Lists and Generators

In this tutorial, you will learn how to define schemas for sequences and iterables using list_schema and generator_schema (which build the ListSchema and GeneratorSchema core schemas). By the end, you will be able to validate Python lists with specific item types and length constraints, and validate large data streams lazily using generators.

Prerequisites

To follow this tutorial, you need pydantic-core installed. You should be familiar with basic SchemaValidator usage and core schemas like int_schema.

Step 1: Defining a Basic List

The most common way to validate a collection is using list_schema. This schema ensures the input is a list and that every item within that list conforms to a sub-schema.

from pydantic_core import SchemaValidator, core_schema

# Define a schema for a list of integers
schema = core_schema.list_schema(core_schema.int_schema())
v = SchemaValidator(schema)

# Validating a list of strings that can be coerced to integers
result = v.validate_python(['1', 2, '3'])
print(result)
# Output: [1, 2, 3]

In this example, list_schema takes an items_schema (in this case, int_schema). When you call validate_python, the validator iterates through the input, applies the integer schema to each element, and returns a new list containing the validated results.
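When an item fails its sub-schema, the resulting ValidationError records the index of the offending element. The sketch below inspects that information through errors(); the input values are illustrative only.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

v = SchemaValidator(core_schema.list_schema(core_schema.int_schema()))

errors = []
try:
    # 'x' cannot be coerced to an integer, so the item at index 1 fails
    v.validate_python([1, 'x', 3])
except ValidationError as e:
    errors = e.errors()

# Each error records where in the input it occurred and what went wrong
first = errors[0]
print(first['loc'], first['type'])
# Output: (1,) int_parsing
```

The loc tuple is what lets you point a caller at the exact element that needs fixing, rather than rejecting the whole list with a generic message.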

Step 2: Adding Length Constraints

You can restrict the size of the list by using the min_length and max_length arguments.

from pydantic_core import SchemaValidator, core_schema, ValidationError

schema = core_schema.list_schema(
    core_schema.int_schema(),
    min_length=2,
    max_length=4,
)
v = SchemaValidator(schema)

# This will pass
print(v.validate_python([1, 2, 3]))

# This will raise a ValidationError because it's too short
try:
    v.validate_python([1])
except ValidationError as e:
    print(e)

If the input has fewer than min_length items or more than max_length items, validation fails with a ValidationError that states which bound was violated.
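To tell the two length failures apart in code, you can look at the error type instead of parsing the message. The following is a minimal sketch; length_error_type is a hypothetical helper written for this example and is not part of pydantic-core.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

v = SchemaValidator(
    core_schema.list_schema(core_schema.int_schema(), min_length=2, max_length=4)
)

def length_error_type(value):
    """Hypothetical helper: return the first error type, or None if valid."""
    try:
        v.validate_python(value)
    except ValidationError as e:
        return e.errors()[0]['type']
    return None

print(length_error_type([1]))              # too_short
print(length_error_type([1, 2, 3]))        # None
print(length_error_type([1, 2, 3, 4, 5]))  # too_long
```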

Step 3: Lax vs. Strict Validation

By default, list_schema operates in "lax" mode. This means it will accept other iterable types (like tuples or sets) and automatically convert them into a list. If you want to enforce that the input must be a literal Python list, you can set strict=True.

from pydantic_core import SchemaValidator, core_schema, ValidationError

# Strict mode: only actual lists are allowed
schema = core_schema.list_schema(core_schema.int_schema(), strict=True)
v = SchemaValidator(schema)

# This works
v.validate_python([1, 2])

# This raises a ValidationError even though it's an iterable of ints
try:
    v.validate_python((1, 2))
except ValidationError as e:
    print(e)

In lax mode (the default), the validator builds a new list from any compatible iterable, such as a tuple or a set. In strict mode, it rejects anything that isn't a list instance.
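To see the two modes side by side, the sketch below feeds the same tuple to a lax validator and a strict one. The variable names are chosen for this example only.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

items = core_schema.int_schema()
lax = SchemaValidator(core_schema.list_schema(items))
strict = SchemaValidator(core_schema.list_schema(items, strict=True))

# Lax mode builds a new list from the tuple
lax_result = lax.validate_python((1, 2))
print(lax_result)
# Output: [1, 2]

# Strict mode rejects the same tuple with a list_type error
strict_error = None
try:
    strict.validate_python((1, 2))
except ValidationError as e:
    strict_error = e.errors()[0]['type']
print(strict_error)
# Output: list_type
```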

Step 4: Lazy Validation with Generators

When dealing with very large datasets or streams, you may not want to load everything into a list at once. generator_schema allows you to validate items lazily as they are consumed.

from typing import Any, Iterator
from pydantic_core import SchemaValidator, core_schema, ValidationError

def my_generator() -> Iterator[Any]:
    yield 1
    yield 2
    yield "not an integer"

schema = core_schema.generator_schema(items_schema=core_schema.int_schema())
v = SchemaValidator(schema)

# Validation returns a "validating iterator" immediately
validating_it = v.validate_python(my_generator())

print(next(validating_it)) # 1
print(next(validating_it)) # 2

# The error is only raised when we try to access the invalid third item
try:
    print(next(validating_it))
except ValidationError as e:
    print(e)

Unlike list_schema, which validates the entire collection and returns a list, generator_schema returns a special iterator. Validation for each item happens only when next() is called on that iterator. This preserves the memory efficiency of Python generators.
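One way to observe the laziness is to record when the source generator actually yields. In the sketch below, tracked_source and the pulled list are hypothetical names introduced for this illustration; the expectation is that nothing is pulled from the source until next() is called.

```python
from pydantic_core import SchemaValidator, core_schema

pulled = []  # records each item at the moment the source generator yields it

def tracked_source():
    for value in ('1', '2', '3'):
        pulled.append(value)
        yield value

v = SchemaValidator(
    core_schema.generator_schema(items_schema=core_schema.int_schema())
)
validating_it = v.validate_python(tracked_source())

pulled_before = list(pulled)  # snapshot taken before any next() call
first = next(validating_it)   # pulls and validates a single item
pulled_after = list(pulled)

print(pulled_before, first, pulled_after)
```

If the validator works as described above, pulled_before is empty and pulled_after holds only the first raw item, while first is that item coerced to an integer.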

Complete Result

You have now built schemas for both eager and lazy collections. Here is a summary of the differences:

  • list_schema: Eagerly validates all items and returns a list. Supports strict mode to enforce the list type.
  • generator_schema: Lazily validates items and returns an iterator. Errors are raised only when the offending item is reached during iteration.

Both schemas support min_length and max_length to control the number of items allowed in the sequence. For generators, length constraints are checked as the iterator is consumed; a max_length error will trigger as soon as too many items are yielded, while a min_length error will trigger when the generator closes prematurely.
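The length behaviour for generators can be sketched as follows. error_type_when_drained is a hypothetical helper for this example; it consumes the validating iterator and reports the first error type it meets.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

def error_type_when_drained(schema, source):
    """Hypothetical helper: drain the iterator, return the first error type or None."""
    it = SchemaValidator(schema).validate_python(source)
    try:
        for _ in it:
            pass
    except ValidationError as e:
        return e.errors()[0]['type']
    return None

items = core_schema.int_schema()
capped = core_schema.generator_schema(items_schema=items, max_length=2)
floored = core_schema.generator_schema(items_schema=items, min_length=3)

# Three items against max_length=2: fails once the third item is reached
too_many = error_type_when_drained(capped, (n for n in [1, 2, 3]))
# Two items against min_length=3: fails when the source runs out
too_few = error_type_when_drained(floored, (n for n in [1, 2]))
# Two items against max_length=2: drains without error
just_right = error_type_when_drained(capped, (n for n in [1, 2]))

print(too_many, too_few, just_right)
```

Because the errors surface during iteration, any code that consumes the validating iterator should be prepared to catch ValidationError inside the loop, not only around the initial validate_python call.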