Instance Revalidation Strategies
In pydantic-core, validation often involves processing data that is already an instance of the target model or dataclass. The revalidate_instances setting determines whether these existing instances should be trusted as-is or subjected to the validation logic again. This choice represents a fundamental trade-off between execution speed and data integrity.
The Re-validation Dilemma
When a SchemaValidator encounters an input that matches the expected type (e.g., an instance of a class defined in a ModelSchema), it must decide whether to:
- Trust the instance: Assume the object is already valid because it was created through the model's own constructor or a previous validation pass.
- Verify the instance: Re-run the validation logic against the object's attributes to ensure they still conform to the schema.
Verification is necessary because Python objects are generally mutable. A model instance might have been modified manually after creation in a way that violates its schema constraints. Furthermore, in polymorphic scenarios, a subclass instance might be passed where a base class is expected, potentially introducing unexpected data or behavior.
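To make the mutability concern concrete, here is a minimal pure-Python sketch. It does not use pydantic-core; the Account class and the is_still_valid helper are hypothetical names, used only to show how an object that was valid at construction time can later drift out of its constraints.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical class that checks its constraint only at construction time."""

    balance: int

    def __post_init__(self) -> None:
        if not isinstance(self.balance, int) or self.balance < 0:
            raise ValueError('balance must be a non-negative int')


def is_still_valid(account: Account) -> bool:
    """Re-check the same constraint that __post_init__ enforced."""
    return isinstance(account.balance, int) and account.balance >= 0


account = Account(balance=10)  # valid when constructed
assert is_still_valid(account)

account.balance = -5  # plain attribute assignment skips __post_init__
assert not is_still_valid(account)
```

Trusting such an instance without re-checking it is exactly the risk that the "verify" option is meant to address.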
Re-validation Strategies
The revalidate_instances setting, available in CoreConfig, ModelSchema, and DataclassSchema, supports three distinct strategies.
'never' (Default)
The 'never' strategy is the default behavior. It prioritizes performance by assuming that if an input is already an instance of the target class (or a subclass), it is valid.
# Default behavior (revalidate_instances='never')
from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(core_schema.model_schema(cls=MyModel, schema=...))
m1 = MyModel(field_a='valid')
m2 = v.validate_python(m1)
assert m1 is m2  # Identity is preserved; no validation performed
This strategy is ideal for internal workflows where data is known to be "clean" and performance is critical. However, it will not catch manual mutations that break constraints.
'always'
The 'always' strategy forces re-validation regardless of whether the input is already an instance. This trades performance for the guarantee that the returned object conforms to the schema. When validation succeeds under this strategy, pydantic-core typically returns a new instance of the class rather than the original object, effectively "cleaning" the data.
This is particularly useful for catching manual attribute assignments that bypass validation:
# Using revalidate_instances='always'
from pydantic_core import SchemaValidator, ValidationError, core_schema

v = SchemaValidator(
    core_schema.model_schema(
        cls=MyModel,
        revalidate_instances='always',
        schema=core_schema.model_fields_schema(
            fields={
                'field_a': core_schema.model_field(schema=core_schema.str_schema()),
            }
        ),
    )
)
m1 = MyModel(field_a='initial')
m1.field_a = 123  # Manual mutation to an invalid type (int instead of str)
# Re-validation rejects the mutation: by default str_schema() does not coerce int to str
try:
    v.validate_python(m1)
except ValidationError as exc:
    print(exc)
m1.field_a = 'fixed'  # Restore a valid value
m2 = v.validate_python(m1)
assert m2 is not m1  # A new instance is created
assert isinstance(m2.field_a, str)  # The field was validated again
'subclass-instances'
The 'subclass-instances' strategy provides a middle ground. It trusts instances that are the exact class specified in the schema but re-validates any instances that are subclasses.
This strategy is designed to handle polymorphism safely. It ensures that if a user provides a specialized subclass where a base class is expected, the data is validated against the base class's requirements and potentially converted back to the base class.
# Using revalidate_instances='subclass-instances'
schema = core_schema.dataclass_schema(
    BaseDataclass,
    core_schema.dataclass_args_schema(
        'BaseDataclass',
        [core_schema.dataclass_field(name='a', schema=core_schema.str_schema())],
    ),
    ['a'],
    revalidate_instances='subclass-instances',
)
v = SchemaValidator(schema)
# Exact instance: Trusted
base = BaseDataclass(a='hello')
assert v.validate_python(base) is base
# Subclass instance: Re-validated
sub = SubDataclass(a='world')
result = v.validate_python(sub)
assert result is not sub
assert type(result) is BaseDataclass
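The three strategies can be summarized as a single decision rule. The sketch below is an illustrative pure-Python restatement of the descriptions in this section, not pydantic-core's actual implementation (which is written in Rust). The should_revalidate function and the Base and Sub classes are hypothetical names, and the function assumes the input has already been identified as an instance of the target class.

```python
from typing import Literal

RevalidateMode = Literal['always', 'never', 'subclass-instances']


def should_revalidate(mode: RevalidateMode, instance: object, cls: type) -> bool:
    """Illustrative decision rule; assumes `instance` is already an instance of `cls`."""
    if mode == 'always':
        return True
    if mode == 'never':
        return False
    if mode == 'subclass-instances':
        # Exact-class instances are trusted; subclass instances are re-validated.
        return type(instance) is not cls
    raise ValueError(f'unknown revalidate_instances mode: {mode!r}')


class Base:
    pass


class Sub(Base):
    pass


assert should_revalidate('never', Sub(), Base) is False
assert should_revalidate('always', Base(), Base) is True
assert should_revalidate('subclass-instances', Base(), Base) is False
assert should_revalidate('subclass-instances', Sub(), Base) is True
```

The key distinction in the 'subclass-instances' branch is an exact type comparison rather than an isinstance check, which is what separates a trusted base instance from a subclass instance.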
Configuration and Scope
The re-validation behavior can be configured at two levels, both defined in pydantic-core/python/pydantic_core/core_schema.py:
- Global Configuration: Via the revalidate_instances field in CoreConfig. This sets the default for all models and dataclasses within the schema tree.
- Schema-Specific Override: Both ModelSchema and DataclassSchema include a revalidate_instances field that overrides the global configuration for that specific model or dataclass.
class ModelSchema(TypedDict, total=False):
    # ...
    revalidate_instances: Literal['always', 'never', 'subclass-instances']  # default: 'never'
    config: CoreConfig
    # ...
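The precedence between the two levels can be sketched as a small lookup. This is a hedged illustration using plain dictionaries rather than real schema objects. The effective_revalidate_mode helper is a hypothetical name, and the lookup order (schema first, then config, then 'never') is taken from the description above rather than from the library's source.

```python
from typing import Any, Mapping, Optional

DEFAULT_MODE = 'never'


def effective_revalidate_mode(
    schema: Mapping[str, Any],
    config: Optional[Mapping[str, Any]] = None,
) -> str:
    """Hypothetical helper: the schema-level value wins, then config, then the default."""
    if 'revalidate_instances' in schema:
        return schema['revalidate_instances']
    if config and 'revalidate_instances' in config:
        return config['revalidate_instances']
    return DEFAULT_MODE


config = {'revalidate_instances': 'always'}
assert effective_revalidate_mode({}, config) == 'always'
assert effective_revalidate_mode({'revalidate_instances': 'subclass-instances'}, config) == 'subclass-instances'
assert effective_revalidate_mode({}) == 'never'
```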
When re-validation occurs (under 'always' or for subclasses in 'subclass-instances'), the validator extracts the data from the instance (for models, from the instance's __dict__; for dataclasses, by reading each declared field as an attribute) and passes it through the standard validation pipeline. This process ensures that all field-level validators, constraints, and coercions are applied consistently.
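The extract-then-validate flow described above can be sketched in pure Python using standard-library dataclasses. This is a simplified illustration, not the library's implementation: the revalidate and validate_str functions are hypothetical, and the two dataclasses are stand-ins that reuse the names from the earlier example.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict


@dataclass
class BaseDataclass:
    a: str


@dataclass
class SubDataclass(BaseDataclass):
    b: int = 0


def validate_str(value: Any) -> str:
    """Stand-in field validator: accept only str values."""
    if not isinstance(value, str):
        raise TypeError(f'expected str, got {type(value).__name__}')
    return value


def revalidate(instance: Any, cls: type, validators: Dict[str, Callable[[Any], Any]]) -> Any:
    """Read the fields declared on `cls` from `instance`, validate them, build a new `cls`."""
    raw = {f.name: getattr(instance, f.name) for f in fields(cls)}
    clean = {name: validators[name](value) for name, value in raw.items()}
    return cls(**clean)


sub = SubDataclass(a='world', b=1)
result = revalidate(sub, BaseDataclass, {'a': validate_str})
assert type(result) is BaseDataclass and result is not sub
assert result.a == 'world'

sub.a = 123  # mutate to an invalid type
try:
    revalidate(sub, BaseDataclass, {'a': validate_str})
except TypeError as exc:
    print(f'rejected: {exc}')
```

Because only the fields declared on the target class are read, the extra field b on the subclass does not carry over to the result in this sketch.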