Instance Revalidation Strategies
In pydantic-core, the revalidate_instances setting determines how the validator handles input that is already an instance of the target model or dataclass. This configuration is critical for balancing performance with data integrity, especially in systems where objects may be mutated after their initial creation.
The strategy can be configured globally via CoreConfig or specifically for individual models and dataclasses using the revalidate_instances field in ModelSchema and DataclassSchema.
Revalidation Strategies
There are three primary strategies for instance revalidation, defined by the Literal['always', 'never', 'subclass-instances'] type.
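The decision each strategy makes can be sketched in plain Python. This is a simplified model for illustration, not pydantic-core's actual code; the `should_revalidate` function and the `Base`/`Child` classes are hypothetical names introduced here.

```python
from typing import Literal

RevalidateInstances = Literal['always', 'never', 'subclass-instances']


def should_revalidate(strategy: RevalidateInstances, instance: object, cls: type) -> bool:
    """Toy model: decide whether an existing instance of `cls` gets revalidated."""
    if strategy == 'always':
        return True
    if strategy == 'never':
        return False
    # 'subclass-instances': trust exact instances, revalidate subclass instances
    return type(instance) is not cls


class Base: ...
class Child(Base): ...


# For each strategy: (decision for an exact instance, decision for a subclass instance)
decisions = {
    strategy: (
        should_revalidate(strategy, Base(), Base),
        should_revalidate(strategy, Child(), Base),
    )
    for strategy in ('always', 'never', 'subclass-instances')
}
```

Only 'subclass-instances' distinguishes between an exact instance and a subclass instance; the other two strategies give the same answer for both.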
The Default Strategy: never
By default, pydantic-core uses the 'never' strategy. In this mode, if the input to validate_python is already an instance of the expected class (or a subclass), the validator trusts the object and returns it immediately without performing any internal validation.
- Performance: This is the most performant option as it avoids the overhead of checking fields and creating a new object.
- Identity: The original object identity is preserved (output is input).
- Risk: If an instance was mutated after its initial validation (e.g., my_model.some_field = 'invalid_value'), the 'never' strategy will not catch this inconsistency.
The Safety Strategy: always
The 'always' strategy forces the validator to revalidate the internal state of every instance passed to it, even when the input is already an exact instance of the target class.
When 'always' is enabled, the validator extracts the data from the existing instance (typically using __dict__ and __pydantic_fields_set__) and runs it through the validation logic again.
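# Example of 'always' revalidation in ModelSchema
from pydantic_core import SchemaValidator, core_schema

class MyModel:
    # Slots keep the pydantic bookkeeping attributes out of __dict__
    __slots__ = '__dict__', '__pydantic_fields_set__', '__pydantic_extra__', '__pydantic_private__'

    def __init__(self, field_a, field_b):
        self.field_a = field_a
        self.field_b = field_b
        # Revalidation reads these two attributes from the instance
        self.__pydantic_extra__ = None
        self.__pydantic_fields_set__ = {'field_a', 'field_b'}

v = SchemaValidator(
    core_schema.model_schema(
        cls=MyModel,
        revalidate_instances='always',
        schema=core_schema.model_fields_schema(
            fields={
                'field_a': core_schema.model_field(schema=core_schema.str_schema()),
                'field_b': core_schema.model_field(schema=core_schema.int_schema()),
            }
        ),
    )
)
m2 = MyModel(field_a='x', field_b=42)
m3 = v.validate_python(m2)
assert m3 is not m2  # A new instance is created
assert m3.field_a == 'x'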
As seen in pydantic-core/tests/validators/test_model.py, this strategy ensures that even if m2 was manually altered to an invalid state, v.validate_python(m2) would raise a ValidationError.
The Hybrid Strategy: subclass-instances
The 'subclass-instances' strategy provides a middle ground. It trusts instances that are exactly the class defined in the schema but re-validates any instances that are subclasses.
This is particularly useful for enforcing "type narrowing" or ensuring that a subclass doesn't carry extra state or behavior when a base class is expected.
# Example of 'subclass-instances' behavior
# Class bodies and the fields schema are elided here; real instances must expose
# __dict__, __pydantic_fields_set__ and __pydantic_extra__ for revalidation to work.
class MyModel: ...
class MySubModel(MyModel): ...
v = SchemaValidator(
    core_schema.model_schema(
        cls=MyModel,
        revalidate_instances='subclass-instances',
        schema=...
    )
)
m1 = MyModel()
assert v.validate_python(m1) is m1  # Exact class is trusted
m3 = MySubModel()
m4 = v.validate_python(m3)
assert m4 is not m3
assert type(m4) is MyModel  # Subclass is coerced to base class
In this mode, as demonstrated in the codebase's tests, the subclass instance is re-validated and then coerced back into the base class type.
Implementation Details
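The revalidation check runs inside the model and dataclass validators themselves: when the input is already an instance of the target class, the validator applies the configured strategy to decide whether to return the input as-is or validate its contents again.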
Configuration Hierarchy
The setting is resolved in the following order:
- The revalidate_instances field on the specific ModelSchema or DataclassSchema.
- The revalidate_instances field in the CoreConfig passed to the schema.
- The default value of 'never'.
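The lookup order above can be sketched as a small function over plain dicts. This is an illustrative model only; `resolve_revalidate_instances` and `DEFAULT_REVALIDATE` are hypothetical names, and the real resolution happens inside pydantic-core when the validator is built.

```python
from typing import Optional

DEFAULT_REVALIDATE = 'never'


def resolve_revalidate_instances(schema: dict, config: Optional[dict] = None) -> str:
    """Hypothetical helper: pick the effective strategy for one schema."""
    # 1. A value set directly on the model or dataclass schema wins.
    if schema.get('revalidate_instances') is not None:
        return schema['revalidate_instances']
    # 2. Otherwise fall back to the config passed alongside the schema.
    if config and config.get('revalidate_instances') is not None:
        return config['revalidate_instances']
    # 3. Otherwise use the default.
    return DEFAULT_REVALIDATE


config = {'revalidate_instances': 'subclass-instances'}
from_schema = resolve_revalidate_instances({'revalidate_instances': 'always'}, config)
from_config = resolve_revalidate_instances({}, config)
from_default = resolve_revalidate_instances({})
```

A schema-level setting therefore overrides a config-level one for that model only, leaving other models governed by the config.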
Technical Requirements
For revalidation to work (especially for models), the input instance must provide access to its internal state. The validator typically looks for:
- __dict__: To extract the field values.
- __pydantic_fields_set__: To determine which fields were explicitly set.
- __pydantic_extra__: For models that allow extra fields.
If these attributes are missing or malformed, the revalidation process may fail or behave unexpectedly.
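As a rough illustration of why these attributes matter, the sketch below reads them from an instance the way a revalidation step might. `extract_state`, `Complete`, and `Incomplete` are hypothetical names, and the use of `__slots__` here is an assumption chosen to keep the bookkeeping attributes out of `__dict__`.

```python
def extract_state(instance: object):
    """Hypothetical helper: read the attributes that revalidation relies on."""
    # getattr raises AttributeError as soon as one attribute is missing
    fields = dict(getattr(instance, '__dict__'))
    fields_set = set(getattr(instance, '__pydantic_fields_set__'))
    extra = getattr(instance, '__pydantic_extra__')
    return fields, fields_set, extra


class Complete:
    # '__dict__' in the slots keeps ordinary attribute assignment working
    __slots__ = ('__dict__', '__pydantic_fields_set__', '__pydantic_extra__')

    def __init__(self, name):
        self.name = name
        self.__pydantic_fields_set__ = {'name'}
        self.__pydantic_extra__ = None


class Incomplete:
    # A plain class that never sets the pydantic bookkeeping attributes
    def __init__(self, name):
        self.name = name


state = extract_state(Complete('x'))

try:
    extract_state(Incomplete('x'))
    missing = None
except AttributeError as exc:
    missing = str(exc)
```

An object built outside the validator, like `Incomplete` here, fails at the first missing attribute rather than being silently accepted.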
Trade-offs and Constraints
Object Identity
A critical side effect of 'always' and 'subclass-instances' (for subclasses) is the loss of object identity. Because the validator creates a new instance of the class after re-validation, input is output will be False. This can be a "gotcha" if your application logic relies on maintaining a reference to a specific object instance.
Performance Overhead
Revalidation is not free. It involves:
- Iterating over the object's attributes.
- Running each attribute through its respective validator.
- Instantiating a new Python object.
In high-throughput systems where objects are known to be immutable or are never mutated after creation, the default 'never' strategy is significantly more efficient.
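The three steps listed above, and the resulting loss of object identity, can be modelled with a toy revalidation routine. This is a sketch under simplifying assumptions: validators are plain callables, and `revalidate`, `Point`, and `strict_int` are hypothetical names rather than pydantic-core APIs.

```python
def revalidate(instance, cls, validators):
    """Toy revalidation: validate each field, then build a fresh `cls` instance."""
    validated = {}
    # 1. Iterate over the declared fields of the instance.
    for name, validate in validators.items():
        # 2. Run each value through its validator.
        validated[name] = validate(instance.__dict__[name])
    # 3. Instantiate a new object without calling __init__.
    new = cls.__new__(cls)
    new.__dict__.update(validated)
    return new


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def strict_int(value):
    if type(value) is not int:
        raise ValueError(f'expected int, got {value!r}')
    return value


validators = {'x': strict_int, 'y': strict_int}
p = Point(1, 2)
q = revalidate(p, Point, validators)  # a distinct object with the same data

p.y = 'oops'  # mutate the original after creation
try:
    revalidate(p, Point, validators)
    caught = False
except ValueError:
    caught = True
```

Every call pays for the loop, the validator calls, and the allocation, which is the cost the 'never' strategy avoids by returning the input unchanged.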
Coercion in Subclasses
The 'subclass-instances' strategy effectively strips away any subclass-specific data or methods by returning a base class instance. This ensures that the resulting object strictly adheres to the schema of the base class, which is a form of safety when passing data across boundaries where only the base contract is guaranteed.