Global String and Regex Behavior
In this codebase, string validation behavior is managed through a dual-layer configuration system. Global defaults are defined in CoreConfig, while specific field behaviors are defined in StringSchema. This allows for consistent string handling across an entire schema while permitting granular overrides where necessary.
Global Configuration with CoreConfig
The CoreConfig class in pydantic_core.core_schema provides several attributes that establish default validation rules for all string fields within a SchemaValidator. These settings are particularly useful for enforcing organization-wide or application-wide standards, such as maximum input lengths or automatic whitespace trimming.
Key global string attributes include:
- str_max_length and str_min_length: Set global bounds on string length.
- str_strip_whitespace: When True, leading and trailing whitespace is removed from all strings.
- str_to_lower and str_to_upper: Automatically transform string casing.
As shown in pydantic-core/tests/test_config.py, these global settings are applied when the StringSchema (via cs.str_schema()) does not specify its own constraints:
from pydantic_core import CoreConfig, SchemaValidator, core_schema as cs

# Global constraint applied via CoreConfig
v = SchemaValidator(
    cs.str_schema(),
    config=CoreConfig(str_max_length=5),
)
assert v.isinstance_python('test') is True
assert v.isinstance_python('test long') is False
Field-Level Overrides
Individual fields defined via StringSchema can override any global setting. If a constraint is defined in both CoreConfig and StringSchema, the field-level definition takes precedence.
This hierarchy is demonstrated in the project's test suite:
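from pydantic_core import CoreConfig, SchemaValidator, core_schema as cs

# Field-level max_length (5) overrides global str_max_length (10)
v = SchemaValidator(
    cs.str_schema(max_length=5),
    config=CoreConfig(str_max_length=10),
)
assert v.isinstance_python('test') is True
# 'test long' has 9 characters: within the global limit, but over the field limit
assert v.isinstance_python('test long') is False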
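The precedence rule above can be modelled in a few lines of plain Python. This is only an illustrative sketch, not pydantic-core's implementation: the helper name resolve_string_settings and the FIELD_TO_GLOBAL table are hypothetical, and plain dictionaries stand in for StringSchema and CoreConfig. The key names themselves are the ones listed in the text.

```python
from typing import Any, Dict

# Field-level key -> corresponding global CoreConfig key (names from the text).
FIELD_TO_GLOBAL = {
    'max_length': 'str_max_length',
    'min_length': 'str_min_length',
    'strip_whitespace': 'str_strip_whitespace',
    'to_lower': 'str_to_lower',
    'to_upper': 'str_to_upper',
}


def resolve_string_settings(field: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pick the effective value for each string setting."""
    resolved: Dict[str, Any] = {}
    for field_key, global_key in FIELD_TO_GLOBAL.items():
        if field_key in field:
            # A field-level definition takes precedence.
            resolved[field_key] = field[field_key]
        elif global_key in config:
            # Otherwise fall back to the global default, if one is set.
            resolved[field_key] = config[global_key]
    return resolved


effective = resolve_string_settings(
    {'max_length': 5},
    {'str_max_length': 10, 'str_strip_whitespace': True},
)
```

Here effective keeps the field-level max_length of 5 and inherits strip_whitespace from the global settings, mirroring the two cases described in this section.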
Regex Engine Selection
String patterns can be validated by one of two regex engines: rust-regex and python-re. The engine can be set globally through the regex_engine key of CoreConfig or per field through the regex_engine argument of StringSchema.
Rust Regex Engine (rust-regex)
This is the default engine. It is implemented using the Rust regex crate, which guarantees linear-time searching and protects against ReDoS (Regular Expression Denial of Service) attacks. However, it does not support certain advanced features like look-around or backreferences.
Python Regex Engine (python-re)
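The Python engine uses the standard library re module. Select it when a pattern needs features that the Rust engine does not support, such as look-around or backreferences. Because re is a backtracking engine, it does not carry the Rust engine's linear-time guarantee.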
An example from pydantic-core/tests/validators/test_string.py shows the Python engine being used for backreferences (which rust-regex would reject):
from pydantic_core import SchemaValidator, core_schema

# Using Python regex engine for backreference support (\1)
pattern = r'r(#*)".*?"\1'
v = SchemaValidator(
    core_schema.str_schema(pattern=pattern, regex_engine='python-re')
)
assert v.validate_python('r#""#') == 'r#""#'
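To see what the backreference in the pattern above does, the same expression can be run directly against the standard library re module. This sketch bypasses SchemaValidator entirely. The helper name matches is hypothetical, and the use of re.search is an assumption made for illustration, since the text does not say how the compiled pattern is invoked internally.

```python
import re

# Same pattern as in the test above: \1 must repeat exactly what (#*) captured,
# so the number of '#' characters before and after the quoted part must agree.
PATTERN = re.compile(r'r(#*)".*?"\1')


def matches(value: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in value."""
    return PATTERN.search(value) is not None


balanced = matches('r#""#')      # one '#' on each side
no_hashes = matches('r""')       # (#*) captures the empty string
unbalanced = matches('r#""')     # opening '#' has no closing counterpart
```

The unbalanced input is rejected because no choice of capture for (#*) lets \1 match at the end of the string, which is the kind of check that cannot be written without backreferences.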
Coercion and Strict Mode
The coerce_numbers_to_str setting determines whether numeric types (like int or float) should be automatically converted to strings during validation.
- Non-Strict Mode: If coerce_numbers_to_str is True, an input like 123 will be validated as "123".
- Strict Mode: If strict=True is set in either CoreConfig or StringSchema, coercion is disabled regardless of the coerce_numbers_to_str setting.
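The interaction between the two flags can be summarised as a small decision function. The sketch below is a plain-Python model under stated assumptions, not the library's code: coerce_to_str is a hypothetical name, TypeError stands in for the validation error the real validator would raise, and treating bool as non-numeric is an assumption of this sketch.

```python
def coerce_to_str(value: object, *, coerce_numbers_to_str: bool = False, strict: bool = False) -> str:
    """Hypothetical model of the coercion decision described above."""
    if isinstance(value, str):
        # Strings are always accepted as-is.
        return value
    # bool is a subclass of int; this sketch deliberately excludes it (assumption).
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number and coerce_numbers_to_str and not strict:
        # Non-strict mode with coercion enabled: 123 -> "123".
        return str(value)
    # Strict mode, or coercion disabled: non-string input is rejected.
    raise TypeError(f'expected str, got {type(value).__name__}')


coerced = coerce_to_str(123, coerce_numbers_to_str=True)
```

With strict=True the same call raises instead of returning "123", matching the rule that strict mode wins over coerce_numbers_to_str.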
Order of Operations
When several transformations and validations apply to the same string, the validator runs them in a fixed sequence:
- Coercion: Numbers are converted to strings (if enabled and not in strict mode).
- Whitespace stripping: strip_whitespace trims leading and trailing whitespace.
- Length Validation: min_length and max_length are checked against the stripped string.
- Regex Matching: The pattern is checked against the stripped string.
- Case transformation: to_lower or to_upper is applied to produce the returned value.
Because stripping runs first, length and pattern constraints are enforced on the trimmed input rather than on the raw string. Case transformation runs last, so a pattern is matched against the input's original casing, not against the lowercased or uppercased result.
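Why the position of whitespace stripping relative to the length check matters can be seen with a small plain-Python sketch. The helper passes_max_length and its strip_first flag are hypothetical and exist only to compare the two possible orderings; they are not part of pydantic-core.

```python
def passes_max_length(value: str, max_length: int, *, strip_first: bool) -> bool:
    """Hypothetical helper: check max_length on the stripped or the raw string."""
    candidate = value.strip() if strip_first else value
    return len(candidate) <= max_length


raw = '  test  '  # 8 characters raw, 4 after stripping

accepted_when_stripped = passes_max_length(raw, 5, strip_first=True)
accepted_when_raw = passes_max_length(raw, 5, strip_first=False)
```

The same input passes a limit of 5 only when the length is measured after stripping, so the ordering decides whether surrounding whitespace counts against the limit.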