Schema Configuration and Behavior
CoreConfig, a TypedDict defined in pydantic_core.core_schema, is the central configuration point for both validation and serialization. It defines behaviors that apply across an entire schema tree, so that data is processed, constrained, and formatted consistently.
SchemaValidator and SchemaSerializer each accept an optional CoreConfig through their config argument. Its values act as defaults that field-level settings within the schema can override.
Validation Control and Strictness
A primary role of CoreConfig is to set the strictness of validation. The strict key controls whether pydantic-core attempts to coerce data (lax mode) or requires exact type matches (strict mode).
Strictness and Coercion
In lax mode (the default), pydantic-core performs various coercions, such as converting a string "123" to an integer. Setting strict=True disables these behaviors globally.
from pydantic_core import CoreConfig, SchemaValidator, core_schema as cs
# Lax mode (default)
v_lax = SchemaValidator(cs.int_schema())
assert v_lax.validate_python("123") == 123
# Strict mode via CoreConfig
v_strict = SchemaValidator(cs.int_schema(), config=CoreConfig(strict=True))
# This would raise a ValidationError
# v_strict.validate_python("123")
Alias Handling
CoreConfig provides granular control over how field names and aliases are resolved during validation. This is particularly important when dealing with external APIs that use different naming conventions (e.g., camelCase vs snake_case).
validate_by_alias: (Default: True) Determines if the validator should look for data using the field's alias.
validate_by_name: (Default: False) Determines if the validator should also look for data using the field's original Python name.
Extra Fields Behavior
The extra_fields_behavior option (of type ExtraBehavior) defines how structured objects like TypedDict or Model handle fields not explicitly defined in the schema:
'ignore': (Default) Extra fields are silently dropped.
'forbid': Validation fails if extra fields are present.
'allow': Extra fields are preserved in the output.
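The three behaviors can be compared side by side. This is a sketch under the same assumption as before: the config is attached to a typed_dict_schema via its config argument. The make_validator helper and the keys "a" and "b" are hypothetical.

```python
from pydantic_core import CoreConfig, SchemaValidator, ValidationError, core_schema as cs

def make_validator(behavior):
    # Hypothetical helper: one declared field "a", config attached to the schema
    return SchemaValidator(
        cs.typed_dict_schema(
            {"a": cs.typed_dict_field(cs.int_schema())},
            config=CoreConfig(extra_fields_behavior=behavior),
        )
    )

data = {"a": 1, "b": 2}  # "b" is not declared in the schema

ignored = make_validator("ignore").validate_python(data)  # extra key dropped
allowed = make_validator("allow").validate_python(data)   # extra key kept

try:
    make_validator("forbid").validate_python(data)
    forbid_error = None
except ValidationError as exc:
    # Record the error type reported for the undeclared key
    forbid_error = exc.errors()[0]["type"]
```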
Type-Specific Constraints
CoreConfig allows for the definition of global constraints on basic types, which is useful for enforcing organization-wide data policies (e.g., maximum string lengths).
String and Numeric Constraints
Global string constraints like str_max_length or str_strip_whitespace apply to all string schemas within the validator unless a specific str_schema provides its own value.
from pydantic_core import CoreConfig, SchemaValidator, core_schema as cs
# Enforce a global maximum string length
config = CoreConfig(str_max_length=5, str_strip_whitespace=True)
v = SchemaValidator(cs.str_schema(), config=config)
assert v.validate_python(" test ") == "test"
# v.validate_python("too long") # Raises ValidationError
For numeric types, allow_inf_nan (default True) controls whether float('inf') and float('nan') are considered valid.
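A short sketch of the difference, using a plain float_schema:

```python
import math

from pydantic_core import CoreConfig, SchemaValidator, ValidationError, core_schema as cs

# Default: infinity and NaN pass validation
v_default = SchemaValidator(cs.float_schema())
assert math.isinf(v_default.validate_python(float("inf")))
assert math.isnan(v_default.validate_python(float("nan")))

# allow_inf_nan=False: only finite floats are accepted
v_finite = SchemaValidator(cs.float_schema(), config=CoreConfig(allow_inf_nan=False))
rejected = []
for value in (float("inf"), float("nan")):
    try:
        v_finite.validate_python(value)
    except ValidationError:
        rejected.append(True)
    else:
        rejected.append(False)

finite_value = v_finite.validate_python(1.5)
```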
Serialization and JSON Interop
The configuration options in CoreConfig also dictate how data is transformed back into strings or bytes, particularly for JSON serialization.
Temporal and Bytes Formatting
Handling dates, times, and binary data in JSON requires standardized formats. CoreConfig provides several options:
ser_json_temporal: Controls the format for datetime, date, time, and timedelta. Options include 'iso8601', 'seconds', and 'milliseconds'.
ser_json_bytes: Controls how bytes are serialized to JSON, supporting 'utf8', 'base64', and 'hex'.
val_json_bytes: Complements serialization by defining how to validate bytes from JSON strings.
from pydantic_core import CoreConfig, SchemaValidator, core_schema as cs
# Configure bytes validation to expect base64 in JSON
config = CoreConfig(val_json_bytes='base64')
v = SchemaValidator(cs.bytes_schema(), config=config)
# Validates a base64 encoded string from JSON input
encoded = b'"2AfBVHgkkUYl8_NJythADO7Dq_9_083N-cIQ5KGwMWU="'
assert v.validate_json(encoded) == b'\xd8\x07\xc1Tx$\x91F%\xf3\xf3I\xca\xd8@\x0c\xee\xc3\xab\xff\x7f\xd3\xcd\xcd\xf9\xc2\x10\xe4\xa1\xb01e'
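The serialization side can be sketched the same way. The example below serializes one payload under each ser_json_bytes mode and then round-trips the base64 form through a validator configured with val_json_bytes. The bytes_to_json helper and the payload are illustrative only.

```python
from pydantic_core import CoreConfig, SchemaSerializer, SchemaValidator, core_schema as cs

payload = b"hello"

def bytes_to_json(mode):
    # Hypothetical helper: serialize the payload with the given ser_json_bytes mode
    serializer = SchemaSerializer(cs.bytes_schema(), config=CoreConfig(ser_json_bytes=mode))
    return serializer.to_json(payload)

as_utf8 = bytes_to_json("utf8")      # b'"hello"'
as_base64 = bytes_to_json("base64")  # b'"aGVsbG8="'
as_hex = bytes_to_json("hex")        # b'"68656c6c6f"'

# Round trip: validate the base64 JSON back into the original bytes
validator = SchemaValidator(cs.bytes_schema(), config=CoreConfig(val_json_bytes="base64"))
round_tripped = validator.validate_json(as_base64)
```

Keeping ser_json_bytes and val_json_bytes on the same mode is what makes the round trip lossless.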
Special Float Values in JSON
Standard JSON does not support Infinity or NaN. CoreConfig allows developers to choose how these are handled via ser_json_inf_nan:
'null': (Default) Serializes to null.
'constants': Serializes to Infinity, -Infinity, or NaN (non-standard JSON).
'strings': Serializes to "Infinity", "-Infinity", or "NaN".
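A minimal sketch comparing the three modes on a float_schema. The float_to_json helper is hypothetical.

```python
from pydantic_core import CoreConfig, SchemaSerializer, core_schema as cs

def float_to_json(value, mode):
    # Hypothetical helper: serialize one float with the given ser_json_inf_nan mode
    serializer = SchemaSerializer(cs.float_schema(), config=CoreConfig(ser_json_inf_nan=mode))
    return serializer.to_json(value)

inf = float("inf")

as_null = float_to_json(inf, "null")            # b'null'
as_constant = float_to_json(inf, "constants")   # b'Infinity'
as_string = float_to_json(inf, "strings")       # b'"Infinity"'
neg_constant = float_to_json(-inf, "constants")          # b'-Infinity'
nan_constant = float_to_json(float("nan"), "constants")  # b'NaN'
```

Only 'null' and 'strings' produce output that a strict JSON parser will accept. With 'constants', the consumer must tolerate the non-standard tokens.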
Error Handling and Metadata
CoreConfig influences the structure and content of ValidationError messages.
loc_by_alias: (Default: True) If True, error locations (loc) will use the field's alias. If False, they will use the internal field name.
hide_input_in_errors: For security and privacy, this can be set to True to prevent the original input data from being included in the ValidationError representation.
validation_error_cause: Controls whether the underlying Python exception that caused a validation failure is attached to the ValidationError via the __cause__ attribute.
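Of these options, hide_input_in_errors is the easiest to observe. The sketch below compares the string form of the same error with and without it. The error_text helper and the input value are hypothetical.

```python
from pydantic_core import CoreConfig, SchemaValidator, ValidationError, core_schema as cs

def error_text(validator, value):
    # Hypothetical helper: return the string form of the ValidationError, if any
    try:
        validator.validate_python(value)
    except ValidationError as exc:
        return str(exc)
    return ""

secret = "hunter2"  # hypothetical sensitive input that is not a valid integer

default_msg = error_text(SchemaValidator(cs.int_schema()), secret)
hidden_msg = error_text(
    SchemaValidator(cs.int_schema(), config=CoreConfig(hide_input_in_errors=True)),
    secret,
)
```

The default message echoes the offending input, while the second message reports the error without it.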
Design Decisions and Tradeoffs
Configuration Hierarchy
A key design principle in pydantic-core is that field-level settings take precedence over global configuration. For example, if CoreConfig sets str_max_length=10 but a specific field uses str_schema(max_length=20), the validator applies the limit of 20 to that field. This allows for sensible defaults while maintaining flexibility for edge cases.
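The precedence rule can be checked directly. In this sketch, a 15-character string is validated against the same global config, once with a bare str_schema and once with a field-level max_length.

```python
from pydantic_core import CoreConfig, SchemaValidator, ValidationError, core_schema as cs

config = CoreConfig(str_max_length=10)
text = "x" * 15  # longer than the global limit, shorter than the field-level one

# No field-level value: the global limit of 10 applies
v_global = SchemaValidator(cs.str_schema(), config=config)
try:
    v_global.validate_python(text)
    global_ok = True
except ValidationError:
    global_ok = False

# Field-level max_length=20 overrides the global limit
v_field = SchemaValidator(cs.str_schema(max_length=20), config=config)
field_ok = v_field.validate_python(text) == text
```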
Performance and Caching
The cache_strings option (default True) is a performance optimization. When enabled, pydantic-core caches strings during validation to avoid constructing the same Python string objects repeatedly. This is particularly beneficial when the same keys appear frequently in large datasets (e.g., a list of dictionaries with the same keys). Besides a boolean, the option accepts 'all' (equivalent to True), 'keys' (cache dictionary keys only), or 'none' (equivalent to False), so developers can trade memory for speed as their application requires.
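As a minimal sketch, the option is set like any other CoreConfig key. The schema and JSON payload below are illustrative. Caching affects only how strings are allocated, so the validated result is the same under every mode.

```python
from pydantic_core import CoreConfig, SchemaValidator, core_schema as cs

# A list of dicts that repeat the same keys, the case where key caching helps
schema = cs.list_schema(cs.dict_schema(cs.str_schema(), cs.int_schema()))

# Cache only dictionary keys
validator = SchemaValidator(schema, config=CoreConfig(cache_strings="keys"))
rows = validator.validate_json(b'[{"id": 1, "qty": 2}, {"id": 3, "qty": 4}]')
```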
Serialization Precedence
In the case of temporal types, ser_json_temporal is a broader setting that covers datetime, date, time, and timedelta. The implementation explicitly gives ser_json_temporal precedence over the more specific ser_json_timedelta if both are provided in the configuration. This ensures a consistent format across all temporal types when a general preference is expressed.