Models and Dataclasses
In this tutorial, you will learn how to define structured data using the low-level model and dataclass schemas provided by pydantic-core. You will build a user management system that uses a Pydantic-style model for persistent data and a Python dataclass for temporary initialization logic.
Prerequisites
To follow this tutorial, you need pydantic-core installed in your environment. You should also be familiar with basic Python classes and the dataclasses module.
Step 1: Defining a Model Schema
First, you will create a standard Python class and define a ModelSchema for it. This schema tells pydantic-core how to validate input data and map it to your class attributes.
from pydantic_core import SchemaValidator, core_schema

class User:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

# Define the model structure
user_schema = core_schema.model_schema(
    User,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'age': core_schema.model_field(core_schema.int_schema()),
        }
    )
)

validator = SchemaValidator(user_schema)
user = validator.validate_python({'username': 'jdoe', 'age': 30})
print(f"User: {user.username}, Age: {user.age}")
# Output: User: jdoe, Age: 30
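In this step, model_schema (which produces a ModelSchema dict) acts as the top-level container for the User class. Inside it, model_fields_schema (a ModelFieldsSchema) maps field names to model_field definitions. Each ModelField specifies the validation logic (e.g., str_schema or int_schema) for that specific attribute. Note that the validator does not call User.__init__: it creates the instance and sets the validated values on it directly, so the constructor above matters only when you instantiate User yourself.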
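To see the per-field validation logic in action, the following sketch feeds the same kind of schema one input that can be coerced and one that cannot. It assumes the default lax mode; the variable names (coerced_age, failed_fields, error_types) are illustrative only and not part of pydantic-core.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema


class User:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


user_schema = core_schema.model_schema(
    User,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'age': core_schema.model_field(core_schema.int_schema()),
        }
    ),
)
validator = SchemaValidator(user_schema)

# In lax mode (the default), a numeric string is coerced to an int.
coerced = validator.validate_python({'username': 'jdoe', 'age': '30'})
coerced_age = coerced.age

# Input that cannot be coerced raises ValidationError.
failed_fields = []
error_types = []
try:
    validator.validate_python({'username': 'jdoe', 'age': 'not a number'})
except ValidationError as exc:
    # Each error dict records where the failure happened and why.
    for err in exc.errors():
        failed_fields.append(err['loc'])
        error_types.append(err['type'])

print(coerced_age, failed_fields, error_types)
```

Inspecting exc.errors() rather than the exception message keeps error handling independent of the exact wording pydantic-core uses.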
Step 2: Adding Computed Fields for Serialization
Often, you need to include data in your output that isn't stored directly in the model. You can achieve this using ComputedField.
from pydantic_core import SchemaSerializer, core_schema

class UserWithDisplay:
    def __init__(self, username, age):
        self.username = username
        self.age = age

    @property
    def display_name(self):
        return f"{self.username} ({self.age})"

schema = core_schema.model_schema(
    UserWithDisplay,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'age': core_schema.model_field(core_schema.int_schema()),
        },
        computed_fields=[
            core_schema.computed_field('display_name', core_schema.str_schema())
        ]
    )
)

serializer = SchemaSerializer(schema)
user_inst = UserWithDisplay(username='jdoe', age=30)
print(serializer.to_python(user_inst))
# Output: {'username': 'jdoe', 'age': 30, 'display_name': 'jdoe (30)'}
The ComputedField is added to the computed_fields list within the ModelFieldsSchema. It references a property on the class (display_name) and defines how it should be serialized. Note that computed fields are used only during serialization and do not affect validation.
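As a rough illustration of that last point, the sketch below builds both a serializer and a validator from one schema. The computed field appears in JSON output, while validation accepts input that never mentions display_name. The second user ('asmith') and the variable names are made up for this example.

```python
import json

from pydantic_core import SchemaSerializer, SchemaValidator, core_schema


class UserWithDisplay:
    def __init__(self, username, age):
        self.username = username
        self.age = age

    @property
    def display_name(self):
        return f"{self.username} ({self.age})"


schema = core_schema.model_schema(
    UserWithDisplay,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'age': core_schema.model_field(core_schema.int_schema()),
        },
        computed_fields=[
            core_schema.computed_field('display_name', core_schema.str_schema())
        ],
    ),
)

# Serialization: to_json returns bytes that include the computed field.
serializer = SchemaSerializer(schema)
as_json = serializer.to_json(UserWithDisplay(username='jdoe', age=30))
decoded = json.loads(as_json)

# Validation: the input carries no 'display_name'; the value is derived
# from the property once the instance exists.
validator = SchemaValidator(schema)
validated = validator.validate_python({'username': 'asmith', 'age': 41})
validated_display = validated.display_name

print(decoded)
print(validated_display)
```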
Step 3: Defining a Dataclass Schema
Dataclasses in pydantic-core use a slightly different structure because they focus on constructor arguments. You will use DataclassSchema and DataclassArgsSchema.
import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class Profile:
    email: str
    active: bool

profile_schema = core_schema.dataclass_schema(
    Profile,
    core_schema.dataclass_args_schema(
        'Profile',
        [
            core_schema.dataclass_field(name='email', schema=core_schema.str_schema()),
            core_schema.dataclass_field(name='active', schema=core_schema.bool_schema()),
        ]
    ),
    fields=['email', 'active']
)

v = SchemaValidator(profile_schema)
profile = v.validate_python({'email': 'test@example.com', 'active': 'true'})
print(profile)
# Output: Profile(email='test@example.com', active=True)
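Unlike models, DataclassSchema requires a DataclassArgsSchema, which describes the arguments the dataclass accepts at construction. Each DataclassField describes one parameter, including its name and validation schema. The fields list passed to dataclass_schema names the attributes that are stored on the instance. Because validation runs in lax mode by default, the string 'true' is coerced to the boolean True.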
Step 4: Handling Init-Only Fields
Python dataclasses support InitVar fields that are passed to __init__ and __post_init__ but not stored on the instance. You can model this using the init_only flag.
import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class SecretProfile:
    username: str
    # 'token' is an InitVar: accepted at construction, not stored on the instance
    token: dataclasses.InitVar[str]

    def __post_init__(self, token: str):
        self.is_admin = (token == "secret-token")

secret_schema = core_schema.dataclass_schema(
    SecretProfile,
    core_schema.dataclass_args_schema(
        'SecretProfile',
        [
            core_schema.dataclass_field(name='username', schema=core_schema.str_schema()),
            core_schema.dataclass_field(
                name='token',
                schema=core_schema.str_schema(),
                init_only=True
            ),
        ],
        collect_init_only=True
    ),
    fields=['username'],
    post_init=True
)

v = SchemaValidator(secret_schema)
# The 'token' is validated and passed to __post_init__ but not kept in the final object dict
profile = v.validate_python({'username': 'admin', 'token': 'secret-token'})
print(f"Admin: {profile.is_admin}")
# Output: Admin: True
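By setting init_only=True on a DataclassField and collect_init_only=True on the DataclassArgsSchema, pydantic-core validates these fields, keeps them out of the instance's stored attributes, and collects them separately. When post_init=True is set in the DataclassSchema, the collected values are passed as positional arguments to __post_init__. The validator populates the instance directly rather than calling the dataclass __init__.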
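A quick way to confirm this behavior is to inspect the validated instance. In this sketch the dataclass declares token as a dataclasses.InitVar, and the 'guest' input and variable names are invented for illustration.

```python
import dataclasses

from pydantic_core import SchemaValidator, core_schema


@dataclasses.dataclass
class SecretProfile:
    username: str
    token: dataclasses.InitVar[str]  # init-only: never stored on the instance

    def __post_init__(self, token: str):
        self.is_admin = token == 'secret-token'


secret_schema = core_schema.dataclass_schema(
    SecretProfile,
    core_schema.dataclass_args_schema(
        'SecretProfile',
        [
            core_schema.dataclass_field(name='username', schema=core_schema.str_schema()),
            core_schema.dataclass_field(
                name='token', schema=core_schema.str_schema(), init_only=True
            ),
        ],
        collect_init_only=True,
    ),
    fields=['username'],
    post_init=True,
)
v = SchemaValidator(secret_schema)

admin = v.validate_python({'username': 'admin', 'token': 'secret-token'})
guest = v.validate_python({'username': 'guest', 'token': 'wrong'})

# The token influenced __post_init__ but is absent from the stored state.
admin_state = vars(admin)
token_stored = 'token' in admin_state

print(admin.is_admin, guest.is_admin, token_stored)
```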
Complete Result
You have now built a system that can:
- Validate and instantiate custom Python classes using ModelSchema.
- Include dynamic properties in output using ComputedField.
- Handle standard Python dataclasses using DataclassSchema.
- Manage complex initialization logic with init_only fields and post_init hooks.
This hierarchy of schemas, from the top-level ModelSchema and DataclassSchema down to individual ModelField and DataclassField definitions, provides the granular control needed for high-performance data validation and serialization in pydantic-core.