Models and Dataclasses

In this tutorial, you will learn how to define structured data using the low-level model and dataclass schemas provided by pydantic-core. You will build a user management system that uses a Pydantic-style model for persistent data and a Python dataclass for temporary initialization logic.

Prerequisites

To follow this tutorial, you need pydantic-core installed in your environment. You should also be familiar with basic Python classes and the dataclasses module.

Step 1: Defining a Model Schema

First, you will create a standard Python class and define a ModelSchema for it. This schema tells pydantic-core how to validate input data and map it to your class attributes.

from pydantic_core import SchemaValidator, core_schema

class User:
    # pydantic-core does not call this __init__ during validation (unless
    # custom_init=True is passed to model_schema); it populates the
    # instance's __dict__ directly. The constructor is kept for manual use.
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

# Define the model structure
user_schema = core_schema.model_schema(
    User,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'age': core_schema.model_field(core_schema.int_schema()),
        }
    ),
)

validator = SchemaValidator(user_schema)
user = validator.validate_python({'username': 'jdoe', 'age': 30})

print(f"User: {user.username}, Age: {user.age}")
# Output: User: jdoe, Age: 30

In this step, model_schema (which produces a ModelSchema dict) acts as the top-level container for the User class. Inside it, model_fields_schema (a ModelFieldsSchema) maps field names to model_field definitions. Each ModelField specifies the validation logic (e.g., str_schema or int_schema) for that specific attribute.
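The sketch below shows the field schemas at work. It reuses the same two-field structure with a bare `User` class (simplified here for brevity) and feeds it one coercible input and one invalid input. In pydantic-core's default lax mode the string `'30'` is coerced to an int. A missing `username` and a non-numeric `age` are both reported together in a single `ValidationError`.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema


class User:
    """Plain container class; pydantic-core fills in its attributes."""


validator = SchemaValidator(
    core_schema.model_schema(
        User,
        core_schema.model_fields_schema(
            {
                'username': core_schema.model_field(core_schema.str_schema()),
                'age': core_schema.model_field(core_schema.int_schema()),
            }
        ),
    )
)

# Lax mode: the numeric string '30' is coerced to the int 30
user = validator.validate_python({'username': 'jdoe', 'age': '30'})
print(type(user).__name__, user.age)  # User 30

# Errors for every field are collected before ValidationError is raised
errors = []
try:
    validator.validate_python({'age': 'thirty'})
except ValidationError as exc:
    errors = [(err['type'], err['loc']) for err in exc.errors()]
print(errors)  # [('missing', ('username',)), ('int_parsing', ('age',))]
```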

Step 2: Adding Computed Fields for Serialization

Often, you need to include data in your output that isn't stored directly in the model. You can achieve this using ComputedField.

from pydantic_core import SchemaSerializer, core_schema

class UserWithDisplay:
    def __init__(self, username, age):
        self.username = username
        self.age = age

    @property
    def display_name(self):
        return f"{self.username} ({self.age})"

schema = core_schema.model_schema(
    UserWithDisplay,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'age': core_schema.model_field(core_schema.int_schema()),
        },
        computed_fields=[
            core_schema.computed_field('display_name', core_schema.str_schema())
        ],
    ),
)

serializer = SchemaSerializer(schema)
user_inst = UserWithDisplay(username='jdoe', age=30)
print(serializer.to_python(user_inst))
# Output: {'username': 'jdoe', 'age': 30, 'display_name': 'jdoe (30)'}

The ComputedField is added to the computed_fields list within the ModelFieldsSchema. It references a property on the class (display_name) and defines how it should be serialized. Note that computed fields are used only during serialization and do not affect validation.
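The sketch below restates the Step 2 schema and checks two consequences of this design. First, `to_json` emits the computed field after the stored ones. Second, a `SchemaValidator` built from the same schema ignores a `display_name` key in the input, because unknown keys are ignored by default. The property still computes its value from the validated fields.

```python
from pydantic_core import SchemaSerializer, SchemaValidator, core_schema


class UserWithDisplay:
    def __init__(self, username, age):
        self.username = username
        self.age = age

    @property
    def display_name(self):
        return f"{self.username} ({self.age})"


schema = core_schema.model_schema(
    UserWithDisplay,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'age': core_schema.model_field(core_schema.int_schema()),
        },
        computed_fields=[core_schema.computed_field('display_name', core_schema.str_schema())],
    ),
)

serializer = SchemaSerializer(schema)
validator = SchemaValidator(schema)

# JSON output includes the computed field after the regular fields
json_bytes = serializer.to_json(UserWithDisplay('jdoe', 30))
print(json_bytes)  # b'{"username":"jdoe","age":30,"display_name":"jdoe (30)"}'

# On input, 'display_name' is not a field, so it is ignored rather than stored
validated = validator.validate_python({'username': 'amy', 'age': 41, 'display_name': 'ignored'})
print(validated.display_name)  # amy (41)
```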

Step 3: Defining a Dataclass Schema

Dataclasses in pydantic-core use a slightly different structure because they focus on constructor arguments. You will use DataclassSchema and DataclassArgsSchema.

import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class Profile:
    email: str
    active: bool

profile_schema = core_schema.dataclass_schema(
    Profile,
    core_schema.dataclass_args_schema(
        'Profile',
        [
            core_schema.dataclass_field(name='email', schema=core_schema.str_schema()),
            core_schema.dataclass_field(name='active', schema=core_schema.bool_schema()),
        ],
    ),
    fields=['email', 'active'],
)

v = SchemaValidator(profile_schema)
profile = v.validate_python({'email': 'test@example.com', 'active': 'true'})

print(profile)
# Output: Profile(email='test@example.com', active=True)

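Unlike models, DataclassSchema requires a DataclassArgsSchema, which mirrors the arguments of the dataclass __init__. Each DataclassField describes one parameter, including its name and validation schema. The fields list passed to dataclass_schema names the attributes stored on the instance. pydantic-core validates the arguments and populates the new instance directly. It does not call the generated __init__.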
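One way to see what the args schema contributes is to validate it on its own, as in the sketch below. It returns a pair: the dict of field values and a slot for init-only values. That slot is `None` here because nothing is collected. The same schema also accepts pydantic-core's `ArgsKwargs`, which carries positional and keyword arguments the way a constructor call would. The sketch passes `kw_only=False` explicitly so positional arguments are allowed regardless of the library's default.

```python
import dataclasses

from pydantic_core import ArgsKwargs, SchemaValidator, core_schema


@dataclasses.dataclass
class Profile:
    email: str
    active: bool


args_schema = core_schema.dataclass_args_schema(
    'Profile',
    [
        # kw_only=False lets each field be passed positionally as well as by keyword
        core_schema.dataclass_field(name='email', schema=core_schema.str_schema(), kw_only=False),
        core_schema.dataclass_field(name='active', schema=core_schema.bool_schema(), kw_only=False),
    ],
)

# Validating the args schema alone yields (field_values, init_only_values)
raw = SchemaValidator(args_schema).validate_python({'email': 'test@example.com', 'active': 'true'})
print(raw)  # ({'email': 'test@example.com', 'active': True}, None)

# Wrapped in dataclass_schema, the arguments can arrive as a call signature
v = SchemaValidator(core_schema.dataclass_schema(Profile, args_schema, fields=['email', 'active']))
profile = v.validate_python(ArgsKwargs(('test@example.com',), {'active': 'yes'}))
print(profile)  # Profile(email='test@example.com', active=True)
```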

Step 4: Handling Init-Only Fields

Python dataclasses support InitVar fields that are passed to __init__ and __post_init__ but not stored on the instance. You can model this using the init_only flag.

import dataclasses
from pydantic_core import SchemaValidator, core_schema

@dataclasses.dataclass
class SecretProfile:
    username: str
    # 'token' is an InitVar: accepted at construction and passed to
    # __post_init__, but not stored as a field
    token: dataclasses.InitVar[str]

    def __post_init__(self, token: str):
        self.is_admin = (token == "secret-token")

secret_schema = core_schema.dataclass_schema(
    SecretProfile,
    core_schema.dataclass_args_schema(
        'SecretProfile',
        [
            core_schema.dataclass_field(name='username', schema=core_schema.str_schema()),
            core_schema.dataclass_field(
                name='token',
                schema=core_schema.str_schema(),
                init_only=True,
            ),
        ],
        collect_init_only=True,
    ),
    fields=['username'],
    post_init=True,
)

v = SchemaValidator(secret_schema)
# The 'token' is validated and passed to __post_init__ but not kept in the final object dict
profile = v.validate_python({'username': 'admin', 'token': 'secret-token'})
print(f"Admin: {profile.is_admin}")
# Output: Admin: True

Set init_only=True on a DataclassField and collect_init_only=True on the DataclassArgsSchema. pydantic-core then collects these values during validation instead of storing them on the instance. When post_init=True is also set on the DataclassSchema, the collected values are passed positionally to __post_init__, in the order the fields are declared. The dataclass __init__ itself is not called.
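The collection step becomes visible when you validate the args schema by itself, as the sketch below does. With `collect_init_only=True`, the second element of the result is a tuple of init-only values rather than `None`. Wrapping the args schema in the full `dataclass_schema` then exercises the `__post_init__` path with a non-admin token. It also confirms that `token` never reaches the instance `__dict__`.

```python
import dataclasses

from pydantic_core import SchemaValidator, core_schema


@dataclasses.dataclass
class SecretProfile:
    username: str
    token: dataclasses.InitVar[str]  # init-only: not stored on the instance

    def __post_init__(self, token: str):
        self.is_admin = token == 'secret-token'


args_schema = core_schema.dataclass_args_schema(
    'SecretProfile',
    [
        core_schema.dataclass_field(name='username', schema=core_schema.str_schema()),
        core_schema.dataclass_field(name='token', schema=core_schema.str_schema(), init_only=True),
    ],
    collect_init_only=True,
)

# Args schema alone: init-only values come back separately, as a tuple
raw = SchemaValidator(args_schema).validate_python({'username': 'guest', 'token': 'wrong'})
print(raw)  # ({'username': 'guest'}, ('wrong',))

# Full dataclass schema: the tuple is unpacked into __post_init__
v = SchemaValidator(
    core_schema.dataclass_schema(SecretProfile, args_schema, fields=['username'], post_init=True)
)
guest = v.validate_python({'username': 'guest', 'token': 'wrong'})
print(guest.is_admin)  # False
print('token' in vars(guest))  # False
```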

Complete Result

You have now built a system that can:

  1. Validate and instantiate custom Python classes using ModelSchema.
  2. Include dynamic properties in output using ComputedField.
  3. Handle standard Python dataclasses using DataclassSchema.
  4. Manage complex initialization logic with init_only fields and post_init hooks.

This hierarchy of schemas—from the top-level ModelSchema/DataclassSchema down to individual ModelField/DataclassField definitions—provides the granular control needed for high-performance data validation and serialization in pydantic-core.
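The sketch below shows that hierarchy in one place. It nests the Step 3 `DataclassSchema` inside a `ModelField`, so a single validator builds both objects and a single serializer walks back down the tree. The `Account` class and its field names are hypothetical, chosen only for illustration.

```python
import dataclasses

from pydantic_core import SchemaSerializer, SchemaValidator, core_schema


@dataclasses.dataclass
class Profile:
    email: str
    active: bool


class Account:
    """Hypothetical model class holding a username and a nested Profile."""


profile_schema = core_schema.dataclass_schema(
    Profile,
    core_schema.dataclass_args_schema(
        'Profile',
        [
            core_schema.dataclass_field(name='email', schema=core_schema.str_schema()),
            core_schema.dataclass_field(name='active', schema=core_schema.bool_schema()),
        ],
    ),
    fields=['email', 'active'],
)

# The dataclass schema is used as an ordinary field schema inside the model
account_schema = core_schema.model_schema(
    Account,
    core_schema.model_fields_schema(
        {
            'username': core_schema.model_field(core_schema.str_schema()),
            'profile': core_schema.model_field(profile_schema),
        }
    ),
)

account = SchemaValidator(account_schema).validate_python(
    {'username': 'jdoe', 'profile': {'email': 'test@example.com', 'active': 'true'}}
)
print(account.profile)  # Profile(email='test@example.com', active=True)

# Serialization descends through the model and the dataclass alike
dumped = SchemaSerializer(account_schema).to_python(account)
print(dumped)  # {'username': 'jdoe', 'profile': {'email': 'test@example.com', 'active': True}}
```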