Network and URL Types

URL validation in this codebase is handled through specialized core schemas that transform input strings into structured Url or MultiHostUrl objects. These schemas provide fine-grained control over URL components, including scheme restrictions, length limits, and default values for missing parts.

Core URL Schemas

The codebase defines two primary schemas for network addresses in pydantic_core/core_schema.py:

  • UrlSchema: Validates standard, single-host URLs (e.g., https://example.com).
  • MultiHostUrlSchema: Validates URLs that may contain multiple hosts, often used in database connection strings (e.g., postgres://host1,host2/db).

These are typically instantiated using the helper functions url_schema() and multi_host_url_schema().

Single-Host URLs

When using url_schema(), the validator ensures the input is a valid URL and returns a pydantic_core.Url object. This object provides direct access to the parsed components.

from pydantic_core import SchemaValidator, core_schema

# Define a basic URL schema
schema = core_schema.url_schema(allowed_schemes=['https', 'http'])
v = SchemaValidator(schema)

# Validation returns a Url object
url = v.validate_python('https://user:pass@example.com:8080/path?query=1#frag')

print(url.scheme)    # 'https'
print(url.host)      # 'example.com'
print(url.port)      # 8080
print(url.path)      # '/path'
print(url.username)  # 'user'
print(url.password)  # 'pass'
print(url.query)     # 'query=1'

As seen in pydantic-core/tests/validators/test_url.py, the Url object also supports helper methods like unicode_host() and query_params(), the latter returning a list of tuples (e.g., [('query', '1')]).
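
The two helpers can be exercised as in the following minimal sketch. The example URLs and variable names are illustrative assumptions, not taken from the test suite. Note that converting the tuple list to a dict keeps only the last value for a repeated key.

```python
from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(core_schema.url_schema())

# query_params() yields (key, value) tuples in the order they appear in the URL
url = v.validate_python('https://example.com/search?q=python&page=2')
params = dict(url.query_params())

# An internationalized host is stored in its ASCII (punycode) form;
# unicode_host() decodes it back for display
idn = v.validate_python('https://münchen.de/')
ascii_host = idn.host
display_host = idn.unicode_host()
```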

Multi-Host URLs

The MultiHostUrlSchema is designed for scenarios where a single URL represents multiple endpoints. This is common in high-availability configurations for services like Redis or PostgreSQL.

from pydantic_core import SchemaValidator, core_schema

schema = core_schema.multi_host_url_schema()
v = SchemaValidator(schema)

# Validating a multi-host string
url = v.validate_python('redis://localhost,0.0.0.0,127.0.0.1:6379')

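# MultiHostUrl provides a .hosts() method
for host_info in url.hosts():
    print(host_info)
# {'username': None, 'password': None, 'host': 'localhost', 'port': None}
# ...
# {'username': None, 'password': None, 'host': '127.0.0.1', 'port': 6379}
# Only the last host carries the explicit :6379 port in this input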

The .hosts() method returns a list of dictionaries, each containing the host, port, username, and password for that specific segment of the URL.
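
As a rough sketch of how those dictionaries might be consumed, the snippet below flattens them into (host, port) pairs, for example to hand to a connection pool. The connection string and the endpoints name are assumptions chosen for illustration.

```python
from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(core_schema.multi_host_url_schema())
url = v.validate_python('postgres://user:pass@host1.db.net:4321,host2.db.net:6432/app')

# Collect (host, port) pairs; a port that is absent from the input may be None
endpoints = [(h['host'], h['port']) for h in url.hosts()]
```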

Configuration and Constraints

Both UrlSchema and MultiHostUrlSchema support several constraints to enforce specific URL structures.

Scheme Restrictions

The allowed_schemes parameter limits which protocols are accepted. If provided, it must be a non-empty list of strings.

# Only allow secure connections
schema = core_schema.url_schema(allowed_schemes=['https', 'ftps'])
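
To see the restriction in action, the sketch below wraps validation in a small helper. is_accepted is a hypothetical name introduced here, not part of pydantic_core; it relies only on the validator raising ValidationError for a disallowed scheme.

```python
from pydantic_core import SchemaValidator, ValidationError, core_schema

secure_only = SchemaValidator(
    core_schema.url_schema(allowed_schemes=['https', 'ftps'])
)


def is_accepted(raw: str) -> bool:
    """Hypothetical helper: True if `raw` passes the scheme-restricted validator."""
    try:
        secure_only.validate_python(raw)
    except ValidationError:
        return False
    return True
```

With this helper, an https URL should pass while the same URL with an http scheme should be rejected.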

Host Requirements

By default, a host is not strictly required by the schema (though it may be required by the underlying parser depending on the scheme). You can enforce its presence using host_required.

schema = core_schema.url_schema(host_required=True)
# 'mailto:user@example.com' has no host component, so a validator built from this schema would reject it

Default Components

You can provide default values for missing URL parts using default_host, default_port, and default_path. These are applied during validation if the input URL is missing that specific component.

schema = core_schema.url_schema(
    default_host='localhost',
    default_port=5432,
    default_path='/postgres',
)
v = SchemaValidator(schema)

url = v.validate_python('postgres://user:pass@/dbname')
# The resulting Url object will use the defaults for missing parts
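
A smaller sketch that isolates the port and path defaults follows. The Redis-style URLs and the default values are assumptions picked for illustration; the point is that defaults should fill in only the components the input leaves out, while explicit values are kept.

```python
from pydantic_core import SchemaValidator, core_schema

v = SchemaValidator(
    core_schema.url_schema(default_port=6379, default_path='/0')
)

# No port or path in the input: the defaults are expected to be filled in
bare = v.validate_python('redis://localhost')
filled = (bare.host, bare.port, bare.path)

# Port and path given explicitly: the defaults should not override them
explicit = v.validate_python('redis://cache.internal:7000/3')
kept = (explicit.port, explicit.path)
```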

Path Handling and Normalization

A notable behavior in this implementation is how empty paths are handled. By default, an empty path is normalized to /.

Preserving Empty Paths

If you need to distinguish between an empty path and the root path, use the preserve_empty_path option.

# Default behavior (preserve_empty_path=False)
v1 = SchemaValidator(core_schema.url_schema())
print(v1.validate_python('https://example.com').path)  # '/'

# Preserving empty path
v2 = SchemaValidator(core_schema.url_schema(preserve_empty_path=True))
print(v2.validate_python('https://example.com').path)  # None or ''

This can also be controlled globally via the url_preserve_empty_path configuration in CoreConfig.

Strict vs. Lax Validation

The strict parameter controls how tolerant the validator is of its input. In strict mode, the input must already be a well-formed URL string. In lax mode (the default), the validator parses more leniently and may accept input types that can be converted to strings.

schema = core_schema.url_schema(strict=True)

When strict is True, validation will fail if the input does not conform exactly to the expected URL format for the given schema configuration.