Applied Python Chronicles: A Gentle Intro to Pydantic | by Ilija Lazarevic | Jul, 2024


What about default values and argument extractions?

from pydantic import validate_call

@validate_call(validate_return=True)
def add(*args: int, a: int, b: int = 4) -> int:
    return str(sum(args) + a + b)

# ----
add(4, 3, 4)
> ValidationError: 1 validation error for add
a
Missing required keyword only argument [type=missing_keyword_only_argument, input_value=ArgsKwargs((4, 3, 4)), input_type=ArgsKwargs]
For further information visit

# ----

add(4, 3, 4, a=3)
> 18

# ----

@validate_call
def add(*args: int, a: int, b: int = 4) -> int:
    return str(sum(args) + a + b)

# ----

add(4, 3, 4, a=3)
> '18'

Takeaways from this example:

  • You can annotate the type of a variable number of positional arguments (*args).
  • Default values are still an option, even when you annotate argument types.
  • validate_call accepts a validate_return argument, which makes it validate the function's return value as well. Data type coercion is applied in that case too. validate_return defaults to False; if it is left as is, the function may not return what its type hints declare.
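The coercion behavior described above can be sketched in a few lines. This is a minimal, hedged example (assuming Pydantic v2 is installed); the double function and its values are invented for illustration:

```python
from pydantic import validate_call

@validate_call
def double(x: int) -> int:
    # the string '3' is coerced to the int 3 before the body runs
    return x * 2

result = double('3')  # returns 6, an int
```

Passing a value that cannot be coerced to int, such as 'abc', raises a ValidationError instead.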

What if you want to validate the data type but also constrain the values a variable can take? Example:

from pydantic import validate_call, Field
from typing import Annotated

type_age = Annotated[int, Field(lt=120)]

@validate_call(validate_return=True)
def add(age_one: int, age_two: type_age) -> int:
    return age_one + age_two

add(3, 300)
> ValidationError: 1 validation error for add
1
Input should be less than 120 [type=less_than, input_value=300, input_type=int]
For further information visit

This example shows:

  • You can use Annotated and pydantic.Field to not only validate data type but also add metadata that Pydantic uses to constrain variable values and formats.
  • ValidationError is yet again very verbose about what was wrong with our function call. This can be really helpful.
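To illustrate the same Annotated plus Field pattern with a different kind of constraint, here is a hedged sketch (assuming Pydantic v2) that restricts a string's format with Field's pattern argument; the CountryCode type and the sample values are made up:

```python
from typing import Annotated
from pydantic import Field, ValidationError, validate_call

# exactly two uppercase ASCII letters, e.g. 'RS' or 'US'
CountryCode = Annotated[str, Field(pattern=r'^[A-Z]{2}$')]

@validate_call
def set_country(code: CountryCode) -> str:
    return code

set_country('RS')  # passes the pattern constraint

try:
    set_country('Serbia')  # violates the pattern constraint
except ValidationError as e:
    print(e.error_count(), 'validation error')
```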

Here is one more example of how you can both validate and constrain variable values. We will simulate a payload (dictionary) that you want to process in your function after it has been validated:

from pydantic import HttpUrl, PastDate
from pydantic import Field
from pydantic import validate_call
from typing import Annotated

Name = Annotated[str, Field(min_length=2, max_length=15)]

@validate_call(validate_return=True)
def process_payload(url: HttpUrl, name: Name, birth_date: PastDate) -> str:
    return f'{name=}, {birth_date=}'

# ----

payload = {
    'url': 'httpss://example.com',
    'name': 'J',
    'birth_date': '2024-12-12'
}

process_payload(**payload)
> ValidationError: 3 validation errors for process_payload
url
URL scheme should be 'http' or 'https' [type=url_scheme, input_value='httpss://example.com', input_type=str]
For further information visit
name
String should have at least 2 characters [type=string_too_short, input_value='J', input_type=str]
For further information visit
birth_date
Date should be in the past [type=date_past, input_value='2024-12-12', input_type=str]
For further information visit

# ----

payload = {
    'url': 'https://example.com',
    'name': 'Joe-1234567891011121314',
    'birth_date': '2020-12-12'
}

process_payload(**payload)
> ValidationError: 1 validation error for process_payload
name
String should have at most 15 characters [type=string_too_long, input_value='Joe-1234567891011121314', input_type=str]
For further information visit

Those were the basics of validating function arguments and return values.

Now, we will go to the second most important way Pydantic can be used to validate and process data: through defining models.

This part is more interesting for the purposes of data processing, as you will see.

So far, we have used validate_call to decorate functions and specified function arguments and their corresponding types and constraints.

Here, we define models as classes in which we specify fields, their types, and constraints. This is very similar to what we did previously. By defining a model class that inherits from Pydantic's BaseModel, we get built-in machinery that handles data validation, parsing, and serialization. What this gives us is the ability to create objects that conform to the model specification.

Here is an example:

from pydantic import Field
from pydantic import BaseModel

class Person(BaseModel):
    name: str = Field(min_length=2, max_length=15)
    age: int = Field(gt=0, lt=120)

# ----

john = Person(name='john', age=20)
> Person(name='john', age=20)

# ----

mike = Person(name='m', age=0)
> ValidationError: 2 validation errors for Person
name
String should have at least 2 characters [type=string_too_short, input_value='m', input_type=str]
For further information visit
age
Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
For further information visit

You can use annotation here as well, and you can also specify default values for fields. Let’s see another example:

from pydantic import Field
from pydantic import BaseModel
from typing import Annotated

Name = Annotated[str, Field(min_length=2, max_length=15)]
Age = Annotated[int, Field(default=1, ge=0, le=120)]

class Person(BaseModel):
    name: Name
    age: Age

# ----

mike = Person(name='mike')
> Person(name='mike', age=1)

Things get very interesting when your use case gets a bit complex. Remember the payload that we defined? I will define another, more complex structure that we will go through and validate. To make it more interesting, let’s create a payload that we will use to query a service that acts as an intermediary between us and LLM providers. Then we will validate it.

Here is an example:

from pydantic import Field
from pydantic import BaseModel
from pydantic import ConfigDict

from typing import Literal
from typing import Annotated
from enum import Enum

payload = {
    "req_id": "test",
    "text": "This is a sample text.",
    "instruction": "embed",
    "llm_provider": "openai",
    "llm_params": {
        "llm_temperature": 0,
        "llm_model_name": "gpt4o"
    },
    "misc": "what"
}

ReqID = Annotated[str, Field(min_length=2, max_length=15)]

class LLMProviders(str, Enum):
    OPENAI = 'openai'
    CLAUDE = 'claude'

class LLMParams(BaseModel):
    temperature: int = Field(validation_alias='llm_temperature', ge=0, le=1)
    llm_name: str = Field(validation_alias='llm_model_name',
                          serialization_alias='model')

class Payload(BaseModel):
    req_id: str = Field(exclude=True)
    text: str = Field(min_length=5)
    instruction: Literal['embed', 'chat']
    llm_provider: LLMProviders
    llm_params: LLMParams

    # model_config = ConfigDict(use_enum_values=True)

# ----

validated_payload = Payload(**payload)
validated_payload
> Payload(req_id='test',
    text='This is a sample text.',
    instruction='embed',
    llm_provider=<LLMProviders.OPENAI: 'openai'>,
    llm_params=LLMParams(temperature=0, llm_name='gpt4o'))

# ----

validated_payload.model_dump()
> {'text': 'This is a sample text.',
'instruction': 'embed',
'llm_provider': <LLMProviders.OPENAI: 'openai'>,
'llm_params': {'temperature': 0, 'llm_name': 'gpt4o'}}

# ----

validated_payload.model_dump(by_alias=True)
> {'text': 'This is a sample text.',
'instruction': 'embed',
'llm_provider': <LLMProviders.OPENAI: 'openai'>,
'llm_params': {'temperature': 0, 'model': 'gpt4o'}}

# ----

# After adding
# model_config = ConfigDict(use_enum_values=True)
# in Payload model definition, you get

validated_payload.model_dump(by_alias=True)
> {'text': 'This is a sample text.',
'instruction': 'embed',
'llm_provider': 'openai',
'llm_params': {'temperature': 0, 'model': 'gpt4o'}}

Some of the important insights from this elaborated example are:

  • You can use Enums or Literal to define a list of specific values that are expected.
  • In case you want to name a model’s field differently from the field name in the validated data, you can use validation_alias. It specifies the field name in the data being validated.
  • serialization_alias is used when the model’s internal field name is not necessarily the same name you want to use when you serialize the model.
  • Field can be excluded from serialization with exclude=True.
  • Model fields can be Pydantic models as well. The process of validation in that case is done recursively. This part is really awesome, since Pydantic does the job of going into depth while validating nested structures.
  • Fields that are not declared in the model definition are ignored during parsing.
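Since models also serialize to and from JSON, a hedged round-trip sketch may be useful here (assuming Pydantic v2); Person is redefined locally so the snippet stands on its own:

```python
import json
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(min_length=2, max_length=15)
    age: int = Field(gt=0, lt=120)

john = Person(name='john', age=20)

# serialize the model to a JSON string
json_str = john.model_dump_json()

# parse and validate the string back into a model instance
same_john = Person.model_validate_json(json_str)

assert same_john == john
assert json.loads(json_str) == {'name': 'john', 'age': 20}
```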

Here are snippets of code that show where and how you can use Pydantic in your day-to-day tasks.

Say you have data you need to validate and process. It can be stored in CSV, Parquet files, or, for example, in a NoSQL database in the form of a document. Let’s take the example of a CSV file, and let’s say you want to process its content.

Here is the CSV file (test.csv) example:

name,age,bank_account
johnny,0,20
matt,10,0
abraham,100,100000
mary,15,15
linda,130,100000

And here is how it is validated and parsed:

from pydantic import BaseModel
from pydantic import Field
from pydantic import field_validator
from pydantic import ValidationError
from pydantic import ValidationInfo
from typing import List
import csv

FILE_NAME = 'test.csv'

class DataModel(BaseModel):
    name: str = Field(min_length=2, max_length=15)
    age: int = Field(ge=1, le=120)
    bank_account: float = Field(ge=0, default=0)

    @field_validator('name')
    @classmethod
    def validate_name(cls, v: str, info: ValidationInfo) -> str:
        return str(v).capitalize()

class ValidatedModels(BaseModel):
    validated: List[DataModel]

validated_rows = []

with open(FILE_NAME, 'r') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        try:
            validated_rows.append(DataModel(**row))
        except ValidationError as ve:
            # print the error and disregard the record
            print(f'{ve=}')

validated_rows
> [DataModel(name='Matt', age=10, bank_account=0.0),
DataModel(name='Abraham', age=100, bank_account=100000.0),
DataModel(name='Mary', age=15, bank_account=15.0)]

validated = ValidatedModels(validated=validated_rows)
validated.model_dump()
> {'validated': [{'name': 'Matt', 'age': 10, 'bank_account': 0.0},
{'name': 'Abraham', 'age': 100, 'bank_account': 100000.0},
{'name': 'Mary', 'age': 15, 'bank_account': 15.0}]}
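When you disregard records like this, you may still want to log exactly what failed. As a hedged sketch (assuming Pydantic v2), ValidationError exposes a structured errors() list you can inspect programmatically; the invalid record below is invented:

```python
from pydantic import BaseModel, Field, ValidationError

class DataModel(BaseModel):
    name: str = Field(min_length=2, max_length=15)
    age: int = Field(ge=1, le=120)
    bank_account: float = Field(ge=0, default=0)

try:
    DataModel(name='x', age=500)
except ValidationError as ve:
    # each entry is a dict with 'loc', 'msg', 'type', and the offending input
    for err in ve.errors():
        print(err['loc'], err['type'])
```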

FastAPI is already integrated with Pydantic, so this part is going to be very brief. FastAPI handles a request by passing it to the function that handles the route, and validation is performed automatically as the request comes in, similar to the validate_call decorator we covered at the beginning of this article.

Example of app.py that is used to run FastAPI-based service:

from fastapi import FastAPI
from pydantic import BaseModel, HttpUrl

class Request(BaseModel):
    request_id: str
    url: HttpUrl

app = FastAPI()

@app.post("/search/by_url/")
async def create_item(req: Request):
    return req

Pydantic is a really powerful library and has a lot of mechanisms for a multitude of different use cases and edge cases as well. Today, I explained the most basic parts of how you should use it, and I’ll provide references below for those who are not faint-hearted.

Go and explore. I’m sure it will serve you well on different fronts.


