In my code I have an alias for a type and mypy is happy about it.
NumberedPaths = list[tuple[str, str]]
Should I use TypeVar instead? Is there any better way to define a new type?
The two concepts aren't related any more than any other type-related concepts.
In short, a TypeVar is a variable you can use in type signatures so you can refer to the same unspecified type more than once, while a NewType is used to tell the type checker that some values should be treated as their own type.
Type Variables
To simplify, type variables let you refer to the same type more than once without specifying exactly which type it is.
In a definition, a single type variable always takes the same value.
# (This code will type check, but it won't run.)
from typing import TypeVar, Generic
# Two type variables, named T and R
T = TypeVar('T')
R = TypeVar('R')
# Put in a list of Ts and get out one T
def get_one(x: list[T]) -> T: ...
# Put in a T and an R, get back an R and a T
def swap(x: T, y: R) -> tuple[R, T]:
    return y, x
# A simple generic class that holds a value of type T
class ValueHolder(Generic[T]):
    def __init__(self, value: T):
        self.value = value
    def get(self) -> T:
        return self.value
x: ValueHolder[int] = ValueHolder(123)
y: ValueHolder[str] = ValueHolder('abc')
Without type variables, there wouldn't be a good way to declare the type of get_one or ValueHolder.get.
There are a few other options on TypeVar. You can restrict the possible values by passing in more types (e.g. TypeVar(name, int, str)), or you can give an upper bound so every value of the type variable must be a subtype of that type (e.g. TypeVar(name, bound=int)).
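As a rough illustration of those two options, here is a minimal sketch. The names StrOrBytes, IntLike, concat and largest are made up for this example and do not come from any library.

```python
from typing import TypeVar

# Constrained: the variable must resolve to exactly str or exactly bytes.
StrOrBytes = TypeVar("StrOrBytes", str, bytes)

def concat(a: StrOrBytes, b: StrOrBytes) -> StrOrBytes:
    return a + b

# Bounded: any subtype of int is accepted and kept in the return type.
IntLike = TypeVar("IntLike", bound=int)

def largest(values: list[IntLike]) -> IntLike:
    return max(values)

concat("a", "b")        # fine: both arguments are str
concat(b"a", b"b")      # fine: both arguments are bytes
# concat("a", b"b")     # a type checker rejects this: str mixed with bytes
largest([1, 5, 3])      # fine
largest([True, False])  # fine: bool is a subclass of int
# largest(["x", "y"])   # a type checker rejects this: str is not a subtype of int
```

The commented-out calls are the ones a checker such as mypy should flag; they are left commented so the snippet still runs.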
Additionally, you can decide whether a type variable is covariant, contravariant, or neither when you declare it. This essentially decides when subclasses or superclasses can be used in place of a generic type. PEP 484 describes these concepts in more detail, and refers to additional resources.
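A small sketch of what variance means in practice, assuming made-up classes Animal, Dog, Source and Sink (none of these are from the typing module):

```python
from typing import Generic, TypeVar

class Animal: ...
class Dog(Animal): ...

T_co = TypeVar("T_co", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)

class Source(Generic[T_co]):
    """Only produces values of its type parameter, so it can be covariant."""
    def __init__(self, item: T_co) -> None:
        self._item = item
    def get(self) -> T_co:
        return self._item

class Sink(Generic[T_contra]):
    """Only consumes values of its type parameter, so it can be contravariant."""
    def __init__(self) -> None:
        self.seen: list[object] = []
    def put(self, item: T_contra) -> None:
        self.seen.append(item)

def read_animal(src: Source[Animal]) -> Animal:
    return src.get()

def feed_dog(sink: Sink[Dog]) -> None:
    sink.put(Dog())

dog_source: Source[Dog] = Source(Dog())
animal_sink: Sink[Animal] = Sink()

read_animal(dog_source)  # accepted: Source[Dog] can stand in for Source[Animal]
feed_dog(animal_sink)    # accepted: Sink[Animal] can stand in for Sink[Dog]
```

With a plain (invariant) type variable, a type checker would be expected to reject both of the last two calls.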
Addendum: Python 3.12 generic parameter lists
Starting in Python 3.12, the following syntax has been available to declare type variables.
def get_one[T](x: list[T]) -> T: ...
def swap[T, R](x: T, y: R) -> tuple[R, T]: ...
class ValueHolder[T]:
    def __init__(self, value: T): ...
    def get(self) -> T: ...
These declarations are equivalent to those above, but now the type variables are only defined in type signatures within their functions/classes, rather than being stored in regular Python variables. The Python 3.12 release notes contain a summary, as well as links to more-detailed documentation.
NewType
A NewType is for when you want to declare a distinct type without actually doing the work of creating a new type or worrying about the overhead of creating new class instances.
In the type checker, NewType('Name', int) creates a subclass of int named "Name."
At runtime, NewType('Name', int) is not a class at all; it is actually the identity function, so x is NewType('Name', int)(x) is always true.
from typing import NewType
UserId = NewType('UserId', int)
def get_user(x: UserId): ...
get_user(UserId(123456)) # this is fine
get_user(123456) # that's an int, not a UserId
UserId(123456) + 123456 # fine, because UserId is a subclass of int
To the type checker, UserId looks something like this:
class UserId(int): pass
But at runtime, UserId is basically just this:
def UserId(x): return x
There's almost nothing more than that to a NewType at runtime. In Python 3.8.1, its implementation was almost exactly as follows:
def NewType(name, type_):
    def identity(x):
        return x
    identity.__name__ = name
    return identity
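The runtime behaviour can be checked directly. This is a small sketch using the UserId example from above; newer Python versions implement NewType differently internally, but as far as I know calling it still returns its argument unchanged.

```python
from typing import NewType

UserId = NewType("UserId", int)

raw = 123456
uid = UserId(raw)

# Calling UserId hands back the very same object; nothing is wrapped or copied.
same_object = uid is raw

# No new class exists at runtime, so the value is still a plain int.
still_int = type(uid) is int

# The wrapped type is recorded on the NewType object for tools to inspect.
supertype = UserId.__supertype__

print(same_object, still_int, supertype)
```

All of the distinction between UserId and int therefore lives in the type checker, not in the running program.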
NewType() accepts a single type argument. To specialize a function for different types for static typing, you only need a TypeVar here.
Example: Read https://dev.to/decorator_factory/typevars-explained-hmo
The purpose of the TypeVar in this context is to say that the function returns a specific type that is related to the argument's type.
For example, if you did:
a = first([1, 2, 3]) + "foo"
you would get an error, because in this expression T becomes bound to the type int, and so you'd get an error about adding an int and a str.
If you annotated first with Any types as you describe, this would not produce a mypy error (and hence you'd get a TypeError at runtime instead), because the return value of first would always simply be Any.
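The definition of first isn't shown in this answer, so the sketch below assumes a signature that takes a list of T and returns a T. first_any is a hypothetical Any-annotated variant added only for comparison.

```python
from typing import Any, TypeVar

T = TypeVar("T")

# Assumed signature: the element type of the list flows through to the result.
def first(items: list[T]) -> T:
    return items[0]

# Hypothetical Any-based variant: the checker learns nothing about the result.
def first_any(items: list[Any]) -> Any:
    return items[0]

n = first([1, 2, 3])          # inferred as int by a type checker
# first([1, 2, 3]) + "foo"    # a type checker reports adding int and str here

try:
    first_any([1, 2, 3]) + "foo"   # passes static checking because the result is Any
except TypeError as exc:
    error_message = str(exc)       # the mistake only surfaces at runtime
```

The TypeVar version moves the failure from runtime to type-checking time, which is the point being made above.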
See the mypy documentation on generics for a lot more examples of how to use typevars: https://mypy.readthedocs.io/en/stable/generics.html
The best example is with pydantic.
Imagine I have a function that uses pydantic, and I want that function to turn records retrieved as dicts from firestore into my pydantic type. That code may look something like this:
class MyModel(BaseModel):
    ...
class MyClass:
    def get_records(...) -> Generator[MyModel, None, None]:
        for record in self.client.collection("MyModel").where(...).stream(...):
            body = record.to_dict()
            if body:
                yield MyModel.model_validate(body)
Now that's great and all, but what if I have multiple models? Then I have to define a function for each one, right? Pretty annoying.
OK, what if I use a Union?
class MyModel(BaseModel):
    ...
class MyModel2(BaseModel):
    ...
T: TypeAlias = Union[MyModel, MyModel2]
class MyClass:
    def get_records(
        self,
        my_union_type: Type[T],
        collection_name: str
    ) -> Generator[T, None, None]:
        for record in self.client.collection(collection_name).where(...).stream(...):
            body = record.to_dict()
            if body:
                # my_union_type is a class, not an instance, so compare with `is`
                if my_union_type is MyModel:
                    yield MyModel.model_validate(body)
                elif my_union_type is MyModel2:
                    yield MyModel2.model_validate(body)
As you can see, we can now use our new type, but it's a bit messy, no? We call the same method, model_validate, in every branch; the problem is that a Union gives us no way to refer to the type dynamically and just say "some unknown BaseModel".
And, what's more, for each type we want it to handle we have to repeat this same logic...
Enter TypeVar. A TypeVar lets us declare a variable that stands for a type, as opposed to naming one specific type.
If that doesn't make sense, think of it this way:
# this is a type stored in a variable
my_model_as_a_type = MyModel
# this is an instance stored in a variable
mymodel_as_an_instance = MyModel()
As you can see, a type is literally the thing that defines what that object looks like, but it cannot be used as that object, because it is not an instance of it.
An instance is that thing, initialised in memory, with all the functions and whatever else that implementation of its type has defined.
So, moving on to TypeVar: how can we improve our code?
With a few simple changes, we can make our function accept any BaseModel subclass, without requiring BaseModel itself, and let the caller see the exact type returned to them:
class MyModel(BaseModel):
    ...
class MyModel2(BaseModel):
    ...
T = TypeVar("T", bound=BaseModel)
class MyClass:
    def get_records(
        self,
        model_type: Type[T],
        collection_path: str
    ) -> Generator[T, None, None]:
        adapter = TypeAdapter(model_type)
        for record in self.client.collection(collection_path).where(...).stream(...):
            body = record.to_dict()
            if body:
                yield adapter.validate_python(body)
Now when you pass a model class to this function, the objects you get back are instances of that class, built from the firestore collection. Type checkers are happy, and so are your colleagues, because they know exactly which type is returned. With the Union version, the result could "possibly" be any member of the union, and the caller had to work out which one. You can now also have functions that operate only on specific collections for specific types.
So we could then have get_my_model_records and get_my_model_2_records functions that yield the results of get_records as another generator, merely supplying their specific types and whatever other implementation details are needed. Consumers of your API then know exactly which objects they will get back from each function, and can easily see that each one is specific to one type.
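Here is a minimal sketch of that wrapper idea that runs without pydantic or firestore. Record is a hypothetical stand-in for BaseModel, FAKE_STORE is a hypothetical in-memory stand-in for the firestore client, and from_dict plays the role of model_validate; none of these are real library APIs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generator, List, Type, TypeVar

T = TypeVar("T", bound="Record")

@dataclass
class Record:
    """Hypothetical stand-in for pydantic's BaseModel."""
    @classmethod
    def from_dict(cls: Type[T], body: Dict[str, Any]) -> T:
        return cls(**body)

@dataclass
class MyModel(Record):
    name: str

@dataclass
class MyModel2(Record):
    count: int

# Hypothetical in-memory data in place of firestore collections.
FAKE_STORE: Dict[str, List[Dict[str, Any]]] = {
    "MyModel": [{"name": "a"}, {}, {"name": "b"}],
    "MyModel2": [{"count": 1}],
}

def get_records(model_type: Type[T], collection_path: str) -> Generator[T, None, None]:
    # Generic over T: yields instances of whichever class was passed in.
    for body in FAKE_STORE.get(collection_path, []):
        if body:  # skip empty documents, as in the code above
            yield model_type.from_dict(body)

def get_my_model_records() -> Generator[MyModel, None, None]:
    yield from get_records(MyModel, "MyModel")

def get_my_model_2_records() -> Generator[MyModel2, None, None]:
    yield from get_records(MyModel2, "MyModel2")
```

Each wrapper pins T to one concrete class, so a caller of get_my_model_records sees MyModel in the signature rather than a union or a bare type variable.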