InputFormat (renamed it to keep type notation consistent) can be a subtype or alias of Tuple[str, str, str]. Having it be a subtype (your first example) instead of an alias (your second example) is useful for a situation where you want to statically verify (through something like mypy) that all InputFormats were made in a certain way. For example:
def new_input_format(a: str) -> InputFormat:
    return InputFormat((a, a * 2, a * 4))

def print_input_format(input_format: InputFormat):
    print(input_format)

print_input_format(new_input_format("a")) # Statement 1
print_input_format(("a", "aa", "aaa")) # Statement 2
If InputFormat is declared as an alias (through InputFormat = Tuple[str, str, str]), both statements will statically verify. If InputFormat is declared as a subtype (through InputFormat = NewType('InputFormat', Tuple[str, str, str])), only the first statement will statically verify.
Now this isn't foolproof. A third statement such as:
print_input_format(InputFormat(("a", "aa", "aaa")))
will statically verify, yet it bypasses our careful InputFormat creator called new_input_format. However, by making InputFormat a subtype here we were forced to explicitly acknowledge that we're creating an input format through having to wrap the tuple in an InputFormat, which makes it easier to maintain this type of code and spot potential bugs in input format constructions.
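Here is a minimal runnable sketch of the subtype variant. The imports and the `describe_input_format` helper are my own additions for illustration; the rest follows the example above. It also shows that the wrapper exists only for the type checker: at runtime the value is still a plain tuple.

```python
from typing import NewType, Tuple

# Subtype declaration: a type checker treats InputFormat as distinct
# from a plain Tuple[str, str, str].
InputFormat = NewType("InputFormat", Tuple[str, str, str])


def new_input_format(a: str) -> InputFormat:
    # The one intended way to build an InputFormat.
    return InputFormat((a, a * 2, a * 4))


def describe_input_format(input_format: InputFormat) -> str:
    # Hypothetical helper: returns a string instead of printing.
    return "/".join(input_format)


made = new_input_format("a")

# NewType adds no wrapper object; the value is an ordinary tuple at runtime.
assert type(made) is tuple
assert describe_input_format(made) == "a/aa/aaaa"

# describe_input_format(("a", "aa", "aaa")) would also run, but a checker
# such as mypy is expected to reject it because the argument is a bare tuple.
```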
Another example where NewType is beneficial over a type alias:
Let's say you have a database that exposes two functions:
def read_user_id_from_session_id(session_id: str) -> Optional[str]:
    ...

def read_user(user_id: str) -> User:
    ...
intended to be called like this (exhibit A):
user_id = read_user_id_from_session_id(session_id)

if user_id:
    user = read_user(user_id)
    # Do something with `user`.
else:
    print("User not found!")
Forget about the fact that we can use a join here to make this only one query instead of two. Anyways, we want to only allow a return value of read_user_id_from_session_id to be used in read_user (since in our system, a user ID can only come from a session). We don't want to allow any value, reason being that it's probably a mistake. Imagine we did this (exhibit B):
user = read_user(session_id)
To a quick reader, it may appear correct. They'd probably think a select * from users where session_id = $1 is happening. However, this is actually treating a session_id as a user_id, and with our current type hints it passes despite causing unintended behavior at runtime. Instead, we can change the type hints to this:
UserID = NewType("UserID", str)

def read_user_id_from_session_id(session_id: str) -> Optional[UserID]:
    ...

def read_user(user_id: UserID) -> User:
    ...
Exhibit A expressed above would still work, because the flow of data is correct. But we'd have to turn Exhibit B into
read_user(UserID(session_id))
which quickly points out the problem of converting a session_id to a user_id without going through the required function.
In other programming languages with better type systems, this can be taken a step further. You can actually prohibit explicit construction like UserID(...) in all but one place, causing everyone to have to go through that one place in order to obtain a piece of data of that type. In Python, you can bypass the intended flow of data by explicitly doing YourNewType(...) anywhere. While NewType is beneficial over simply type aliases, it leaves this feature to be desired.
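To make the intended flow concrete, here is a small self-contained sketch. The in-memory dictionaries, the `User` dataclass, and `find_user_name` are hypothetical stand-ins for the real database layer, which the answer does not show. The point is that `UserID(...)` is written in only one place, by convention rather than by enforcement.

```python
from dataclasses import dataclass
from typing import Dict, NewType, Optional

UserID = NewType("UserID", str)


@dataclass
class User:
    user_id: UserID
    name: str


# Hypothetical in-memory stand-ins for the database tables.
_SESSIONS: Dict[str, str] = {"session-1": "user-42"}
_USERS: Dict[str, str] = {"user-42": "Ada"}


def read_user_id_from_session_id(session_id: str) -> Optional[UserID]:
    raw = _SESSIONS.get(session_id)
    # By convention, the only place where a raw string becomes a UserID.
    return UserID(raw) if raw is not None else None


def read_user(user_id: UserID) -> User:
    return User(user_id=user_id, name=_USERS[user_id])


def find_user_name(session_id: str) -> Optional[str]:
    # Same data flow as exhibit A: session ID -> user ID -> user.
    user_id = read_user_id_from_session_id(session_id)
    if user_id is None:
        return None
    return read_user(user_id).name
```

With this layout, a call like `read_user(session_id)` still runs, but a type checker should flag it, and any stray `UserID(...)` outside the lookup function is easy to grep for in review.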
I would like to find a way to clarify the expected type of a function argument. Which of the type hints should I use?
In my case, the function should take two arguments, point_a and point_b, both of which should be of the type tuple[int, int], representing their coordinates (integer values).
I want my function to look like this:
def manhattan_dist(point_a: Coordinates, point_b: Coordinates) -> int:
    ax, ay = point_a
    bx, by = point_b
    return abs(ax - bx) + abs(ay - by)

Do I just create a new variable with Coordinates = tuple[int, int]? Or should I use TypeAlias?
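For a case like this, a plain assignment is already treated as a type alias by type checkers, so either spelling should work. Here is a minimal sketch assuming Python 3.9+ for the built-in `tuple[int, int]` syntax; the explicit `TypeAlias` form is shown only as a comment because `typing.TypeAlias` needs Python 3.10+.

```python
# Plain assignment: type checkers interpret this as a type alias.
Coordinates = tuple[int, int]

# Explicit spelling (Python 3.10+), equivalent for a type checker:
#   from typing import TypeAlias
#   Coordinates: TypeAlias = tuple[int, int]


def manhattan_dist(point_a: Coordinates, point_b: Coordinates) -> int:
    ax, ay = point_a
    bx, by = point_b
    return abs(ax - bx) + abs(ay - by)


# Any (int, int) tuple is accepted, because an alias is not a distinct type.
distance = manhattan_dist((1, 2), (4, 6))
```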
I want to create a type for matrices. Should I use a type alias or typing.NewType?
Matrix = NewType('Matrix', List[List[int]]) or simply Matrix = List[List[int]] ?
Hi
I'm reading through the typing documentation https://docs.python.org/3/library/typing.html and have a question that I cannot answer.
In the past when I wanted to use type aliases I would use code like Vector = list[float] (I think that I must have picked this up from a post on stack overflow or something).
However, in the document above it suggests using the code type Vector = list[float].
The difference between the two is that the data type is types.GenericAlias (the list[float]) for the first Vector and typing.TypeAliasType for the second Vector.
But besides that, I am not really sure what the difference between these two methods is. I'm not sure what the reason to use one over the other is. I'm also not sure where the documentation is for the first example (maybe technically this is not a type alias).
I'm not sure if anyone can help here?
The two concepts aren't related any more than any other type-related concepts.
In short, a TypeVar is a variable you can use in type signatures so you can refer to the same unspecified type more than once, while a NewType is used to tell the type checker that some values should be treated as their own type.
Type Variables
To simplify, type variables let you refer to the same type more than once without specifying exactly which type it is.
In a definition, a single type variable always takes the same value.
# (This code will type check, but it won't run.)
from typing import TypeVar, Generic

# Two type variables, named T and R
T = TypeVar('T')
R = TypeVar('R')

# Put in a list of Ts and get out one T
def get_one(x: list[T]) -> T: ...

# Put in a T and an R, get back an R and a T
def swap(x: T, y: R) -> tuple[R, T]:
    return y, x

# A simple generic class that holds a value of type T
class ValueHolder(Generic[T]):
    def __init__(self, value: T):
        self.value = value
    def get(self) -> T:
        return self.value

x: ValueHolder[int] = ValueHolder(123)
y: ValueHolder[str] = ValueHolder('abc')
Without type variables, there wouldn't be a good way to declare the type of get_one or ValueHolder.get.
There are a few other options on TypeVar. You can restrict the possible values by passing in more types (e.g. TypeVar(name, int, str)), or you can give an upper bound so every value of the type variable must be a subtype of that type (e.g. TypeVar(name, bound=int)).
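As a rough sketch of those two options (the names `Text`, `IntLike`, `concat`, and `largest` are made up for illustration):

```python
from typing import Sequence, TypeVar

# Constrained: Text must be solved as exactly str or exactly bytes,
# so both arguments of concat have to be the same one of the two.
Text = TypeVar("Text", str, bytes)


def concat(a: Text, b: Text) -> Text:
    return a + b


# Bounded: IntLike may be int or any subtype of int (bool, for instance).
IntLike = TypeVar("IntLike", bound=int)


def largest(values: Sequence[IntLike]) -> IntLike:
    return max(values)


joined_text = concat("a", "b")
joined_bytes = concat(b"a", b"b")
biggest = largest([1, 5, 3])

# concat("a", b"b") is the kind of call a type checker should reject,
# because Text cannot be str and bytes at the same time.
```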
Additionally, you can decide whether a type variable is covariant, contravariant, or neither when you declare it. This essentially decides when subclasses or superclasses can be used in place of a generic type. PEP 484 describes these concepts in more detail, and refers to additional resources.
Addendum: Python 3.12 generic parameter lists
Starting in Python 3.12, the following syntax has been available to declare type variables.
def get_one[T](x: list[T]) -> T: ...

def swap[T, R](x: T, y: R) -> tuple[R, T]: ...

class ValueHolder[T]:
    def __init__(self, value: T): ...
    def get(self) -> T: ...
These declarations are equivalent to those above, but now the type variables are only defined in type signatures within their functions/classes, rather than being stored in regular Python variables. The Python 3.12 release notes contain a summary, as well as links to more-detailed documentation.
NewType
A NewType is for when you want to declare a distinct type without actually doing the work of creating a new type or worrying about the overhead of creating new class instances.
In the type checker, NewType('Name', int) creates a subclass of int named "Name."
At runtime, NewType('Name', int) is not a class at all; it behaves as the identity function, so x is NewType('Name', int)(x) is always true.
from typing import NewType
UserId = NewType('UserId', int)
def get_user(x: UserId): ...
get_user(UserId(123456)) # this is fine
get_user(123456) # that's an int, not a UserId
UserId(123456) + 123456 # fine, because UserId is a subclass of int
To the type checker, UserId looks something like this:
class UserId(int): pass
But at runtime, UserId is basically just this:
def UserId(x): return x
There's almost nothing more than that to a NewType at runtime. In Python 3.8.1, its implementation was almost exactly as follows:
def NewType(name, type_):
    def identity(x):
        return x
    identity.__name__ = name
    return identity
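A quick way to see this runtime behavior for yourself, using the real `typing.NewType` (the variable names are only illustrative):

```python
from typing import NewType

UserId = NewType("UserId", int)

raw = 123456
wrapped = UserId(raw)

# Calling the NewType returns the very same object, not a copy or wrapper.
assert wrapped is raw

# No new class is involved: the value is still a plain int.
assert type(wrapped) is int

# The declared name and base type are kept as attributes for introspection.
assert UserId.__name__ == "UserId"
assert UserId.__supertype__ is int
```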
NewType() accepts a unique type parameter. To specialize the function for different types for static typing, you only need a TypeVar here.
Example: Read https://dev.to/decorator_factory/typevars-explained-hmo