Data classes are just regular classes that are geared towards storing state, rather than containing a lot of logic. Every time you create a class that mostly consists of attributes, you make a data class.
What the dataclasses module does is to make it easier to create data classes. It takes care of a lot of boilerplate for you.
This is especially useful when your data class must be hashable, because this requires a __hash__ method as well as an __eq__ method. If you also add a custom __repr__ method for ease of debugging, the class can become quite verbose:
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __init__(
        self,
        name: str,
        unit_price: float,
        quantity_on_hand: int = 0
    ) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

    def __repr__(self) -> str:
        return (
            'InventoryItem('
            f'name={self.name!r}, unit_price={self.unit_price!r}, '
            f'quantity_on_hand={self.quantity_on_hand!r})'
        )

    def __hash__(self) -> int:
        return hash((self.name, self.unit_price, self.quantity_on_hand))

    def __eq__(self, other) -> bool:
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return (
            (self.name, self.unit_price, self.quantity_on_hand) ==
            (other.name, other.unit_price, other.quantity_on_hand))
With dataclasses you can reduce it to:
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
(Example based on the PEP example).
The same class decorator can also generate comparison methods (__lt__, __gt__, etc.) and handle immutability.
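To make the comparison and immutability options a bit more concrete, here is a minimal sketch. It reuses the InventoryItem fields from above; the sample items and the variable names (by_name, restocked, and so on) are only illustrative assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(order=True, frozen=True)
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


bolt = InventoryItem("bolt", 0.25, 100)
nut = InventoryItem("nut", 0.10, 250)

# order=True compares instances as tuples of their fields, in declaration
# order, so sorting here is effectively by name first.
by_name = sorted([nut, bolt])

# frozen=True turns attribute assignment into an error.
try:
    bolt.quantity_on_hand = 99
    assignment_failed = False
except FrozenInstanceError:
    assignment_failed = True

# replace() builds a modified copy instead of mutating the original.
restocked = replace(bolt, quantity_on_hand=500)

print(by_name[0].name, assignment_failed, restocked.quantity_on_hand)
```

Because the instances are frozen, dataclasses.replace is the usual way to get a changed version of an item without touching the original.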
namedtuple classes are also data classes, but are immutable by default (as well as being sequences). dataclasses are much more flexible in this regard, and can easily be structured such that they can fill the same role as a namedtuple class.
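A small sketch of the difference, assuming two hypothetical point types (PointTuple and PointData are names made up for this example):

```python
from collections import namedtuple
from dataclasses import astuple, dataclass

PointTuple = namedtuple("PointTuple", ["x", "y"])


@dataclass(frozen=True)
class PointData:
    x: int
    y: int


p_tuple = PointTuple(1, 2)
p_data = PointData(1, 2)

# A namedtuple is a real tuple: it unpacks, indexes and equals a plain tuple.
x, y = p_tuple
tuple_equal = (p_tuple == (1, 2))

# A dataclass is not a sequence and does not compare equal to a plain tuple.
data_equal = (p_data == (1, 2))

# Converting to a tuple is an explicit step.
as_tuple = astuple(p_data)

print(tuple_equal, data_equal, as_tuple)
```

With frozen=True the dataclass covers the immutable, attribute-access role of a namedtuple, but it only behaves like a sequence if you ask for that explicitly.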
The PEP was inspired by the attrs project, which can do even more (including slots, validators, converters, metadata, etc.).
If you want to see some examples, I recently used dataclasses for several of my Advent of Code solutions; see the solutions for day 7, day 8, day 11 and day 20.
If you want to use the dataclasses module in Python versions < 3.7, then you could install the backported module (requires 3.6) or use the attrs project mentioned above.
Overview
The question has been addressed. However, this answer adds some practical examples to aid in the basic understanding of dataclasses.
What exactly are python data classes and when is it best to use them?
- code generators: generate boilerplate code; you can choose to implement special methods in a regular class or have a dataclass implement them automatically.
- data containers: structures that hold data (e.g. tuples and dicts), often with dotted attribute access, such as classes, namedtuple and others.
"mutable namedtuples with default[s]"
Here is what the latter phrase means:
- mutable: by default, dataclass attributes can be reassigned. You can optionally make them immutable (see Examples below).
- namedtuple: you have dotted attribute access like a namedtuple or a regular class.
- default: you can assign default values to attributes.
Compared to common classes, you primarily save on typing boilerplate code.
Features
This is an overview of dataclass features (TL;DR? See the Summary Table in the next section).
What you get
Here are features you get by default from dataclasses.
Attributes + Representation + Comparison
import dataclasses

@dataclasses.dataclass
#@dataclasses.dataclass() # alternative
class Color:
    r: int = 0
    g: int = 0
    b: int = 0
These defaults are provided by automatically setting the following keywords to True:
@dataclasses.dataclass(init=True, repr=True, eq=True)
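One way to see what these defaults buy you is to switch two of them off and compare. This is only a sketch; BareColor is a hypothetical class added for the contrast.

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass(repr=False, eq=False)
class BareColor:  # hypothetical: same fields, but no generated repr or eq
    r: int = 0
    g: int = 0
    b: int = 0


print(Color(1, 2, 3))                     # Color(r=1, g=2, b=3)
print(Color() == Color(0, 0, 0))          # True: compared field by field

print(BareColor() == BareColor())         # False: falls back to identity
print(repr(BareColor()).startswith("<"))  # True: default object repr
```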
What you can turn on
Additional features are available if the appropriate keywords are set to True.
Order
@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0
The ordering methods are now implemented (overloading operators: < > <= >=), similarly to functools.total_ordering with stronger equality tests.
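As a quick illustration (the sample colors are arbitrary), instances sort like tuples of their fields, and comparing against an unrelated type is rejected rather than silently allowed:

```python
import dataclasses


@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


colors = [Color(0, 50, 0), Color(), Color(0, 0, 255)]
ordered = sorted(colors)
print(ordered)
# [Color(r=0, g=0, b=0), Color(r=0, g=0, b=255), Color(r=0, g=50, b=0)]

# Ordering is only defined between instances of the same class.
try:
    Color() < (1, 1, 1)
    mixed_comparison_allowed = True
except TypeError:
    mixed_comparison_allowed = False

print(mixed_comparison_allowed)  # False
```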
Hashable, Mutable
@dataclasses.dataclass(unsafe_hash=True) # override base `__hash__`
class Color:
    ...
Although the object is potentially mutable (possibly undesired), a hash is implemented.
Hashable, Immutable
@dataclasses.dataclass(frozen=True) # `eq=True` (default) to be immutable
class Color:
    ...
A hash is now implemented and changing the object or assigning to attributes is disallowed.
Overall, the object is hashable if either unsafe_hash=True or frozen=True.
See also the original hashing logic table with more details.
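Here is a rough sketch of the three hashing situations side by side. The class names (PlainColor, FrozenColor, UnsafeColor) are made up for the comparison; the last part shows why the keyword is called "unsafe".

```python
import dataclasses


@dataclasses.dataclass
class PlainColor:    # eq=True, frozen=False: __hash__ is set to None
    r: int = 0


@dataclasses.dataclass(frozen=True)
class FrozenColor:   # eq=True, frozen=True: __hash__ is generated
    r: int = 0


@dataclasses.dataclass(unsafe_hash=True)
class UnsafeColor:   # hashable, but still mutable
    r: int = 0


try:
    hash(PlainColor())
    plain_is_hashable = True
except TypeError:
    plain_is_hashable = False

# Equal frozen instances hash alike, so the set keeps only one of them.
unique_frozen = {FrozenColor(), FrozenColor()}

# Pitfall of unsafe_hash: mutating an object that is already in a set
# changes its hash, so the set can no longer find it.
color = UnsafeColor(1)
seen = {color}
color.r = 2
still_found = color in seen

print(plain_is_hashable, len(unique_frozen), still_found)
```

So unsafe_hash=True is best kept for objects that you treat as read-only by convention, while frozen=True has the class enforce it.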
Optimization
@dataclasses.dataclass(slots=True) # py310+
class SlottedColor:
    #__slots__ = ["r", "b", "g"] # alternative
    r: int
    g: int
    b: int
The object size is now reduced:
>>> import sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888
slots=True was added in Python 3.10. (Thanks @ajskateboarder).
In some circumstances, slots=True/__slots__ also improves the speed of creating instances and accessing attributes. Also, a manually declared __slots__ does not allow default assignments for the same names; otherwise, a ValueError is raised (slots=True does accept defaults). If __slots__ already exists, slots=True will cause a TypeError.
See more on slots in this blog post.
See more on arguments added in Python 3.10+: match_args, kw_only, slots, weakref_slot.
What you don't get
To get the following features, special methods must be manually implemented:
Unpacking
@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)
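With __iter__ defined this way, an instance can be used anywhere an iterable is expected. A short usage sketch (the sample values are arbitrary):

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)


r, g, b = Color(128, 0, 255)
print(r, g, b)                  # 128 0 255
print(list(Color()))            # [0, 0, 0]
print(max(Color(10, 200, 30)))  # 200
```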
Summary Table
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Feature              | Keyword              | Example                                            | Implement in a Class                    |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Attributes           | init                 | Color().r -> 0                                     | __init__                                |
| Representation       | repr                 | Color() -> Color(r=0, g=0, b=0)                    | __repr__                                |
| Comparison*          | eq                   | Color() == Color(0, 0, 0) -> True                  | __eq__                                  |
|                      |                      |                                                    |                                         |
| Order                | order                | sorted([Color(0, 50, 0), Color()]) -> ...          | __lt__, __le__, __gt__, __ge__          |
| Hashable             | unsafe_hash/frozen   | {Color(), Color()} -> {Color(r=0, g=0, b=0)}       | __hash__                                |
| Immutable            | frozen + eq          | Color().r = 10 -> FrozenInstanceError              | __setattr__, __delattr__                |
| Optimization         | slots                | sys.getsizeof(SlottedColor) -> 888                 | __slots__                               |
|                      |                      |                                                    |                                         |
| Unpacking+           | -                    | r, g, b = Color()                                  | __iter__                                |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
* __ne__ is not needed and thus not implemented.
+These methods are not automatically generated and require manual implementation in a dataclass.
Additional features
Post-initialization
@dataclasses.dataclass
class RGBA:
    r: int = 0
    g: int = 0
    b: int = 0
    a: float = 1.0

    def __post_init__(self):
        self.a: int = int(self.a * 255)

RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)
Inheritance
@dataclasses.dataclass
class RGBA(Color):
    a: int = 0
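To show what inheritance does to the generated methods, here is a self-contained sketch that repeats the Color base class (the pixel values are arbitrary):

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass
class RGBA(Color):
    a: int = 0


pixel = RGBA(10, 20, 30, 255)
print(pixel)                     # RGBA(r=10, g=20, b=30, a=255)
print(isinstance(pixel, Color))  # True

# Base-class fields come first, followed by the subclass fields.
print([f.name for f in dataclasses.fields(RGBA)])  # ['r', 'g', 'b', 'a']
```

Note that because every field in Color has a default, the new field in RGBA needs one too; a field without a default after fields with defaults raises a TypeError when the class is created.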
Conversions
Convert a dataclass to a tuple or a dict, recursively:
>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}
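The "recursively" part matters once dataclasses are nested. A small sketch, where Pixel is a hypothetical container class introduced only for this example:

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass
class Pixel:  # hypothetical container holding another dataclass
    x: int
    y: int
    color: Color


pixel = Pixel(3, 4, Color(128, 0, 255))

print(dataclasses.asdict(pixel))
# {'x': 3, 'y': 4, 'color': {'r': 128, 'g': 0, 'b': 255}}
print(dataclasses.astuple(pixel))
# (3, 4, (128, 0, 255))
```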
Limitations
- Lacks mechanisms to handle starred arguments
- Working with nested dataclasses can be complicated
References
- R. Hettinger's talk on Dataclasses: The code generator to end all code generators
- T. Hunner's talk on Easier Classes: Python Classes Without All the Cruft
- Python's documentation on hashing details
- Real Python's guide on The Ultimate Guide to Data Classes in Python 3.7
- A. Shaw's blog post on A brief tour of Python 3.7 data classes
- E. Smith's github repository on dataclasses
I've been learning OOP but the dataclass decorator's use case sort of escapes me.
I understand classes and methods superficially, but I don't quite understand how a dataclass differs from just creating a regular class. What's the advantage of using a dataclass?
How does it work and what is it for? (ELI5, please!)
My use case would be a collection of constants. I was wondering if I should be using dataclasses...
class MyCreatures:
    T_REX_CALLNAME = "t-rex"
    T_REX_RESPONSE = "The awesome king of Dinosaurs!"
    PTERODACTYL_CALLNAME = "pterodactyl"
    PTERODACTYL_RESPONSE = "The flying Menace!"
    ...

def check_dino():
    name = input("Please give a dinosaur: ")
    if name == MyCreatures.T_REX_CALLNAME:
        print(MyCreatures.T_REX_RESPONSE)
    if name = ...
Halp?