This:
def __post_init__(self):
    super(NamedObj, self).__post_init__()
    super(NumberedObj, self).__post_init__()
    print("NamedAndNumbered __post_init__")
doesn't do what you think it does. super(cls, obj) returns a proxy to the class after cls in type(obj).__mro__ - so, in your case, super(NamedObj, self) is a proxy to object. And the whole point of cooperative super() calls is to avoid having to explicitly call each of the parents.
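To make the "class after cls in the mro" rule concrete, here is a minimal sketch. The field-less classes reuse the names from the question, and next_in_mro is a hypothetical helper that only mimics where super(cls, obj) starts its lookup:

```python
class NamedObj:
    pass


class NumberedObj:
    pass


class NamedAndNumbered(NumberedObj, NamedObj):
    pass


def next_in_mro(cls, instance_type):
    """Hypothetical helper: the class where super(cls, instance) starts looking."""
    mro = instance_type.__mro__
    return mro[mro.index(cls) + 1]


mro_names = [c.__name__ for c in NamedAndNumbered.__mro__]
print(mro_names)
# ['NamedAndNumbered', 'NumberedObj', 'NamedObj', 'object']

print(next_in_mro(NamedObj, NamedAndNumbered).__name__)     # object
print(next_in_mro(NumberedObj, NamedAndNumbered).__name__)  # NamedObj
```

So super(NamedObj, self) skips straight past NamedObj, which is why its __post_init__ never runs.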
The way cooperative super() calls are intended to work is, well, by being "cooperative" - IOW, everyone in the mro is supposed to relay the call to the next class (actually, the super name is a rather sad choice, as it's not about calling "the super class", but about "calling the next class in the mro").
IOW, you want each of your "composable" dataclasses (which are not mixins - mixins only have behaviour) to relay the call, so you can compose them in any order. A first naive implementation would look like:
@dataclass
class NamedObj:
    name: str

    def __post_init__(self):
        super().__post_init__()
        print("NamedObj __post_init__")
        self.name = "Name: " + self.name

@dataclass
class NumberedObj:
    number: int = 0

    def __post_init__(self):
        super().__post_init__()
        print("NumberedObj __post_init__")
        self.number += 1

@dataclass
class NamedAndNumbered(NumberedObj, NamedObj):
    def __post_init__(self):
        super().__post_init__()
        print("NamedAndNumbered __post_init__")
BUT this doesn't work, since for the last class in the mro (here NamedObj), the next class in the mro is the builtin object class, which doesn't have a __post_init__ method. The solution is simple: just add a base class that defines this method as a noop, and make all your composable dataclasses inherit from it:
class Base(object):
    def __post_init__(self):
        # just intercept the __post_init__ calls so they
        # aren't relayed to `object`
        pass

@dataclass
class NamedObj(Base):
    name: str

    def __post_init__(self):
        super().__post_init__()
        print("NamedObj __post_init__")
        self.name = "Name: " + self.name

@dataclass
class NumberedObj(Base):
    number: int = 0

    def __post_init__(self):
        super().__post_init__()
        print("NumberedObj __post_init__")
        self.number += 1

@dataclass
class NamedAndNumbered(NumberedObj, NamedObj):
    def __post_init__(self):
        super().__post_init__()
        print("NamedAndNumbered __post_init__")
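As a quick check of the Base version, here is a self-contained sketch of the same hierarchy. The only change is an assumption made for illustration: the print() calls are replaced by a hypothetical calls list, so the order in which the chain unwinds can be inspected:

```python
from dataclasses import dataclass

calls = []  # hypothetical log, used here instead of print()


class Base:
    def __post_init__(self):
        # End of the chain: swallow the call so it never reaches `object`.
        pass


@dataclass
class NamedObj(Base):
    name: str

    def __post_init__(self):
        super().__post_init__()
        calls.append("NamedObj")
        self.name = "Name: " + self.name


@dataclass
class NumberedObj(Base):
    number: int = 0

    def __post_init__(self):
        super().__post_init__()
        calls.append("NumberedObj")
        self.number += 1


@dataclass
class NamedAndNumbered(NumberedObj, NamedObj):
    def __post_init__(self):
        super().__post_init__()
        calls.append("NamedAndNumbered")


obj = NamedAndNumbered("n_and_n")
print(calls)  # ['NamedObj', 'NumberedObj', 'NamedAndNumbered']
print(obj)    # NamedAndNumbered(name='Name: n_and_n', number=1)
```

Each class relays the call first and does its own work afterwards, so the class furthest along the mro finishes first.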
Answer from bruno desthuilliers on Stack Overflow
The problem (most probably) isn't related to dataclasses; it lies in Python's method resolution. Calling a method on super() invokes only the first implementation found along the MRO, and the remaining parents are skipped unless that implementation relays the call itself. So to make it work you need to call the methods of the parent classes manually:
@dataclass
class NamedAndNumbered(NumberedObj, NamedObj):
    def __post_init__(self):
        NamedObj.__post_init__(self)
        NumberedObj.__post_init__(self)
        print("NamedAndNumbered __post_init__")
Another approach (if you really like super()) is to continue the MRO chain by calling super() in all parent classes (this needs a class at the end of the chain that defines __post_init__):
@dataclass
class MixinObj:
    def __post_init__(self):
        pass

@dataclass
class NamedObj(MixinObj):
    name: str

    def __post_init__(self):
        super().__post_init__()
        print("NamedObj __post_init__")
        self.name = "Name: " + self.name

@dataclass
class NumberedObj(MixinObj):
    number: int = 0

    def __post_init__(self):
        super().__post_init__()
        print("NumberedObj __post_init__")
        self.number += 1

@dataclass
class NamedAndNumbered(NumberedObj, NamedObj):
    def __post_init__(self):
        super().__post_init__()
        print("NamedAndNumbered __post_init__")
In both approaches:
>>> nandn = NamedAndNumbered('n_and_n')
NamedObj __post_init__
NumberedObj __post_init__
NamedAndNumbered __post_init__
>>> print(nandn.name)
Name: n_and_n
>>> print(nandn.number)
1
I have a class Animal and Dog which inherits Animal. Why is it that I get an error if I try to give my Dog class a breed field?
TypeError: non-default argument 'breed' follows default argument
This is my code
from dataclasses import dataclass

@dataclass
class Animal:
    species: str
    arms: int
    legs: int

@dataclass
class Dog(Animal):
    breed: str
    species: str = "Dog"
    arms: int = 0
    legs: int = 4

if __name__ == '__main__':
    jake = Dog(breed="Bulldog")
    print(jake)
I did find that if I add a breed field to Animal I wouldn't get the error.
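A minimal sketch of what is going on here. Overriding species, arms and legs in Dog gives them defaults but leaves them in their inherited positions, so the new breed field (no default) lands after them. AnimalWithBreed and DogFixed are hypothetical names used only to show the observation that declaring breed in the base class avoids the error:

```python
from dataclasses import dataclass, fields


@dataclass
class Animal:
    species: str
    arms: int
    legs: int


error_message = None
try:
    # Merged field order: species="Dog", arms=0, legs=4, breed (no default)
    @dataclass
    class Dog(Animal):
        breed: str
        species: str = "Dog"
        arms: int = 0
        legs: int = 4
except TypeError as exc:
    error_message = str(exc)

print(error_message)


# Hypothetical variant: breed is declared first in the base class,
# so it stays ahead of every field that later gains a default.
@dataclass
class AnimalWithBreed:
    breed: str
    species: str
    arms: int
    legs: int


@dataclass
class DogFixed(AnimalWithBreed):
    species: str = "Dog"
    arms: int = 0
    legs: int = 4


field_order = [f.name for f in fields(DogFixed)]
print(field_order)  # ['breed', 'species', 'arms', 'legs']
jake = DogFixed(breed="Bulldog")
```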
Once I got around to finding out how to do this with Pydantic, I found it is exceedingly easy, at least as long as you are dealing with such a small difference between the two structures. You'll need to install pydantic to try the following working example:
from pydantic import BaseModel, Field

class Animal(BaseModel):
    genus: str = Field(alias="breed")
    color: str
    name: str

    class Config:
        # Pydantic v1 option name; Pydantic v2 renamed it to populate_by_name
        allow_population_by_field_name = True

spot = Animal(genus="retriever", color="brown", name="spot")
json_v1 = spot.json(by_alias=True)
json_v2 = spot.json()
print("v1 out:", json_v1)
print("v2 out:", json_v2)
animal_v1 = Animal.parse_raw(json_v1)
animal_v2 = Animal.parse_raw(json_v2)
print("v1 in:", animal_v1)
print("v2 in:", animal_v2)
Output:
v1 out: {"breed": "retriever", "color": "brown", "name": "spot"}
v2 out: {"genus": "retriever", "color": "brown", "name": "spot"}
v1 in: genus='retriever' color='brown' name='spot'
v2 in: genus='retriever' color='brown' name='spot'
If you aren't familiar with Pydantic, it takes the same declarative, type-annotated approach as dataclasses and adds a lot of really useful features, such as validation and serialization. Every time I dig into it, I find more interesting things. I even used it recently to help me generate an antiquated wire format that you can't get libraries for, at least not for free. I've wrestled with serialization for decades and it's a really good tool. Other really good frameworks like FastAPI are built upon it as well.
2 is by far the simplest. However, that just perpetuates the problem.
How is this used? If it's possible to get to a point where you can use this without knowing which version it is then that's work worth doing. Any work you do along the lines of 1 should be aimed at that goal. What you build shouldn't just emerge from looking at the data. Consider what's going to be done with this.
You said the responses aren't going to change. Which makes me think these are immutable. So we don't have to worry about saving updates. In that case I'd lean towards your own data class that can only be populated with what you need. Write methods to populate it from either version. Now you can call things what you want to call them.
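A minimal sketch of that idea, with loud assumptions: the two response versions are taken to differ only in the breed/genus key, as in the Pydantic example, and from_v1 / from_v2 are hypothetical method names. The real payloads may well differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # immutable, since the responses are not expected to change
class Animal:
    genus: str
    color: str
    name: str

    @classmethod
    def from_v1(cls, payload: dict) -> "Animal":
        # Assumed v1 layout: the value is stored under "breed".
        return cls(genus=payload["breed"], color=payload["color"], name=payload["name"])

    @classmethod
    def from_v2(cls, payload: dict) -> "Animal":
        # Assumed v2 layout: the value is stored under "genus".
        return cls(genus=payload["genus"], color=payload["color"], name=payload["name"])


spot_v1 = Animal.from_v1({"breed": "retriever", "color": "brown", "name": "spot"})
spot_v2 = Animal.from_v2({"genus": "retriever", "color": "brown", "name": "spot"})
print(spot_v1 == spot_v2)  # True
```

Code that consumes Animal no longer needs to know which version a response came from.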
The way dataclasses combines attributes prevents you from being able to use attributes with defaults in a base class and then use attributes without a default (positional attributes) in a subclass.
That's because the attributes are combined by starting from the bottom of the MRO, and building up an ordered list of the attributes in first-seen order; overrides are kept in their original location. So Parent starts out with ['name', 'age', 'ugly'], where ugly has a default, and then Child adds ['school'] to the end of that list (with ugly already in the list). This means you end up with ['name', 'age', 'ugly', 'school'] and because school doesn't have a default, this results in an invalid argument listing for __init__.
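The merge order can be observed with dataclasses.fields(). This is a minimal sketch; school is given a default here purely as an assumption so that the class can be created at all:

```python
from dataclasses import dataclass, fields


@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False


@dataclass
class Child(Parent):
    school: str = "unknown"  # assumption: a default, so the definition is valid
    ugly: bool = True        # override: new default, but Parent's position is kept


names = [f.name for f in fields(Child)]
print(names)  # ['name', 'age', 'ugly', 'school']

ugly_field = fields(Child)[2]
print(ugly_field.name, ugly_field.default)  # ugly True
```

Even though Child lists school before ugly, the overridden ugly stays where Parent put it and school is appended after it.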
This is documented in PEP-557 Dataclasses, under inheritance:
When the Data Class is being created by the @dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each Data Class that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes.
and under Specification:
TypeError will be raised if a field without a default value follows a field with a default value. This is true either when this occurs in a single class, or as a result of class inheritance.
You do have a few options here to avoid this issue.
The first option is to use separate base classes to force fields with defaults into a later position in the MRO. At all costs, avoid setting fields directly on classes that are to be used as base classes, such as Parent.
The following class hierarchy works:
# base classes with fields; fields without defaults separate from fields with.
@dataclass
class _ParentBase:
    name: str
    age: int

@dataclass
class _ParentDefaultsBase:
    ugly: bool = False

@dataclass
class _ChildBase(_ParentBase):
    school: str

@dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
    ugly: bool = True

# public classes, deriving from base-with, base-without field classes
# subclasses of public classes should put the public base class up front.
@dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} year old")

@dataclass
class Child(_ChildDefaultsBase, Parent, _ChildBase):
    pass
By pulling out fields into separate base classes with fields without defaults and fields with defaults, and a carefully selected inheritance order, you can produce an MRO that puts all fields without defaults before those with defaults. The reversed MRO (ignoring object) for Child is:
_ParentBase
_ChildBase
_ParentDefaultsBase
Parent
_ChildDefaultsBase
Note that while Parent doesn't set any new fields, it does inherit the fields from _ParentDefaultsBase and should not end up 'last' in the field listing order; the above order puts _ChildDefaultsBase last so its fields 'win'. The dataclass rules are also satisfied; the classes with fields without defaults (_ParentBase and _ChildBase) precede the classes with fields with defaults (_ParentDefaultsBase and _ChildDefaultsBase).
The result is Parent and Child classes with a sane field order, while Child is still a subclass of Parent:
>>> from inspect import signature
>>> signature(Parent)
<Signature (name: str, age: int, ugly: bool = False) -> None>
>>> signature(Child)
<Signature (name: str, age: int, school: str, ugly: bool = True) -> None>
>>> issubclass(Child, Parent)
True
and so you can create instances of both classes:
>>> jack = Parent('jack snr', 32, ugly=True)
>>> jack_son = Child('jack jnr', 12, school='havard', ugly=True)
>>> jack
Parent(name='jack snr', age=32, ugly=True)
>>> jack_son
Child(name='jack jnr', age=12, school='havard', ugly=True)
Another option is to only use fields with defaults; you can still make it an error to not supply a school value, by raising one in __post_init__:
_no_default = object()

@dataclass
class Child(Parent):
    school: str = _no_default
    ugly: bool = True

    def __post_init__(self):
        if self.school is _no_default:
            raise TypeError("__init__ missing 1 required argument: 'school'")
but this does alter the field order; school ends up after ugly:
<Signature (name: str, age: int, ugly: bool = True, school: str = <object object at 0x1101d1210>) -> None>
and a type hint checker will complain about _no_default not being a string.
You can also use the attrs project, which was the project that inspired dataclasses. It uses a different inheritance merging strategy; it pulls overridden fields in a subclass to the end of the fields list, so ['name', 'age', 'ugly'] in the Parent class becomes ['name', 'age', 'school', 'ugly'] in the Child class; by overriding the field with a default, attrs allows the override without needing to do an MRO dance.
attrs supports defining fields without type hints, but let's stick to the supported type hinting mode by setting auto_attribs=True:
import attr

@attr.s(auto_attribs=True)
class Parent:
    name: str
    age: int
    ugly: bool = False

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} year old")

@attr.s(auto_attribs=True)
class Child(Parent):
    school: str
    ugly: bool = True
Note that with Python 3.10, it is now possible to do it natively with dataclasses.
Python 3.10 added the kw_only parameter to dataclasses (similar to attrs).
It allows you to specify which fields are keyword-only; those fields are moved to the end of the generated __init__ signature, so they no longer cause an ordering problem under inheritance.
Taking directly from Eric Smith's blog post on the subject:
There are two reasons people [were asking for] this feature:
- When a dataclass has many fields, specifying them by position can become unreadable. It also requires that for backward compatibility, all new fields are added to the end of the dataclass. This isn't always desirable.
- When a dataclass inherits from another dataclass, and the base class has fields with default values, then all of the fields in the derived class must also have defaults.
What follows is the simplest way to do it with this new argument, but there are multiple ways you can use it to use inheritance with default values in the parent class:
from dataclasses import dataclass

@dataclass(kw_only=True)
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass(kw_only=True)
class Child(Parent):
    school: str

ch = Child(name="Kevin", age=17, school="42")
print(ch.ugly)
Take a look at the blogpost linked above for a more thorough explanation of kw_only.
Cheers !
PS: As it is fairly new, note that your IDE might still raise a possible error, but it works at runtime
I would argue that #1 is the most correct method. For the example you showed, it appears to be irrelevant which method you use, but if you add a second variable, the differences become apparent. This is implicitly confirmed by the Inheritance section in the documentation.
@dataclass
class ParentClass:
    a: str
    b: str = "parent-b"

# This works smoothly
@dataclass
class ChildClass1(ParentClass):
    a: str = "child-a"

# This works, but is a maintenance nightmare
@dataclass
class ChildClass2(ParentClass):
    def __init__(self, a="child-a", b="parent-b"):
        super().__init__(a, b)

# This works, but it changes the signature and only works if a is first
@dataclass
class ChildClass3(ParentClass):
    def __init__(self, a="child-a", **kwargs):
        super().__init__(a, **kwargs)
Right now, the dataclass decorator is adding default methods, including __init__ to your class. That means that if you wanted to use option #2 or #3, you would have to know and copy the function signature for all the parameters. At the same time, option #1 allows you to change the default for just a.
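A minimal sketch of option #1 in use, reusing the class names from the example: only the default for a changes, and b keeps the parent's default without being restated.

```python
from dataclasses import dataclass
import inspect


@dataclass
class ParentClass:
    a: str
    b: str = "parent-b"


@dataclass
class ChildClass1(ParentClass):
    a: str = "child-a"  # only the default changes; the field keeps its position


param_names = list(inspect.signature(ChildClass1).parameters)
print(param_names)  # ['a', 'b']

default_child = ChildClass1()
print(default_child)  # ChildClass1(a='child-a', b='parent-b')

custom_child = ChildClass1(b="other")
print(custom_child)  # ChildClass1(a='child-a', b='other')
```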
The other way to do what you're doing is to create a __post_init__ method for your child classes, which can then override the parent default value:
@dataclass
class ParentClass:
    a: str = ''  # Or pick some other universally acceptable marker

@dataclass
class ChildClass(ParentClass):
    def __post_init__(self):
        if self.a == '':
            self.a = "child-a"
This is also needlessly complex for most scenarios, but may be useful for a more complex situation. Normally __post_init__ is meant to be used to initialize derived fields, as in the example in the linked documentation.
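For comparison, a minimal sketch of the derived-field use of __post_init__. Rectangle and its fields are hypothetical names chosen for illustration; field(init=False) keeps area out of the generated __init__:

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:  # hypothetical example class
    width: float
    height: float
    area: float = field(init=False)  # not an __init__ parameter

    def __post_init__(self):
        # Derived from the other fields once __init__ has assigned them.
        self.area = self.width * self.height


r = Rectangle(2.0, 3.0)
print(r)  # Rectangle(width=2.0, height=3.0, area=6.0)
```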
OK, so first thing, I came by here looking for just field-level inheritance on dataclasses (mostly to allow for isinstance() testing; yes, yes, with all the caveats about that approach).
I then found a blog post Python dataclass inheritance, finally ! | by Anis Campos at Medium (wonders of wonders, an open Medium url too...) that relies on kw_only on the dataclass decorator, available from Python 3.10 on.
I got things to work for myself so here's a stab at a simple minimal proof of concept for what I think your use case looks like.
I am not sure it does exactly what you had in mind, and this approach does constrain you to calling the constructors with keywords only, but that's fine for me since I typically do things like inst = cls(**data) anyway.
(Note: you could also use Pydantic instead, which does support this straight out of the box)
@dataclass(kw_only=True)
class ParentClass:
    a_variable: str

    def a_function(self) -> str:
        return f" {self.a_variable=} on {type(self).__name__}"

# ONE
@dataclass(kw_only=True)
class DataclassChild1(ParentClass):
    a_variable: str = "default DataclassChild1"

@dataclass(kw_only=True)
class DataclassChild2(ParentClass):
    a_variable: str = "default DataclassChild2"
    child2_var: str = "?"

    def a_function(self) -> str:
        return f" Dataclass2!{self.a_variable=} on {type(self).__name__}"

for inst in [
    ParentClass(a_variable="parent"),
    DataclassChild1(a_variable="child1!"),
    DataclassChild2(a_variable="child2!"),
    DataclassChild1(),
    DataclassChild2(child2_var="x"),
]:
    print(inst)
    print(inst.a_function())
and the output
test.<locals>.ParentClass(a_variable='parent')
self.a_variable='parent' on ParentClass
test.<locals>.DataclassChild1(a_variable='child1!')
self.a_variable='child1!' on DataclassChild1
test.<locals>.DataclassChild2(a_variable='child2!', child2_var='?')
Dataclass2!self.a_variable='child2!' on DataclassChild2
test.<locals>.DataclassChild1(a_variable='default DataclassChild1')
self.a_variable='default DataclassChild1' on DataclassChild1
test.<locals>.DataclassChild2(a_variable='default DataclassChild2', child2_var='x')
Dataclass2!self.a_variable='default DataclassChild2' on DataclassChild2