For classes in general, you can access the __annotations__:
>>> class Foo:
...     bar: int
...     baz: str
...
>>> Foo.__annotations__
{'bar': <class 'int'>, 'baz': <class 'str'>}
This returns a dict mapping attribute name to annotation.
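One caveat worth knowing: __annotations__ only contains the annotations defined on that class itself, while typing.get_type_hints walks the MRO and merges annotations from base classes. A minimal sketch (Base and Child are illustrative names):

```python
import typing

class Base:
    x: int

class Child(Base):
    y: str

# __annotations__ holds only the class's own annotations...
print(Child.__annotations__)         # {'y': <class 'str'>}
# ...while get_type_hints merges annotations across the MRO:
print(typing.get_type_hints(Child))  # {'x': <class 'int'>, 'y': <class 'str'>}
```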
However, dataclasses use dataclasses.Field objects to encapsulate a lot of this information. You can use dataclasses.fields on an instance or on the class:
>>> import dataclasses
>>> @dataclasses.dataclass
... class Foo:
...     bar: int
...     baz: str
...
>>> dataclasses.fields(Foo)
(Field(name='bar',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), Field(name='baz',type=<class 'str'>,default=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD))
NOTE:
Starting in Python 3.7, the evaluation of annotations can be postponed:
>>> from __future__ import annotations
>>> class Foo:
...     bar: int
...     baz: str
...
>>> Foo.__annotations__
{'bar': 'int', 'baz': 'str'}
Note that the annotation is kept as a string; this affects dataclasses as well:
>>> @dataclasses.dataclass
... class Foo:
...     bar: int
...     baz: str
...
>>> dataclasses.fields(Foo)
(Field(name='bar',type='int',default=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), Field(name='baz',type='str',default=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,default_factory=<dataclasses._MISSING_TYPE object at 0x7f806369bc10>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD))
So just be aware: code you write should probably use the __future__ import and work under that assumption, since this was slated to become the standard behavior (originally targeted for Python 3.10, though the change has since been deferred).
The motivation behind this behavior is that the following currently raises an error:
>>> class Node:
...     def foo(self) -> Node:
...         return Node()
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in Node
NameError: name 'Node' is not defined
But with the new behavior:
>>> from __future__ import annotations
>>> class Node:
...     def foo(self) -> Node:
...         return Node()
...
>>>
One way to handle this is to use typing.get_type_hints, which essentially eval()'s the string annotations in the appropriate namespace:
>>> import typing
>>> typing.get_type_hints(Node.foo)
{'return': <class '__main__.Node'>}
>>> class Foo:
...     bar: int
...     baz: str
...
>>> Foo.__annotations__
{'bar': 'int', 'baz': 'str'}
>>> import typing
>>> typing.get_type_hints(Foo)
{'bar': <class 'int'>, 'baz': <class 'str'>}
I'm not sure how reliable this function is in every case, but basically, it handles getting the appropriate globals and locals of the module where the class was defined. So, consider:
(py38) juanarrivillaga@Juan-Arrivillaga-MacBook-Pro ~ % cat test.py
from __future__ import annotations
import typing
class Node:
    next: Node
(py38) juanarrivillaga@Juan-Arrivillaga-MacBook-Pro ~ % python
Python 3.8.5 (default, Sep 4 2020, 02:22:02)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> test.Node
<class 'test.Node'>
>>> import typing
>>> typing.get_type_hints(test.Node)
{'next': <class 'test.Node'>}
Naively, you might try something like:
>>> test.Node.__annotations__
{'next': 'Node'}
>>> eval(test.Node.__annotations__['next'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
NameError: name 'Node' is not defined
You could hack together something like:
>>> eval(test.Node.__annotations__['next'], vars(test))
<class 'test.Node'>
But it can get tricky.
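A slightly more robust version of that eval hack resolves the string against the defining module's globals via sys.modules, which is roughly what get_type_hints does internally. A minimal sketch (resolve_annotation is a hypothetical helper, not part of any library):

```python
import sys

class Node:
    nxt: 'Node'  # a string annotation, as under `from __future__ import annotations`

def resolve_annotation(cls, name):
    # eval the string annotation in the globals of the module
    # where the class was defined
    module_globals = vars(sys.modules[cls.__module__])
    ann = cls.__annotations__[name]
    return eval(ann, module_globals) if isinstance(ann, str) else ann

print(resolve_annotation(Node, 'nxt'))  # the Node class itself
```

This still misses locals for classes defined inside functions, which is one reason get_type_hints is usually the better choice.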
Answer from juanpa.arrivillaga on Stack Overflow.
Check this out:
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
Point.__annotations__ returns {'x': <class 'int'>, 'y': <class 'int'>}.
If I've understood your question correctly, you can do something like this:
import json
import dataclasses

@dataclasses.dataclass
class mySubClass:
    sub_item1: str
    sub_item2: str

@dataclasses.dataclass
class myClass:
    item1: str
    item2: mySubClass

    # We need a __post_init__ method here because otherwise
    # item2 will contain a python dictionary, rather than
    # an instance of mySubClass.
    def __post_init__(self):
        self.item2 = mySubClass(**self.item2)

sampleData = '''
{
    "item1": "This is a test",
    "item2": {
        "sub_item1": "foo",
        "sub_item2": "bar"
    }
}
'''

myvar = myClass(**json.loads(sampleData))
myvar.item2.sub_item1 = 'modified'
print(json.dumps(dataclasses.asdict(myvar)))
Running this produces:
{"item1": "This is a test", "item2": {"sub_item1": "modified", "sub_item2": "bar"}}
As a side note, this all becomes easier if you use a more fully featured package like pydantic:
import json
from pydantic import BaseModel

class mySubClass(BaseModel):
    sub_item1: str
    sub_item2: str

class myClass(BaseModel):
    item1: str
    item2: mySubClass

sampleData = '''
{
    "item1": "This is a test",
    "item2": {
        "sub_item1": "foo",
        "sub_item2": "bar"
    }
}
'''

myvar = myClass(**json.loads(sampleData))
myvar.item2.sub_item1 = 'modified'
print(myvar.json())
Without using any libraries other than the builtins:
import dataclasses
import json

@dataclasses.dataclass
class mySubClass:
    sub_item1: str
    sub_item2: str

@dataclasses.dataclass
class myClass:
    item1: str
    item2: mySubClass

    @classmethod
    def from_json(cls, string: str):
        data: dict = json.loads(string)
        if isinstance(data['item2'], dict):
            data['item2'] = mySubClass(**data['item2'])
        return cls(**data)

    def json(self):
        return json.dumps(self, default=lambda o: o.__dict__)

sampleData = '''
{
    "item1": "This is a test",
    "item2": {
        "sub_item1": "foo",
        "sub_item2": "bar"
    }
}
'''

myvar = myClass.from_json(sampleData)
myvar.item2.sub_item1 = 'modified'
print(myvar.json())
Which becomes a bit easier, using a ser/de library like dataclass-wizard, or dataclasses-json:
import dataclasses
from dataclass_wizard import JSONWizard

@dataclasses.dataclass
class myClass(JSONWizard):
    item1: str
    item2: 'mySubClass'

    # optional convenience accessor (a property getter takes no
    # extra arguments, so indentation options are left out here)
    @property
    def json(self):
        return self.to_json()

@dataclasses.dataclass
class mySubClass:
    sub_item1: str
    sub_item2: str

sampleData = '''
{
    "item1": "This is a test",
    "item2": {
        "sub_item1": "foo",
        "sub_item2": "bar"
    }
}
'''

c = myClass.from_json(sampleData)
print(c.json)
Disclaimer: I am the creator and maintainer of this library.
I simply added a ‘to_dict’ class method which calls ‘dataclasses.asdict(self)’ to handle this. Regardless of workarounds, shouldn’t dataclasses in python be JSON serializable out of the box given their purpose as a data object?
Am I misunderstanding something here? What would be other ways of doing this?
Much like you can add support to the JSON encoder for datetime objects or Decimals, you can also provide a custom encoder subclass to serialize dataclasses:
import dataclasses, json

class EnhancedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return super().default(o)

json.dumps(foo, cls=EnhancedJSONEncoder)
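For instance, here is the encoder above exercised with a small nested model (Point and Segment are illustrative names); since asdict recurses, nested dataclasses come along for free:

```python
import dataclasses, json

class EnhancedJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return super().default(o)

@dataclasses.dataclass
class Point:
    x: int
    y: int

@dataclasses.dataclass
class Segment:
    start: Point
    end: Point

# The top-level dataclass triggers default(), and asdict then
# recursively converts the nested Point instances as well.
print(json.dumps(Segment(Point(0, 0), Point(3, 4)), cls=EnhancedJSONEncoder))
# {"start": {"x": 0, "y": 0}, "end": {"x": 3, "y": 4}}
```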
Can't you just use the dataclasses.asdict() function to convert the dataclass
to a dict? Something like:
>>> @dataclass
... class Foo:
...     a: int
...     b: int
...
>>> x = Foo(1,2)
>>> json.dumps(dataclasses.asdict(x))
'{"a": 1, "b": 2}'
The dataclasses module doesn't provide built-in support for this use case, i.e. loading YAML data to a nested class model.
In such a scenario, I would turn to a ser/de library such as dataclass-wizard, which provides OOTB support for (de)serializing YAML data, via the PyYAML library.
Disclaimer: I am the creator and maintainer of this library.
Step 1: Generate a Dataclass Model
Note: I will likely need to make this step easier for generating a dataclass model from YAML data; it's perhaps worth creating an issue to look into as time allows. Ideally this would be usable from the CLI, but since we have YAML data it is tricky, because the utility tool expects JSON.
So easiest to do this in Python itself, for now:
from json import dumps

# pip install PyYAML dataclass-wizard
from yaml import safe_load
from dataclass_wizard.wizard_cli import PyCodeGenerator

yaml_string = """
account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
  - name: cluster_1
    endpoint: https://cluster_2
    certificate: abcdef
"""

py_code = PyCodeGenerator(experimental=True, file_contents=dumps(safe_load(yaml_string))).py_code
print(py_code)
Prints:
from __future__ import annotations

from dataclasses import dataclass

from dataclass_wizard import JSONWizard

@dataclass
class Data(JSONWizard):
    """
    Data dataclass
    """
    account: int
    clusters: list[Cluster]

@dataclass
class Cluster:
    """
    Cluster dataclass
    """
    name: str
    endpoint: str
    certificate: str
Step 2: Use Generated Dataclass Model, alongside YAMLWizard
Contents of my_file.yml:
account: 12345
clusters:
  - name: cluster_1
    endpoint: https://cluster_5
    certificate: abcdef
  - name: cluster_2
    endpoint: https://cluster_7
    certificate: xyz
Python code:
from __future__ import annotations

from dataclasses import dataclass
from pprint import pprint

from dataclass_wizard import YAMLWizard

@dataclass
class Data(YAMLWizard):
    account: int
    clusters: list[Cluster]

@dataclass
class Cluster:
    name: str
    endpoint: str
    certificate: str

data = Data.from_yaml_file('./my_file.yml')

pprint(data)

for c in data.clusters:
    print(c.endpoint)
Result:
Data(account=12345,
     clusters=[Cluster(name='cluster_1',
                       endpoint='https://cluster_5',
                       certificate='abcdef'),
               Cluster(name='cluster_2',
                       endpoint='https://cluster_7',
                       certificate='xyz')])
https://cluster_5
https://cluster_7
As Barmar points out in a comment, even though you have correctly typed the _clusters key in your AWSInfo dataclass...
@dataclass
class AWSInfo:
    _account: int
    _clusters: list[ClusterInfo]
...the dataclasses module isn't smart enough to automatically convert the members of the clusters list in your input data into the appropriate data type. If you use a more comprehensive data-model library like Pydantic, things will work like you expect:
import yaml
from pydantic import BaseModel

class ClusterInfo(BaseModel):
    name: str
    endpoint: str
    certificate: str

class AWSInfo(BaseModel):
    account: int
    clusters: list[ClusterInfo]

with open('clusters.yml', 'r') as fd:
    clusters = yaml.safe_load(fd)

a = AWSInfo(**clusters)

print(a.account)      # prints 12345
print(a.clusters)     # prints the list of both clusters
print(a.clusters[0])  # prints the first cluster

# These now work, because Pydantic has converted each cluster
# dict into a ClusterInfo instance:
print(a.clusters[0].endpoint)
for c in a.clusters:
    print(c.endpoint)
Running the above code (with your sample input) produces:
12345
[ClusterInfo(name='cluster_1', endpoint='https://cluster_2', certificate='abcdef'), ClusterInfo(name='cluster_1', endpoint='https://cluster_2', certificate='abcdef')]
name='cluster_1' endpoint='https://cluster_2' certificate='abcdef'
https://cluster_2
https://cluster_2
https://cluster_2
This example shows only a name, type, and value; however, __dataclass_fields__ is a dict of Field objects, each containing information such as name, type, default value, etc.
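As a minimal sketch of what iterating __dataclass_fields__ looks like (Test is an illustrative name):

```python
import dataclasses

@dataclasses.dataclass
class Test:
    a: str = "a value"
    b: int = 0

# __dataclass_fields__ maps field name -> Field object, which carries
# the name, type, default, default_factory, metadata, and more.
for name, field in Test.__dataclass_fields__.items():
    print(name, field.default)
```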
Using dataclasses.fields()
Using dataclasses.fields() you can access fields you defined in your dataclass.
fields = dataclasses.fields(dataclass_instance)
Using inspect.getmembers()
Using inspect.getmembers() you can access all fields in your dataclass.
members = inspect.getmembers(type(dataclass_instance))
fields = list(dict(members)['__dataclass_fields__'].values())
Complete code solution
import dataclasses
import inspect

@dataclasses.dataclass
class Test:
    a: str = "a value"
    b: str = "b value"

def print_data_class(dataclass_instance):
    # option 1: fields
    fields = dataclasses.fields(dataclass_instance)
    # option 2: inspect
    members = inspect.getmembers(type(dataclass_instance))
    fields = list(dict(members)['__dataclass_fields__'].values())
    for v in fields:
        print(f'{v.name}: ({v.type.__name__}) = {getattr(dataclass_instance, v.name)}')

print_data_class(Test())
# a: (str) = a value
# b: (str) = b value
print_data_class(Test(a="1", b="2"))
# a: (str) = 1
# b: (str) = 2
Also, you can use __annotations__, because dataclass fields are always annotated; that is the essence of how dataclasses work.
It works with classes
fields = list(Test.__annotations__)
and with instances
fields = list(test.__annotations__)
Note that this doesn't pick up fields inherited from dataclass base classes, since __annotations__ only covers the annotations defined on the class itself. However, its simplicity gives you field names directly, without extra code to extract them from Field objects.
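To illustrate the subclass caveat (Base and Child are illustrative names):

```python
import dataclasses

@dataclasses.dataclass
class Base:
    a: int

@dataclasses.dataclass
class Child(Base):
    b: str

# __annotations__ sees only the class's own fields...
print(list(Child.__annotations__))                  # ['b']
# ...while dataclasses.fields() includes inherited ones:
print([f.name for f in dataclasses.fields(Child)])  # ['a', 'b']
```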