python dataclass ignore extra fields

How does one ignore extra arguments passed to a dataclass?


Cleaning the argument list before passing it to the constructor is probably the best way to go about it. I'd advise against writing your own __init__ function though, since the dataclass's generated __init__ does a couple of other convenient things (such as applying defaults and calling __post_init__) that you'll lose by overriding it.

Also, since the argument-cleaning logic is very tightly bound to the behavior of the class and returns an instance, it might make sense to put it into a classmethod:

from dataclasses import dataclass
import inspect

@dataclass
class Config:
    var_1: str
    var_2: str

    @classmethod
    def from_dict(cls, env):      
        return cls(**{
            k: v for k, v in env.items() 
            if k in inspect.signature(cls).parameters
        })


# usage:
params = {'var_1': 'a', 'var_2': 'b', 'var_3': 'c'}
c = Config.from_dict(params)   # works without raising a TypeError 
print(c)
# prints: Config(var_1='a', var_2='b')

Answer from Arne on Stack Overflow
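To illustrate why filtering in a classmethod keeps the conveniences of the generated __init__, here is a minimal sketch. The default on var_2 and the __post_init__ body are assumptions added for the illustration; they are not part of the original answer.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Config:
    var_1: str
    var_2: str = "fallback"  # hypothetical default, added for illustration

    def __post_init__(self):
        # Hypothetical normalization step; it still runs because the
        # generated __init__ is what actually builds the instance.
        self.var_1 = self.var_1.strip()

    @classmethod
    def from_dict(cls, env):
        # Look up the accepted parameter names once, then filter.
        allowed = inspect.signature(cls).parameters
        return cls(**{k: v for k, v in env.items() if k in allowed})


c = Config.from_dict({"var_1": " a ", "var_3": "c"})
print(c)  # prints: Config(var_1='a', var_2='fallback')

# A missing required field is still reported immediately.
missing_error = None
try:
    Config.from_dict({"var_3": "c"})
except TypeError as exc:
    missing_error = exc
    print("still rejected:", exc)
```

The extra key is dropped, the default is applied, __post_init__ runs, and a missing required field still raises a TypeError at construction time.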

2 of 7

I would just provide an explicit __init__ instead of using the autogenerated one. The body of the loop only sets recognized values, ignoring unexpected ones.

Note that this won't complain about missing values without defaults until later, though.

import dataclasses

@dataclasses.dataclass(init=False)
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str

    def __init__(self, **kwargs):
        names = set([f.name for f in dataclasses.fields(self)])
        for k, v in kwargs.items():
            if k in names:
                setattr(self, k, v)
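As a rough sketch of what "until later" means in practice: with this explicit __init__, leaving out a field does not fail at construction, only when the attribute is read. The keyword names passed below are made up for the demonstration.

```python
import dataclasses


@dataclasses.dataclass(init=False)
class Config:
    VAR_NAME_1: str
    VAR_NAME_2: str

    def __init__(self, **kwargs):
        names = {f.name for f in dataclasses.fields(self)}
        for k, v in kwargs.items():
            if k in names:
                setattr(self, k, v)


# VAR_NAME_2 is missing and EXTRA is unknown, yet construction succeeds.
c = Config(VAR_NAME_1="a", EXTRA="ignored")
print(c.VAR_NAME_1)              # prints: a
print(hasattr(c, "EXTRA"))       # prints: False
print(hasattr(c, "VAR_NAME_2"))  # prints: False

# The problem only surfaces when the missing attribute is accessed.
late_error = None
try:
    c.VAR_NAME_2
except AttributeError as exc:
    late_error = exc
```

Note that print(c) would fail here for the same reason, because the generated __repr__ reads every field.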

Alternatively, you can pass a filtered environment to the default Config.__init__.

import os

field_names = set(f.name for f in dataclasses.fields(Config))
c = Config(**{k: v for k, v in os.environ.items() if k in field_names})
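One caveat with filtering on dataclasses.fields: it also returns fields declared with field(init=False), which the generated __init__ does not accept. A minimal sketch of a filter that skips them follows. The helper name init_field_names, the derived field, and the plain dict standing in for os.environ are all assumptions made for the example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Config:
    var_1: str
    var_2: str
    # Hypothetical field that __init__ does not accept.
    derived: str = field(init=False, default="unset")


def init_field_names(cls):
    """Return the names of fields that the generated __init__ accepts."""
    return {f.name for f in fields(cls) if f.init}


# A plain dict stands in for os.environ so the example is reproducible.
raw = {"var_1": "a", "var_2": "b", "derived": "x", "var_3": "c"}
allowed = init_field_names(Config)
c = Config(**{k: v for k, v in raw.items() if k in allowed})
print(c)  # prints: Config(var_1='a', var_2='b', derived='unset')
```

Without the f.init check, the "derived" key would be passed through and Config would raise a TypeError.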


r/learnpython on Reddit: is it possible to ignore some fields when creating a Python dataclass from a .csv?

September 4, 2024

Example code below. This works, but only if the .csv has only name, age, and city fields. If the .csv has more fields than the dataclass has defined, it throws an error like: TypeError: Person.__init__() got an unexpected keyword argument 'state'

Is there a way to have it ignore extra fields? I'm trying to avoid having to remove the fields first from the .csv, or iterate row by row, value by value...but obvs will do that if there's no 'smart' way to ignore. Like, wondering if we can pass desired fields to csv.DictReader? I see it has a fieldnames parameter, but the docs seem to suggest that is for supplying the column names when the file has no header row (meaning I'd have to pass a value for each column, so I'm back where I started).

Thanks!

import csv
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    city: str

with open('people.csv', 'r') as f:
    reader = csv.DictReader(f)
    people = [Person(**row) for row in reader]

print(people)

Top answer

1 of 6

Is there a way to have it ignore extra fields?

Sure, but you have to actually do it. Out of curiosity, did you find this bit of code, or did you create it yourself? The reason I ask is that **row is a slightly more advanced Python expression, and it would make sense that you're having this problem if you grabbed that part without understanding it.

So first, let's break down what it's doing. The list comprehension is functionally similar to building a list in a loop. So what happens if we take this code and just look at row?

    people = [row for row in reader]

You will probably see something like this if you run this code with just the three columns:

    [{'name': 'John Doe', 'age': '25', 'city': 'Houston'}, {'name': 'Beth Doe', 'age': '22', 'city': 'San Francisco'}]

This is the same data the DictReader gives us. So what does the asterisk do? A single asterisk unpacks an iterable into positional arguments, and a double asterisk unpacks a dictionary into keyword arguments. For example, check this code based on the reader:

    for person in reader:
        print(*person)

    # Output
    # name age city
    # name age city

What's happening here is that each key is being printed out, because iterating over a dictionary yields its keys. Trying print(**person) won't work, because it passes name, age, and city as keyword arguments, and print doesn't accept those. In general, the double asterisk turns each key/value pair into a keyword argument, so unpacking {"var": None} into a call is the same as writing var=None in the argument list.

Now that we know that, let's look back at your original code:

    people = [Person(**row) for row in reader]

That Person(**row) is the cause of your error: DictReader is going to read every heading, so if you have 4 headings, with state included, it's equivalent to doing something like this:

    Person(name="name", age="age", city="city", state="state")

The problem, of course, is that the Person dataclass doesn't have a state field, so its generated __init__ raises the TypeError you saw.

How can you fix this, then? Assuming you only want those three elements to represent a person, you'll need to skip the list comprehension method and do your loop manually, ignoring the fields you don't need. For example, something like this:

    people = []
    for person in reader:
        new_person = Person(
            name=person["name"],
            age=int(person["age"]),
            city=person["city"],
        )
        people.append(new_person)

There are other ways to do this, of course, but this is the simplest. Essentially, you loop through each row, create a new Person object with just the data from that row you want, and then append that to a list. This will give you the same core data as your previous list comprehension but will ignore any column you don't want.
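One of the "other ways" could look like the sketch below, which keeps the list comprehension but filters each row down to the dataclass's own field names first. The helper row_to_person and the in-memory sample CSV (including its state column values) are assumptions made so the example runs without a people.csv file.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int
    city: str


# Hypothetical sample data with an extra "state" column.
SAMPLE_CSV = (
    "name,age,city,state\n"
    "John Doe,25,Houston,TX\n"
    "Beth Doe,22,San Francisco,CA\n"
)

# Names the dataclass knows about; every other column is dropped.
FIELD_NAMES = {f.name for f in fields(Person)}


def row_to_person(row):
    """Build a Person from a DictReader row, ignoring unknown columns."""
    kept = {k: v for k, v in row.items() if k in FIELD_NAMES}
    kept["age"] = int(kept["age"])  # csv yields strings, so convert
    return Person(**kept)


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
people = [row_to_person(row) for row in reader]
print(people)
```

With a real file, io.StringIO(SAMPLE_CSV) would be replaced by the open file object from the original code.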

2 of 6

If your CSV has variable columns you either need different dataclasses for each kind or you shouldn't use them - rows with different columns are, notionally, different types of things when you're mapping rows to classes.
