Use defaultdict
>>> from collections import defaultdict
>>> d = defaultdict(list) # create the dictionary, then populate it.
>>> d.update({"TIM":[['xx', 'yy'], ['aa', 'bb']], "SAM":[['yy', 'cc']]})
>>> d # see, it's what you wanted.
defaultdict(<type 'list'>, {'TIM': [['xx', 'yy'], ['aa', 'bb']], 'SAM': [['yy', 'cc']]})
>>> d["SAM"].append(['tt','uu']) # add more items to SAM
>>> d["KIM"].append(['ii','pp']) # create and add to KIM
>>> d # see, it's what you wanted.
defaultdict(<type 'list'>, {'TIM': [['xx', 'yy'], ['aa', 'bb']], 'KIM': [['ii', 'pp']], 'SAM': [['yy', 'cc'], ['tt', 'uu']]})
If you want the dictionary values to be sets, that is no problem:
>>> from collections import defaultdict
>>> d = defaultdict(set)
>>> d.update({"TIM":set([('xx', 'yy'), ('aa', 'bb')]), "SAM":set([('yy', 'cc')])})
>>> d["SAM"].add(('tt','uu'))
>>> d["KIM"].add(('ii','pp'))
>>> d
defaultdict(<type 'set'>, {'TIM': set([('xx', 'yy'), ('aa', 'bb')]), 'KIM': set([('ii', 'pp')]), 'SAM': set([('tt', 'uu'), ('yy', 'cc')])})
Answer from Inbar Rose on Stack Overflow
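Set members must be hashable, which is why the set version above stores tuples rather than lists. A side effect is that repeated entries collapse into one. A minimal sketch, using hypothetical (name, pair) records in which ('yy', 'cc') appears twice for SAM:

```python
from collections import defaultdict

# Hypothetical (name, pair) records; ('yy', 'cc') appears twice for SAM.
records = [
    ('SAM', ('yy', 'cc')),
    ('SAM', ('tt', 'uu')),
    ('SAM', ('yy', 'cc')),
    ('KIM', ('ii', 'pp')),
]

d = defaultdict(set)
for name, pair in records:
    d[name].add(pair)  # a missing name starts out as an empty set

print(sorted(d['SAM']))  # [('tt', 'uu'), ('yy', 'cc')]
print(d['KIM'])          # {('ii', 'pp')}
```

If you need to keep duplicates or preserve insertion order, stick with defaultdict(list).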
You can use the setdefault method:
>>> d = {'TIM':[['xx', 'yy'], ['aa', 'bb']], 'SAM':[['yy', 'cc']]}
>>> d.setdefault('SAM', []).append(['tt','uu'])
>>> d.setdefault('KIM', []).append(['ii','pp'])
>>> d
{'TIM': [['xx', 'yy'], ['aa', 'bb']], 'KIM': [['ii', 'pp']], 'SAM': [['yy', 'cc'], ['tt', 'uu']]}
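The chained call works because setdefault always returns the value stored under the key: the existing one if the key is present, otherwise the default it has just inserted. A small sketch with illustrative values:

```python
d = {'SAM': [['yy', 'cc']]}

# Key present: setdefault returns the existing list (the [] argument is ignored).
sam = d.setdefault('SAM', [])
sam.append(['tt', 'uu'])  # mutates the list stored under 'SAM'

# Key missing: setdefault inserts the default and returns it.
kim = d.setdefault('KIM', [])
kim.append(['ii', 'pp'])

print(d)  # {'SAM': [['yy', 'cc'], ['tt', 'uu']], 'KIM': [['ii', 'pp']]}
```

One design consequence: the default argument (here a new empty list) is built on every call, even when the key already exists. defaultdict only calls its factory for keys that are actually missing.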
I just learned about the update method while doing some work. For a long time I had been writing if statements to check whether a key is in .keys(). Is there an equivalent for "add key: value if not already in dictionary, otherwise update the value"?
For example, to replace:
if ind in collection.keys():
    collection[ind].append(x)
else:
    collection[ind] = [x]
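The explicit check from the question and the two replacements suggested on this page (setdefault and defaultdict) build the same result. A sketch comparing them, assuming a hypothetical list of (ind, x) pairs to group:

```python
from collections import defaultdict

# Hypothetical input: (ind, x) pairs to group by ind.
pairs = [('a', 1), ('b', 4), ('a', 2)]

# 1. Explicit check, as in the question (`in` tests keys directly, no .keys() needed).
explicit = {}
for ind, x in pairs:
    if ind in explicit:
        explicit[ind].append(x)
    else:
        explicit[ind] = [x]

# 2. dict.setdefault: insert [] if missing, then append.
via_setdefault = {}
for ind, x in pairs:
    via_setdefault.setdefault(ind, []).append(x)

# 3. collections.defaultdict: missing keys start as an empty list.
via_defaultdict = defaultdict(list)
for ind, x in pairs:
    via_defaultdict[ind].append(x)

print(explicit)                                            # {'a': [1, 2], 'b': [4]}
print(explicit == via_setdefault == dict(via_defaultdict))  # True
```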
Use defaultdict from the collections module.
>>> from collections import defaultdict
>>> dict1 = {1:'a',2:'b',3:'c'}
>>> dict2 = {1:'hello', 4:'four', 5:'five'}
>>> my_dict = defaultdict(list)
>>> for k in dict1:
...     my_dict[k].append(dict1[k])
...
>>> for k in dict2:
...     my_dict[k].append(dict2[k])
...
>>> my_dict[1]
['a', 'hello']
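The same idea extends to any number of dictionaries. A sketch, where merge_to_lists is a hypothetical helper name, not a library function:

```python
from collections import defaultdict

def merge_to_lists(*dicts):
    """Collect every value for each key, in the order the dicts are given."""
    merged = defaultdict(list)
    for source in dicts:
        for key, value in source.items():
            merged[key].append(value)
    # Return a plain dict so later lookups of missing keys raise KeyError again.
    return dict(merged)

dict1 = {1: 'a', 2: 'b', 3: 'c'}
dict2 = {1: 'hello', 4: 'four', 5: 'five'}
print(merge_to_lists(dict1, dict2))
# {1: ['a', 'hello'], 2: ['b'], 3: ['c'], 4: ['four'], 5: ['five']}
```

Converting back to a plain dict is a design choice: it stops a typo in a later lookup from silently inserting a new empty list.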
Another method without importing anything, just with the regular Python dictionary:
>>> dict1 = {1:'a',2:'b',3:'c'}
>>> dict2 = {1:'hello', 4:'four', 5:'five'}
>>> for k in dict2:
...     dict1[k] = dict1.get(k,"") + dict2.get(k)
...
>>> dict1
{1: 'ahello', 2: 'b', 3: 'c', 4: 'four', 5: 'five'}
>>>
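dict1.get(k, "") returns the value associated with k if it exists, or an empty string otherwise, and the value from dict2 is then concatenated onto it. Note that this modifies dict1 in place and joins the strings ('ahello') rather than collecting the values in a list.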
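The get-with-default pattern is not limited to strings: it works with any default that supports +. A sketch summing hypothetical per-key counts, using 0 as the default and copying first so the original dict is left untouched:

```python
# Hypothetical per-key counts from two sources.
counts1 = {'a': 2, 'b': 1}
counts2 = {'a': 3, 'c': 5}

totals = dict(counts1)  # copy, so counts1 itself is not modified
for k, v in counts2.items():
    # get(k, 0) supplies 0 for keys that totals does not have yet
    totals[k] = totals.get(k, 0) + v

print(totals)  # {'a': 5, 'b': 1, 'c': 5}
```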
A benchmark shows your suspicions of its performance impact appear to be correct:
$ python -m timeit -s 'd = {"key": "value"}' 'd["key"] = "value"'
10000000 loops, best of 3: 0.0741 usec per loop
$ python -m timeit -s 'd = {"key": "value"}' 'd.update(key="value")'
1000000 loops, best of 3: 0.294 usec per loop
$ python -m timeit -s 'd = {"key": "value"}' 'd.update({"key": "value"})'
1000000 loops, best of 3: 0.461 usec per loop
That is, on my machine update() is roughly four to six times slower than plain assignment. However, Python is already not a language you'd use if you need top performance, so I'd recommend using whatever is most readable in the situation. For many things, that would be the [] way, though update could be more readable in a situation like this:
configuration.update(
    timeout=60,
    host='example.com',
)
…or something like that.
Assigning to the key directly is about three times as fast, but YMMV:
$ python -m timeit 'd={"k":1}; d.update({"k":2})'
1000000 loops, best of 3: 0.669 usec per loop
$ python -m timeit 'd={"k":1}; d["k"] = 2'
1000000 loops, best of 3: 0.212 usec per loop
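The command-line timings above can also be reproduced from a script with the standard timeit module. A sketch that mirrors the first benchmark's setup and statements; the number of executions per run is an arbitrary choice to keep it quick:

```python
import timeit

# Setup and statements mirror the command-line runs above.
setup = 'd = {"key": "value"}'
statements = {
    'd[key] = value': 'd["key"] = "value"',
    'update(kwargs)': 'd.update(key="value")',
    'update(dict)': 'd.update({"key": "value"})',
}
number = 200_000  # executions per timing run (arbitrary, keeps the script quick)

results = {}
for label, stmt in statements.items():
    # Best of 3 runs, like the "best of 3" in the CLI output above.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))
    results[label] = best / number * 1e6  # microseconds per execution
    print(f'{label:16} {results[label]:.4f} usec per loop')
```

Absolute numbers will differ by machine and Python version. Also, the keyword form update(key=...) can only express string keys that are valid identifiers; for other keys use update({...}) or d[k] = v.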
There are two approaches:
- Use defaultdict
- Use your own implementation of defaultdict
Assuming that your file looks like this:
a 1
b 4
a 2
...
Then you can do this:
import collections

answer = collections.defaultdict(list)
with open('path/to/file') as infile:
    for line in infile:
        key, value = line.strip().split()
        answer[key].append(value)
If you don't want to use defaultdict, then:
answer = {}
with open('path/to/file') as infile:
    for line in infile:
        key, value = line.strip().split()
        if key not in answer:
            answer[key] = []
        answer[key].append(value)
Hope this helps
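The second approach listed above, writing your own implementation of defaultdict, can be sketched as a dict subclass that defines __missing__, which dict's [] lookup calls for absent keys. This assumes only list defaults are needed; the class name ListDict is hypothetical, and the file is replaced by an in-memory list of lines in the same "a 1" format:

```python
class ListDict(dict):
    """Hypothetical minimal stand-in for collections.defaultdict(list)."""

    def __missing__(self, key):
        # dict.__getitem__ calls this for absent keys (only on subclasses).
        value = self[key] = []  # store a new empty list, then return it
        return value

answer = ListDict()
for line in ['a 1', 'b 4', 'a 2']:  # stand-in for the lines of the file
    key, value = line.strip().split()
    answer[key].append(value)

print(answer)  # {'a': ['1', '2'], 'b': ['4']}
```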
Use defaultdict
example:
from collections import defaultdict
d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)
Basically, you initialize it with a factory function that returns what the 'default' value should be. When you look up a key that does not yet exist with d[key], it calls that function, stores the result under that key, and returns it. In this case the factory is list, so it returns an empty list.
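Only d[key] lookups trigger the factory; membership tests and .get() leave the dictionary unchanged. A short sketch:

```python
from collections import defaultdict

d = defaultdict(list)
print('a' in d)    # False: membership tests never call the factory
print(d.get('a'))  # None: .get() does not call the factory either
d['a']             # a plain subscript lookup inserts an empty list
print(dict(d))     # {'a': []}
```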
I've completed DNA and submitted for full credit using lists instead of dictionaries. DNA was really enthralling to me for some reason and I'm going back and trying to make my code both more pythonic and attempting to get it better optimized. Part of my motivation is that I just don't feel anywhere near as comfortable with dictionaries as I did coming out of previous weeks' psets that had similar, heavier (for me) concepts.
One specific area that's giving me trouble in my understanding is the .update() method. I'm using it to store the small.csv info into a dict named STR. I had thought it was the analogue of .append() for lists but, after trying to incorporate it into my revamped DNA, it will update for the first row of the CSV being read on the first iteration but then it just continually replaces that single row/entry in the dict with each iteration. I'm sure I'm just not grasping something fundamental about dicts and/or update() but am not knowledgeable enough yet to know what that might be. I'm not even sure it's technically necessary to be storing the database csv or if it's better to work with the CSV in-place.
Could someone please help me understand why my expectation of update() is flawed?
The code below only stores the last line of the small.csv database:
{'name': 'Charlie', 'AGATC': '3', 'AATG': '2', 'TATC': '5'}
import csv
import sys

STR = {}

# Open person STR profiles csv and append to STR list
with open(sys.argv[1], 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        STR.update(row)
append() is not a dictionary method; it is a list method for adding new elements to a list. Here the value stored under 'jobs' is a set, so use the set's .add() method to add a new element to it, as follows:
dict['jobs'].add('doctor')
A dictionary is a collection of key-value pairs, and the values can be of any type. Here the value of the key jobs is a set. If it were a list, then, yes, you could use the .append method, but since it is a set, you can't. You can .add to a set, but this may not be what you are looking for if you want a data structure that you can index: a set is an unordered collection of hashable objects, whereas a list keeps its elements in order.
These examples should get across the core ideas of lists and sets:
>>> l = [1, 2, 3]
>>> l.append(4)
>>> l
[1, 2, 3, 4]
>>> l[3]
4
>>> s = {1, 2, 3}
>>> s.add(4)
>>> s.add('hello')
>>> s
{1, 2, 3, 4, 'hello'}
>>> s.add(-100)
>>> s
{1, 2, 3, 4, -100, 'hello'}
>>> -100 in s
True
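Returning to the DNA question earlier on this page: update() is not an analogue of append(). It merges keys into the dictionary, and since every CSV row has the same keys (name, AGATC, ...), each call overwrites the previous row's values, leaving only the last row. A sketch using an in-memory stand-in for small.csv; only the Charlie row comes from the question, and the other rows are made-up placeholders:

```python
import csv
import io

# In-memory stand-in for small.csv. Only the Charlie row comes from the
# question; the Alice and Bob rows are made-up placeholders.
sample = io.StringIO(
    "name,AGATC,AATG,TATC\n"
    "Alice,1,1,1\n"
    "Bob,2,2,2\n"
    "Charlie,3,2,5\n"
)
rows = list(csv.DictReader(sample))

# What the question's loop does: identical keys in every row mean
# each update() overwrites the previous row's values.
merged = {}
for row in rows:
    merged.update(row)
print(merged)  # only Charlie's values remain

# Option 1: keep every row by appending each row dict to a list.
profiles = []
for row in rows:
    profiles.append(row)

# Option 2: key a dict by name, so each person gets their own entry.
by_name = {row['name']: row for row in rows}
print(by_name['Bob']['AGATC'])  # 2
```

Note that csv.DictReader yields every field as a string, so the STR counts need int() before numeric comparison.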