collections.Counter already does what you want.
from collections import Counter
c = Counter([1,2,3,4,5,100,100,1000])
print(c)
# Counter({100: 2, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 1000: 1})
Answer from tzaman on Stack Overflow
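Beyond printing, a Counter can also be queried directly. The short sketch below reuses the same list; the variable names `top` and `as_dict` are only illustrative.

```python
from collections import Counter

c = Counter([1, 2, 3, 4, 5, 100, 100, 1000])

# most_common(n) returns the n most frequent (element, count) pairs
top = c.most_common(1)
print(top)           # [(100, 2)]

# a missing element reports a count of 0 instead of raising KeyError
print(c[42])         # 0

# Counter is a dict subclass, so dict() gives a plain dictionary
as_dict = dict(c)
print(as_dict[100])  # 2
```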
So, to reduce the code and make it more readable, you can use a defaultdict instead. Import it from the collections module and create a defaultdict object. A defaultdict requires a factory function. Here that is the int class, which acts as the creator of the default value: if you access a key that doesn't exist, the defaultdict calls the factory to create a default value for it. Since int() returns 0, you can access any key and increment it without first checking whether it is already there.
So you have to import the defaultdict first.
from collections import defaultdict
# your list of numbers
nums = [1,2,3,4,5,100,100,1000]
# use a default dictionary to count each element
numCounter = defaultdict(int)
# count the elements in the list
for num in nums:
    numCounter[num] += 1
# print the result
for (k, v) in numCounter.items():
    print(str(k) + ": " + str(v))
The output will be
1: 1
2: 1
3: 1
4: 1
5: 1
100: 2
1000: 1
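The factory-function behaviour described above can be seen directly in a small sketch. The keys "x" and "y" are made up for illustration.

```python
from collections import defaultdict

# int() with no arguments returns 0, which is why int works as the factory
print(int())            # 0

counts = defaultdict(int)
print("x" in counts)    # False: nothing is stored yet
counts["x"] += 1        # missing key: the factory supplies 0, then 1 is added
print(counts["x"])      # 1

# merely reading a missing key also stores the default value
_ = counts["y"]
print(dict(counts))     # {'x': 1, 'y': 0}
```

Note the last step: a lookup on a defaultdict inserts the key, so use `in` or `.get()` when you only want to test for presence.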
If you don't want to use collections.Counter, you can write your own function:
import sys
filename = sys.argv[1]
fp = open(filename)
data = fp.read()
words = data.split()
fp.close()
unwanted_chars = ".,-_ (and so on)"
wordfreq = {}
for raw_word in words:
    word = raw_word.strip(unwanted_chars)
    if word not in wordfreq:
        wordfreq[word] = 0
    wordfreq[word] += 1
For finer things, look at regular expressions.
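As a rough sketch of the same idea on an in-memory string: the version below assumes that `string.punctuation` is an acceptable set of characters to strip, which is not necessarily the set the answer had in mind, and `word_frequencies` is a hypothetical helper name.

```python
import string

def word_frequencies(text):
    """Count whitespace-separated words after stripping edge punctuation."""
    freq = {}
    for raw_word in text.split():
        # strip() only removes characters from both ends of the token
        word = raw_word.strip(string.punctuation)
        if not word:          # the token consisted of punctuation only
            continue
        freq[word] = freq.get(word, 0) + 1
    return freq

result = word_frequencies("Hi, how are you? Hi - again.")
print(result)
# {'Hi': 2, 'how': 1, 'are': 1, 'you': 1, 'again': 1}
```

Because strip() only touches the ends of each token, punctuation inside a word (as in "don't") is left alone.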
Although using Counter from the collections library as suggested by @Michael is a better approach, I am adding this answer just to improve your code. (I believe this will be a good answer for a new Python learner.)
From the comment in your code, it seems you want to improve it. You are already able to read the file content into words (although I usually avoid read() and write a for line in file_descriptor: loop instead).
Because words is a string, in the loop for i in words: the loop variable i is not a word but a single character. You are iterating over the characters of the string words, not over the words in it. To see this, look at the following snippet:
>>> for i in "Hi, h r u?":
...     print(i)
...
H
i
,
h
r
u
?
>>>
Iterating over the string character by character is not what you want. To iterate word by word, use the split method of Python's str class.
str.split(sep=None, maxsplit=-1) returns a list of the words in the string, using sep as the separator (it splits on runs of whitespace if sep is left unspecified), optionally limiting the number of splits to maxsplit.
Notice the code examples below:
Split:
>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?']
loop with split:
>>> for i in "Hi, how are you?".split():
...     print(i)
...
Hi,
how
are
you?
That looks close to what you need, except for the word Hi,: by default split() splits on whitespace, so Hi, is kept as a single string with its comma, and you don't want that.
To count the frequency of words in the file, one good solution is to use regex. But first, to keep the answer simple, I will use the replace() method. str.replace(old, new[, count]) returns a copy of the string in which occurrences of old have been replaced with new, optionally restricting the number of replacements to count.
Now check code example below to see what I suggested:
>>> "Hi, how are you?".split()
['Hi,', 'how', 'are', 'you?'] # it has , with Hi
>>> "Hi, how are you?".replace(',', ' ').split()
['Hi', 'how', 'are', 'you?'] # , replaced by space then split
loop:
>>> for word in "Hi, how are you?".replace(',', ' ').split():
...     print(word)
...
Hi
how
are
you?
Now, how to count frequency:
One way is to use Counter as @Michael suggested, but to keep your approach of starting from an empty dict, do something like the code sample below:
words = f.read()
wordfreq = {}
for word in words.replace(',', ' ').split():
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1
    #              ^^ add 1 to 0 or old value from dict
What am I doing? Because wordfreq is initially empty, you can't read wordfreq[word] the first time a word appears (it raises a KeyError). So I used the setdefault dict method.
dict.setdefault(key, default=None) is similar to get(), but will set dict[key]=default if key is not already in dict. So for the first time when a new word comes, I set it with 0 in dict using setdefault then add 1 and assign to the same dict.
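The difference between get() and setdefault() is easy to check in a short sketch. The word list here is made up.

```python
wordfreq = {}

# get() returns the default but leaves the dict unchanged
print(wordfreq.get("hi", 0))         # 0
print(wordfreq)                      # {}

# setdefault() returns the default and also stores it under the key
print(wordfreq.setdefault("hi", 0))  # 0
print(wordfreq)                      # {'hi': 0}

# for counting, get() alone is enough, since the assignment stores the key
counts = {}
for word in ["a", "b", "a"]:
    counts[word] = counts.get(word, 0) + 1
print(counts)                        # {'a': 2, 'b': 1}
```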
I have written equivalent code using with open instead of a bare open.
import os
# open() does not expand '~', so expand it explicitly
with open(os.path.expanduser('~/Desktop/file')) as f:
    words = f.read()
wordfreq = {}
for word in words.replace(',', ' ').split():
    wordfreq[word] = wordfreq.setdefault(word, 0) + 1
print(wordfreq)
That runs like this:
$ cat file # file is
this is the textfile, and it is used to take words and count
$ python work.py # indented manually
{'and': 2, 'count': 1, 'used': 1, 'this': 1, 'is': 2,
'it': 1, 'to': 1, 'take': 1, 'words': 1,
'the': 1, 'textfile': 1}
Using re.split(pattern, string, maxsplit=0, flags=0)
Just change the for loop: for i in re.split(r"[,\s]+", words):, that should produce the correct output.
Edit: it is better to find all runs of alphanumeric characters, because the text may contain more than one kind of punctuation symbol.
>>> re.findall(r'[\w]+', words) # manually indent output
['this', 'is', 'the', 'textfile', 'and',
'it', 'is', 'used', 'to', 'take', 'words', 'and', 'count']
use for loop as: for word in re.findall(r'[\w]+', words):
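To see why findall is the safer choice, here is a small comparison on the earlier example string. The second input, "a, b, ", is made up to show the trailing-separator case.

```python
import re

text = "Hi, how are you?"

# splitting on commas and whitespace leaves other punctuation attached
split_words = re.split(r"[,\s]+", text)
print(split_words)    # ['Hi', 'how', 'are', 'you?']

# matching runs of word characters drops all punctuation
found_words = re.findall(r"\w+", text)
print(found_words)    # ['Hi', 'how', 'are', 'you']

# when the text ends with a separator, re.split yields a trailing empty string
split_tail = re.split(r"[,\s]+", "a, b, ")
found_tail = re.findall(r"\w+", "a, b, ")
print(split_tail)     # ['a', 'b', '']
print(found_tail)     # ['a', 'b']
```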
How would I write code without using read():
File is:
$ cat file
This is the text file, and it is used to take words and count. And multiple
Lines can be present in this file.
It is also possible that Same words repeated in with capital letters.
Code is:
$ cat work.py
import re
wordfreq = {}
with open('file') as f:
    for line in f:
        for word in re.findall(r'[\w]+', line.lower()):
            wordfreq[word] = wordfreq.setdefault(word, 0) + 1
print(wordfreq)
I used lower() to convert uppercase letters to lowercase.
Output:
$ python work.py # manually strip output
{'and': 3, 'letters': 1, 'text': 1, 'is': 3,
'it': 2, 'file': 2, 'in': 2, 'also': 1, 'same': 1,
'to': 1, 'take': 1, 'capital': 1, 'be': 1, 'used': 1,
'multiple': 1, 'that': 1, 'possible': 1, 'repeated': 1,
'words': 2, 'with': 1, 'present': 1, 'count': 1, 'this': 2,
'lines': 1, 'can': 1, 'the': 1}
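For comparison, the same line-by-line approach can be sketched with collections.Counter, which the answer mentions as the better option. Here io.StringIO stands in for the open file so the snippet runs on its own, and only the first two lines of the sample text are used.

```python
import io
import re
from collections import Counter

# stand-in for an open file; any iterable of lines would work
sample = io.StringIO(
    "This is the text file, and it is used to take words and count. And multiple\n"
    "Lines can be present in this file.\n"
)

wordfreq = Counter()
for line in sample:
    # update() adds one to the count of each word in the list
    wordfreq.update(re.findall(r"\w+", line.lower()))

print(wordfreq["and"])    # 3
print(wordfreq["file"])   # 2
```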
Hi, in my class we recently learned how to make a dictionary that takes the tokens of a text file as keys and the frequency of those tokens as values. The code does work but I'm having a hard time understanding HOW it works.
This is the code (inside a function) I have:
token_list = [i for i in token if i[0] in vowels]
d = {}
for t in token_list:
    if t in d:
        d[t] += 1
    else:
        d[t] = 1
return d
token_list is all the tokens in Alice in Wonderland that start with a vowel. What I don't understand is what d[t] is. I suppose it's the values since it increases at each iteration, but why is it written like this? And from where does the dictionary take the keys? if t in d is especially confusing to me, because since d is empty at the beginning, how can the if actually work?
Thank you to anyone who can answer my questions and maybe break down this code for me!
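One way to see what happens is to print the dictionary after every iteration. The sketch below uses a made-up token list rather than the Alice in Wonderland tokens. On an empty dictionary, t in d is simply False, so the else branch runs and the assignment d[t] = 1 is what creates the key; d[t] is the value stored under the key t.

```python
token_list = ["alice", "a", "alice", "in"]  # made-up tokens for illustration

d = {}
for t in token_list:
    if t in d:        # False the first time a token is seen
        d[t] += 1     # seen before: bump the stored count
    else:
        d[t] = 1      # first sighting: this assignment creates the key
    print(t, d)

# alice {'alice': 1}
# a {'alice': 1, 'a': 1}
# alice {'alice': 2, 'a': 1}
# in {'alice': 2, 'a': 1, 'in': 1}
```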
Hi everyone,
I've been stuck on this assignment for a few hours. Essentially, I have to create a function that returns a dictionary. However, each key is a letter and the value is the count that represents how many times that letter was found. The letters have to be lowercase and I cannot use any modules or import anything. The code I have so far counts the frequency of words in a list, but I'm not sure how to break it down and have it count the letters.
Edit: I fixed the code so it now prints the letter frequency, but it's not exact. For example, there should be two g's but it only prints one. I'm not sure what the problem is. Can someone please help?
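A possible explanation for the missing g, assuming the function shown below is the one in question: assigning string.count(char) stores the count from the current word only, so a later word containing the same letter overwrites the earlier total. A minimal sketch that accumulates across words instead, without any imports, could look like this (count_letters is a hypothetical name):

```python
def count_letters(list_of_words):
    """Accumulate letter counts across every word, without imports."""
    counts = {}
    for word in list_of_words:
        for char in word.lower():
            # add one per occurrence instead of overwriting with word.count()
            counts[char] = counts.get(char, 0) + 1
    return counts

result = count_letters(["Penguin", "Dog", "cat", "CAT"])
print(result["g"])  # 2
print(result["c"])  # 2
```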
def build_letter_distribution(listOfWords):
    dictionary = {}
    for string in listOfWords:
        string = string.lower()
        for char in string:
            dictionary[char] = string.count(char)
    return dictionary
print(build_letter_distribution(["Penguin", "Dog", "cat", "CAT"]))
This is the output I get so far:
{'p': 1, 'e': 1, 'n': 2, 'g': 1, 'u': 1, 'i': 1, 'd': 1, 'o': 1, 'c': 1, 'a': 1, 't': 1}
Suppose your dictionary is named d and you want a count of the values:
from collections import Counter
d = {1: "dog", 2: "cat", 3: "dog", 4: "elephant"}
counts = Counter(d.values())
Now you can use the counts Counter:
# Values in the original dictionary
counts['dog'] # 2
counts['elephant'] # 1
# Value not in the original dictionary
counts['fish'] # 0
If you must use custom code and classes for this instead of the standard library, I think your error is here:
result.add((self.count(i), self.items[i]))
What you may want is:
animal_name = self.items[i] #?
result.add((self.count(animal_name), animal_name))
Otherwise you may want to share with us what is in self.items...
d = {1:"Dog",2:"Cat", 3:"Dog", 4:"Elephant"}
count = {}
for v in d.values():
    if v not in count:
        count[v] = 0
    count[v] += 1
print(count)
{'Dog': 2, 'Cat': 1, 'Elephant': 1}