The axis parameter is orthogonal to the axis of the result you want: to get one sum per row, pass axis=1 (the column axis), and to get one sum per column, pass axis=0 (the index axis).
Unfortunately, the pandas documentation for sum doesn't currently make this clear, but the documentation for count does:
Answer from human3 on Stack Overflow
Parameters
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.
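A small sketch of that rule (the DataFrame below is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one NaN in each of 'a' and 'b'
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, 5.0, 6.0]})

# axis=0 (the default) collapses the rows, giving one count per column
print(df.count(axis=0))

# axis=1 collapses the columns, giving one count per row
print(df.count(axis=1))
```

So to get a result labeled by rows, you pass the column axis, and vice versa.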
I'm seeing the opposite behavior from what you explained:
Sums across the columns
In [3309]: df1.isnull().sum(1)
Out[3309]:
0 0
1 1
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
dtype: int64
Sums down the columns
In [3310]: df1.isnull().sum()
Out[3310]:
date 0
variable 1
value 0
dtype: int64
Try this:
In [71]: df
Out[71]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
4 1.0 3.0 9
5 7.0 4.0 9
6 2.0 6.0 9
7 9.0 6.0 4
8 3.0 0.0 9
9 9.0 0.0 1
In [72]: pd.isnull(df).sum()
Out[72]:
a 1
b 2
c 0
dtype: int64
or:
In [76]: df.isnull().sum()
Out[76]:
a 1
b 2
c 0
dtype: int64
you can create a DF out of it:
In [78]: df.isnull().sum().to_frame('nulls')
Out[78]:
nulls
a 1
b 2
c 0
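Building on that, you could sort the resulting frame so the most incomplete columns come first (a small variation, not part of the original answer; the data mirrors the df above):

```python
import numpy as np
import pandas as pd

# Same null pattern as the answer above: one NaN in 'a', two in 'b', none in 'c'
df = pd.DataFrame({'a': [np.nan, 0.0, 2.0],
                   'b': [7.0, np.nan, np.nan],
                   'c': [0, 4, 4]})

# Name the count column and sort so the most incomplete column comes first
nulls = df.isnull().sum().to_frame('nulls').sort_values('nulls', ascending=False)
print(nulls)
```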
If the number of columns in your dataframe is greater than 10, the middle columns will be left out of the printed output. You can print every column using:
nulls = df.isnull().sum().to_frame()
for index, row in nulls.iterrows():
    print(index, row.iloc[0])
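If the goal is just an untruncated printout, an alternative to the loop is to let pandas render the whole Series at once (a sketch; the wide frame here is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 25 columns, with the first row all-NaN
# so every column ends up with exactly one null
df = pd.DataFrame(np.ones((5, 25)), columns=[f'col{i}' for i in range(25)])
df.iloc[0] = np.nan

# Series.to_string() ignores the display truncation limits
print(df.isnull().sum().to_string())
```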
You can apply a count over the rows like this:
test_df.apply(lambda x: x.count(), axis=1)
test_df:
A B C
0: 1 1 3
1: 2 nan nan
2: nan nan nan
output:
0: 3
1: 1
2: 0
You can add the result as a column like this:
test_df['full_count'] = test_df.apply(lambda x: x.count(), axis=1)
Result:
A B C full_count
0: 1 1 3 3
1: 2 nan nan 1
2: nan nan nan 0
When using pandas, try to avoid performing operations in a loop, including apply, map, applymap, etc. Those are slow!
A DataFrame object has two axes: “axis 0” and “axis 1”. “axis 0” represents rows and “axis 1” represents columns.
If you want to count the missing values in each column, try:
df.isnull().sum() as default or df.isnull().sum(axis=0)
On the other hand, you can count in each row (which is your question) by:
df.isnull().sum(axis=1)
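A quick sketch of both directions on a toy frame (the data here is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, np.nan],
                   'B': [1.0, np.nan, np.nan],
                   'C': [3.0, np.nan, np.nan]})

# Missing values per column (axis=0 is the default)
print(df.isnull().sum())

# Missing values per row
print(df.isnull().sum(axis=1))
```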
It's roughly 10 times faster than Jan van der Vegt's solution (by the way, that solution counts valid values rather than missing values):
In [18]: %timeit -n 1000 df.apply(lambda x: x.count(), axis=1)
1000 loops, best of 3: 3.31 ms per loop
In [19]: %timeit -n 1000 df.isnull().sum(axis=1)
1000 loops, best of 3: 329 µs per loop