Answer from 8one6 on Stack Overflow. One-liner; no transpose needed:
df.loc[~(df == 0).all(axis=1)]
And for those who like symmetry, this also works...
df.loc[(df != 0).any(axis=1)]
It turns out this can be nicely expressed in a vectorized fashion:
> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
a b
1 0 1
2 1 0
3 1 1
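The transpose in the snippet above is unnecessary: passing axis=1 to any performs the same row-wise check directly. A minimal sketch using the same example frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})

# Equivalent without the transpose: check each row directly with axis=1
filtered = df[(df != 0).any(axis=1)]
print(filtered)  # rows 1, 2 and 3 survive; the all-zero row 0 is dropped
```

Avoiding the transpose also skips materializing df.T, which matters on wide frames.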
You can use difference to exclude the Timestamp column. Then sum across each row (axis=1) and keep the rows whose sum is non-zero (note this assumes non-negative data; positive and negative values could cancel to zero):
df.loc[df[df.columns.difference(['Timestamp'])].sum(axis=1) != 0]
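A small sketch of this approach, using a hypothetical frame with a Timestamp column and two numeric columns:

```python
import pandas as pd

# Hypothetical example data; any frame with a non-numeric Timestamp column works
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']),
    'x': [0, 1, 0],
    'y': [0, 0, 2],
})

# columns.difference(['Timestamp']) selects every column except Timestamp,
# so the datetime column never enters the row sum
out = df.loc[df[df.columns.difference(['Timestamp'])].sum(axis=1) != 0]
print(out)  # keeps rows 1 and 2; row 0 is all zeros outside Timestamp
```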
If you want to remove rows where the Timestamp column is null, you can use something like this:
df.drop(df[df['Timestamp'].isnull()].index, inplace=True)
If I'm understanding correctly, it should be as simple as:
df = df[df.line_race != 0]
Note for future readers: df = df[df.line_race != 0] does nothing when you try to filter out None/missing values.
Does work:
df = df[df.line_race != 0]
Doesn't do anything:
df = df[df.line_race != None]
Does work:
df = df[df.line_race.notnull()]
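A short sketch of why the != None comparison fails, using a made-up line_race column. Element-wise comparison against None does not match missing values (NaN != NaN is True), so the filter keeps every row; notnull() is the correct test:

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({'line_race': [10, 0, None, 42]})

# Element-wise comparison with None never identifies NaN, so all 4 rows survive
assert len(df[df.line_race != None]) == 4  # noqa: E711

# notnull() actually drops the missing row
cleaned = df[df.line_race.notnull()]
print(cleaned)
```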
Row filtering on selected columns, keeping rows where any selected column is non-zero, using any:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66],
[111, 222, 0, 0, 0, 0], [1111, 0, 0, 0, 0, 0]]),
columns=['a', 'b', 'c', 'd', 'e', 'f'])
df = df[(df[['c', 'd', 'e', 'f']] != 0).any(axis=1)]
print(df)
Output:
a b c d e f
0 1 2 3 4 5 6
1 11 22 33 44 55 66
The same filter written with boolean operators:
df.loc[~((df['c'] == 0) & (df['d'] == 0) & (df['e'] == 0) & (df['f'] == 0))]
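A quick check that the .any(axis=1) form and the chained boolean operators select exactly the same rows, using the example frame from above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66],
                            [111, 222, 0, 0, 0, 0], [1111, 0, 0, 0, 0, 0]]),
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])

# Both expressions build the same boolean mask (De Morgan: not-all-zero == any-non-zero)
via_any = df[(df[['c', 'd', 'e', 'f']] != 0).any(axis=1)]
via_ops = df.loc[~((df['c'] == 0) & (df['d'] == 0) & (df['e'] == 0) & (df['f'] == 0))]
assert via_any.equals(via_ops)
print(via_any)  # rows 0 and 1
```

The .any form scales better when the column list grows, since you don't repeat the comparison per column.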
You need to create a boolean DataFrame by comparing the filtered columns' values against the scalar for inequality, and then check for all Trues per row with all:
df = df[(df[['A','C']] != 0).all(axis=1)]
print (df)
A B C
0 1 2 5
2 6 8 4
Details:
print (df[['A','C']] != 0)
A C
0 True True
1 True False
2 True True
3 False True
print ((df[['A','C']] != 0).all(axis=1))
0 True
1 False
2 True
3 False
dtype: bool
Alternatively, create a boolean DataFrame by comparing all values against the scalar for equality, then check for any Trues per row with any, and finally invert the mask with ~:
df = df[~(df[['A','C']] == 0).any(axis=1)]
Details:
print (df[['A','C']])
A C
0 1 5
1 4 0
2 6 4
3 0 2
print (df[['A','C']] == 0)
A C
0 False False
1 False True
2 False False
3 True False
print ((df[['A','C']] == 0).any(axis=1))
0 False
1 True
2 False
3 True
dtype: bool
print (~(df[['A','C']] == 0).any(axis=1))
0 True
1 False
2 True
3 False
dtype: bool
One line hack using .dropna()
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[1,4,6,0],'B':[2,4,8,4],'C':[5,0,4,2]})
print(df)
   A  B  C
0  1  2  5
1  4  4  0
2  6  8  4
3  0  4  2
columns = ['A', 'C']
df = df.replace(0, np.nan).dropna(axis=0, how='any', subset=columns).fillna(0).astype(int)
print(df)
   A  B  C
0  1  2  5
2  6  8  4
A B C
0 1 2 5
2 6 8 4
So, what's happening is:
- Replace 0 by NaN with .replace()
- Use .dropna() to drop NaN, considering only columns A and C
- Replace NaN back to 0 with .fillna() (not needed if you use all columns instead of only a subset)
- Correct the data type from float to int with .astype()
df.loc[:, (df != 0).any(axis=0)]
Here is a break-down of how it works:
In [74]: import pandas as pd
In [75]: df = pd.DataFrame([[1,0,0,0], [0,0,1,0]])
In [76]: df
Out[76]:
0 1 2 3
0 1 0 0 0
1 0 0 1 0
[2 rows x 4 columns]
df != 0 creates a boolean DataFrame which is True where df is nonzero:
In [77]: df != 0
Out[77]:
0 1 2 3
0 True False False False
1 False False True False
[2 rows x 4 columns]
(df != 0).any(axis=0) returns a boolean Series indicating which columns have nonzero entries. (The any operation aggregates values along the 0-axis -- i.e. along the rows -- into a single boolean value. Hence the result is one boolean value for each column.)
In [78]: (df != 0).any(axis=0)
Out[78]:
0 True
1 False
2 True
3 False
dtype: bool
And df.loc can be used to select those columns:
In [79]: df.loc[:, (df != 0).any(axis=0)]
Out[79]:
0 2
0 1 0
1 0 1
[2 rows x 2 columns]
To "delete" the zero-columns, reassign df:
df = df.loc[:, (df != 0).any(axis=0)]
Here is an alternative way, using replace and dropna:
df.replace(0,np.nan).dropna(axis=1,how="all")
Compared with unutbu's solution, this way is noticeably slower:
%timeit df.loc[:, (df != 0).any(axis=0)]
652 µs ± 5.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.replace(0,np.nan).dropna(axis=1,how="all")
1.75 ms ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Use np.all with an axis argument:
>>> r[np.all(r == 0, axis=1)]
array([[ 0., 0., 0.]])
>>> r[~np.all(r == 0, axis=1)]
array([[-1.41421356, -0.70710678, -0.70710678],
[ 0. , -1.22474487, -1.22474487]])
Because the data are not exactly zero, we need to set a threshold for "zero", such as 1e-6. Use numpy.all with axis=1 to check whether each row is all zeros, use numpy.where and numpy.diff to find the positions where the zero-mask changes, and call numpy.split to split the array at those positions.
import numpy as np

q, r = np.linalg.qr(np.array([1, 0, 0, 0, 1, 1, 1, 1, 1]).reshape(3, 3))
mask = np.all(np.abs(r) < 1e-6, axis=1)  # rows that are numerically all zero
pos = np.where(np.diff(mask))[0] + 1     # indices where the mask flips value
result = np.split(r, pos)
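The same split logic on a small synthetic array (made up for illustration) makes the mechanics easier to see: a zero row separates two non-zero blocks, and np.split cuts at every point where the zero-mask flips.

```python
import numpy as np

# Synthetic example: a zero row between two non-zero rows
a = np.array([[1., 2.], [0., 0.], [3., 4.]])

mask = np.all(np.abs(a) < 1e-6, axis=1)  # [False, True, False]
pos = np.where(np.diff(mask))[0] + 1     # mask flips at indices 1 and 2
parts = np.split(a, pos)                 # three chunks: non-zero, zero, non-zero
print([p.tolist() for p in parts])
```

Note that np.diff on a boolean array computes element-wise XOR, which is exactly "the mask changed here".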