AttributeError: 'SeriesGroupBy' object has no attribute 'drop_duplicates'

stackoverflow.com › questions › 37105609 › drop-duplicates-using-pandas-groupby

You don't need groupby to drop duplicates based on a few columns; you can pass those columns as the subset argument of drop_duplicates instead:

df2 = df.drop_duplicates(["date", "cid"])
df2.groupby('date').cid.size()
Out[99]: 
date
2005      3
2006     10
2007    227
2008     52
2009    142
2010     57
2011    219
2012     99
2013    238
2014    146
dtype: int64
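
For a self-contained illustration of the same idea, here is a minimal sketch on a toy frame. The column values are made up for the example and are not the data from the question. It also shows nunique(), which a SeriesGroupBy does provide and which gives the same per-date counts without dropping rows first.

```python
import pandas as pd

# Toy data (made up for illustration): several rows per (date, cid) pair.
df = pd.DataFrame({
    "date": [2005, 2005, 2005, 2006, 2006, 2006],
    "cid":  ["a",  "a",  "b",  "c",  "c",  "c"],
})

# Drop rows that repeat an earlier (date, cid) pair, then count rows per date.
df2 = df.drop_duplicates(["date", "cid"])
counts = df2.groupby("date").cid.size()

# Alternative: count distinct cids per date directly on the SeriesGroupBy.
distinct = df.groupby("date")["cid"].nunique()

print(counts.tolist())
print(distinct.tolist())
```

Both approaches should agree whenever the goal is only the number of distinct cids per date; drop_duplicates is the one to use when the de-duplicated rows themselves are needed.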

Answer from user2285236 on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 37105609 › drop-duplicates-using-pandas-groupby

python - Drop duplicates using pandas groupby - Stack Overflow

2 of 2

1. `groupby.head(1)`

The groupby method for dropping duplicates within each group is groupby.head(1). Passing 1 matters: head() defaults to n=5, so without it up to five rows per date-cid pair would be kept instead of only the first.

df1 = df.groupby(['date', 'cid']).head(1)

2. `duplicated()` is more flexible

Another method is to use duplicated() to create a boolean mask and filter.

df3 = df[~df.duplicated(['date', 'cid'])]

An advantage of this method over drop_duplicates() is that it can be chained with other boolean masks to filter the dataframe more flexibly. For example, to select the unique cids in Nevada for each date, use:

df_nv = df[df['state'].eq('NV') & ~df.duplicated(['date', 'cid'])]

3. `groupby.sample(1)`

Another way to select a unique row from each group is to use groupby.sample(). Unlike the previous methods, it selects a row from each group at random (whereas the others keep only the first row from each group).

df4 = df.groupby(['date', 'cid']).sample(n=1)

You can verify that df1, df2 (ayhan's output) and df3 all produce the very same output and df4 produces an output where size and nunique of cid match for each date (as required in the OP). In short, the following returns True.

w, x, y, z = [d.groupby('date')['cid'].agg(['size', 'nunique']) for d in (df1, df2, df3, df4)]
w.equals(x) and w.equals(y) and w.equals(z)   # True

and w, x, y, z all look like the following:

      size  nunique
date               
2005     3        3
2006    10       10
2007   227      227
2008    52       52
2009   142      142
2010    57       57
2011   219      219
2012    99       99
2013   238      238
2014   146      146
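
The comparison of the four approaches can be sketched on a small frame. The data below is made up for illustration (it is not the data from the question), and random_state is added only so the random sample is repeatable.

```python
import pandas as pd

# Toy data (made up for illustration); 'state' is only used for the mask example.
df = pd.DataFrame({
    "date":  [2005, 2005, 2005, 2006, 2006],
    "cid":   ["a", "a", "b", "c", "c"],
    "state": ["NV", "NV", "CA", "NV", "NV"],
})
keys = ["date", "cid"]

df1 = df.groupby(keys).head(1)                       # first row of each group
df2 = df.drop_duplicates(keys)                       # subset-based de-duplication
df3 = df[~df.duplicated(keys)]                       # boolean-mask version
df4 = df.groupby(keys).sample(n=1, random_state=0)   # one random row per group

# The three "keep first" methods return identical frames.
same_first = df1.equals(df2) and df1.equals(df3)

# The random sample may pick different rows, but still one per (date, cid).
one_per_group = df4.groupby(keys).size().eq(1).all()

# Masks combine with other conditions, e.g. unique cids in Nevada per date.
df_nv = df[df["state"].eq("NV") & ~df.duplicated(keys)]

print(same_first, bool(one_per_group))
print(df_nv)
```

Because df4 is sampled, its rows can differ from the other three frames even though the per-date size and nunique summaries agree.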

reddit.com › r/learnpython › drop_duplicates not working, what to do?

r/learnpython on Reddit: drop_duplicates not working, what to do?

July 29, 2023 -

df=df.drop_duplicates(subset=None, keep='first', inplace=False)

it shows:

AttributeError: 'NoneType' object has no attribute 'drop_duplicates'

Top answer

1 of 2

This indicates df is None when you called drop_duplicates(). Your code did something unexpected prior to what you've excerpted.

2 of 2

You expect the variable df is a DataFrame but it's not. You need to assign a DataFrame (with data) as df in order to use the drop_duplicates method.
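
The original post does not show the code that ran before the failing line, so the cause is unknown. One common way for df to end up as None, offered here only as an assumption, is assigning the result of a call made with inplace=True. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2]})

# With inplace=True the method modifies df and returns None,
# so assigning the return value would replace the DataFrame with None.
result = df.drop_duplicates(inplace=True)
print(result is None)

# Calling a DataFrame method on None then raises the AttributeError.
try:
    result.drop_duplicates()
except AttributeError as exc:
    message = str(exc)
    print(message)

# Safer pattern: skip inplace and keep the returned DataFrame.
df = pd.DataFrame({"a": [1, 1, 2]}).drop_duplicates()
print(len(df))
```

Printing type(df) just before the failing call is a quick way to find which earlier statement replaced the DataFrame.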

Discussions


python - pandas: drop duplicates in groupby 'date' - Stack Overflow

Your second attribute error is simply caused by executing ('date').drop_duplicates('cid'); it has nothing to do with pandas. The error message is telling you that 'date', a str object, doesn't have an attribute called drop_duplicates.


GitHub

github.com › modin-project › modin › issues › 1115

DataFrame.drop_duplicates raises an exception · Issue #1115 · modin-project/modin

February 27, 2020 - This may take some time.

    name  max_speed  health
1    one          1      10
2    two          4      20
3  three          7      30
Traceback (most recent call last):
  File "drop_duplicates_test.py", line 13, in <module>
    df1 = df.drop_duplicates("name")
  File "/nfs/site/proj/scripting_tools/gashiman/modin/modin/pandas/dataframe.py", line 208, in drop_duplicates
    subset=subset, keep=keep, inplace=inplace
  File "/nfs/site/proj/scripting_tools/gashiman/modin/modin/pandas/base.py", line 1120, in drop_duplicates
    duplicates = self.duplicated(keep=keep, subset=kwargs.get("subset"))
  File "/nfs/site/proj/scripting_tools/gashiman/modin/modin/pandas/datafra

Author gshimansky

Stack Overflow

stackoverflow.com › questions › 29244549 › pandas-dataframe-object-has-no-attribute-unique

python - Pandas 'DataFrame' object has no attribute 'unique' - Stack Overflow

Top answer

1 of 5

DataFrames do not have that method; columns in DataFrames do:

df['A'].unique()

Or, to get the names with the number of observations (using the DataFrame given by closedloop):

>>> df.groupby('person').person.count()
Out[80]: 
person
0         2
1         3
Name: person, dtype: int64
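
As a small sketch of the distinction, on made-up data: unique() is called on a single column, while drop_duplicates() is the DataFrame-level way to get distinct rows.

```python
import pandas as pd

# Toy data (made up for illustration).
df = pd.DataFrame({"person": [0, 0, 1, 1, 1], "A": [5, 5, 6, 7, 7]})

values = df["A"].unique()                      # distinct values of one column
counts = df.groupby("person").person.count()   # observations per person
rows = df.drop_duplicates()                    # distinct rows of the whole frame

print(values.tolist())
print(counts.tolist())
print(len(rows))
```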

2 of 5

Rather than removing duplicates during the pivot table process, use the df.drop_duplicates() function to selectively drop duplicates.

For example if you are pivoting using these index='c0' and columns='c1' then this simple step yields the correct counts.

In this example the 5th row is a duplicate of the 4th (ignoring the non-pivoted c_other column):

import pandas as pd
data = {'c0':[0,1,0,1,1], 'c1':[0,0,1,1,1], 'person':[0,0,1,1,1], 'c_other':[1,2,3,4,5]}
df = pd.DataFrame(data)
df2 = df.drop_duplicates(subset=['c0','c1','person'])
pd.pivot_table(df2, index='c0',columns='c1',values='person', aggfunc='count')

This correctly outputs

GitHub

github.com › dask › dask › issues › 4307

SeriesGroupBy Object has not Attribute Diff · Issue #4307 · dask/dask

December 17, 2018 - I have a multi-index DaskDataframe and am unable to compute a simple diff after a groupby operation on the dataframe.

df.groupby('IndexName')['ColName'].diff()

..'SeriesGroupBy' object has no attribute 'diff'

The Dask Series object has a ...

Author bgoodman44

GitHub

github.com › rapidsai › cudf › issues › 2233

[FEA] drop_duplicates for Series · Issue #2233 · rapidsai/cudf

July 12, 2019 - Is your feature request related to a problem? Please describe. We seem to have DataFrame.drop_duplicates but not Series.drop_duplicates. It would be nice to have drop_duplicates for series too because dask.dataframe.categorize() fails du...

Author galipremsagar

Pandas

pandas.pydata.org › docs › reference › api › pandas.Series.groupby.html

pandas.Series.groupby — pandas 3.0.2 documentation

Series.groupby(by=None, level=None, *, as_index=True, sort=True, group_keys=True, observed=True, dropna=True)[source]# Group Series using a mapper or by a Series of columns. A groupby operation involves some combination of splitting the object, applying a function, and combining the results.

GitHub

github.com › pandas-dev › pandas › issues › 11640

BUG AttributeError: 'DataFrameGroupBy' object has no attribute '_obj_with_exclusions' · Issue #11640 · pandas-dev/pandas

November 18, 2015 -

In [5]: df.groupby('a').mean()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-a830c6135818> in <module>()
----> 1 df.groupby('a').mean()

/home/nicolas/Git/pandas/pandas/core/groupby.py in mean(self)
    764             self._set_selection_from_grouper()
    765             f = lambda x: x.mean(axis=self.axis)
--> 766             return self._python_agg_general(f)
    767
    768     def median(self):

/home/nicolas/Git/pandas/pandas/core/groupby.py in _python_agg_general(self, func, *args, **kwargs)
   1245                 output[name] = self._try_cast(values[mask], result)

Author nbonnotte

Stack Overflow

stackoverflow.com › questions › 52642351 › why-pandas-gives-attributeerror-seriesgroupby-object-has-no-attribute-pct

python - Why Pandas gives AttributeError: 'SeriesGroupBy' object has no attribute 'pct'? - Stack Overflow

Top answer

1 of 1

You passed the string 'pct', but you need the variable pct (the lambda function). Remove the quotes:

aggs = {'B':pct}
print(df.groupby('A').agg(aggs))

          B
A          
1  0.333333
4  0.333333
7  0.333333
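
A runnable sketch of the fix follows. The excerpt does not show how pct was defined, so the lambda below is a hypothetical stand-in, and the data is made up for illustration.

```python
import pandas as pd

# Toy data (made up); pct is a hypothetical helper standing in for the
# lambda from the question, whose definition is not shown in the excerpt.
df = pd.DataFrame({"A": [1, 4, 7], "B": [10, 20, 30]})
pct = lambda s: len(s) / len(df)   # share of all rows that fall in each group

# Pass the callable itself. A string such as 'pct' is treated as the name
# of a groupby method, and SeriesGroupBy has no method called pct.
aggs = {"B": pct}
result = df.groupby("A").agg(aggs)
print(result)
```

Strings work in agg only for names pandas already knows, such as 'mean' or 'size'; custom functions have to be passed as objects.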

Databricks Community

community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132

AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132

February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...

Pandas

pandas.pydata.org › pandas-docs › stable › reference › api › pandas.Series.drop_duplicates.html

pandas.Series.drop_duplicates — pandas 2.3.3 documentation

False : Drop all duplicates. ... If True, performs operation inplace and returns None.
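
A short sketch of the three keep options on a plain Series, using made-up values:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "b"])

first = s.drop_duplicates(keep="first")   # keep the first occurrence of each value
last = s.drop_duplicates(keep="last")     # keep the last occurrence of each value
none = s.drop_duplicates(keep=False)      # drop every value that repeats at all

print(first.index.tolist())
print(last.index.tolist())
print(none.tolist())
```

Note that this method lives on Series, not on SeriesGroupBy, which is why calling it directly on a grouped column raises the AttributeError discussed on this page.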

Stack Overflow

stackoverflow.com › questions › 38235922 › pandas-dataframe-to-dict-fails-after-drop-duplicates › 38235935

python - Pandas DataFrame to_dict fails after drop_duplicates - Stack Overflow

Top answer

1 of 1

I think you are missing the (). Without the parentheses, drop_duplicates just refers to the method, so df becomes a reference to that method rather than the result of calling it (thanks andychase for the comment):

df = df.drop_duplicates()
return df.to_dict(orient="records")

Or:

df.drop_duplicates(inplace=True)
return df.to_dict(orient="records")
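
The difference between referencing the method and calling it can be seen in a small sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2]})

# Without parentheses this is a bound method, not a DataFrame.
not_called = df.drop_duplicates
print(callable(not_called))
print(hasattr(not_called, "to_dict"))

# With parentheses the method runs and returns the de-duplicated frame.
records = df.drop_duplicates().to_dict(orient="records")
print(records)
```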

Pandas

pandas.pydata.org › pandas-docs › version › 0.17.1 › generated › pandas.DataFrame.drop_duplicates.html

pandas.DataFrame.drop_duplicates — pandas 0.17.1 documentation

Return DataFrame with duplicate rows removed, optionally only considering certain columns

GitHub

github.com › dask › dask › issues › 2952

can't drop duplicated on dask dataframe index · Issue #2952 · dask/dask

December 3, 2017 - I tried using the following code, as suggested by jezrael on Stack Overflow:

rxTable[~rxTable.index.to_Series().duplicated()]

and got:

AttributeError: 'Index' object has no attribute 'to_Series'

It worked a few days ago and just stopped; I can't find any difference in the code or data.

Author thebeancounter
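
For comparison, here is how the same idea looks in plain pandas, where the method name is lowercase to_series. This is only a sketch on made-up data; whether the dask version in the issue behaves the same way is not established here.

```python
import pandas as pd

# Toy frame (made up) with a repeated index label.
df = pd.DataFrame({"v": [1, 2, 3]}, index=["x", "y", "x"])

# Method names are case-sensitive: to_series exists, to_Series does not.
print(hasattr(df.index, "to_series"), hasattr(df.index, "to_Series"))

# Build a positional boolean mask that is True for first occurrences.
mask = ~df.index.to_series().duplicated().to_numpy()
deduped = df[mask]

# Index.duplicated() gives the same mask without the intermediate Series.
deduped2 = df[~df.index.duplicated()]
print(deduped)
```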

GitHub

github.com › pandas-dev › pandas › issues › 8623

pd.Catagorical breaks transform, drop_duplicates, iloc · Issue #8623 · pandas-dev/pandas

October 24, 2014 -

x=pd.DataFrame([[1,'John P. Doe'],[2,'Jane Dove'],[1,'John P. Doe']], columns=['person_id','person_name'])
x['person_name']=pd.Categorical(x.person_name) # doing this breaks transform
g=x.groupby(['person_id'])
g.transform(lambda x:x)
AttributeError: 'ObjectBlock' object has no attribute '_holder'

using drop_duplicates inside apply (I often need this):

g.apply(lambda x: x.drop_duplicates('person_name'))
SystemError: numpy/core/src/multiarray/iterators.c:370: bad argument to internal function

Author kay1793

Stack Overflow

stackoverflow.com › questions › 46534653 › error-attributeerror-dataframegroupby-object-has-no-attribute-while-groupby

python - Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while groupby functionality on dataframe - Stack Overflow

The problem is that it is not identifying the NEWS_SENTIMENT_DAILY_AVG column. Error message: AttributeError: 'DataFrameGroupBy' object has no attribute 'NEWS_SENTIMENT_DAILY_AVG'

GeeksforGeeks

geeksforgeeks.org › python-pandas-series-drop_duplicates

Python | Pandas Series.drop_duplicates() | GeeksforGeeks

February 13, 2019 - The labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Pandas Series.dropna() function return a ne ... Pandas Index.drop_duplicates() function return Index with duplicate values removed in Python.

reddit.com › r/dataanalysis › data analysis in python

r/dataanalysis on Reddit: Data Analysis in Python

December 19, 2022 -

Hello everyone!
I am a newbie at python and I looked up some problems associated with the Data Expo 2009: Airline on time data from the Harvard Dataverse (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HG7NV7).
I am currently working on the following question:

When is the best time of day, day of the week, and time of year to fly to minimize delays?

All libraries are imported and the data is cleared up (empty columns and duplicate rows are dropped).
What I was intending to do is to plot a bar chart with "Months" on the x-axis and "ArrDelay" (arrival delays) on the y-axis.

My code looks the following way (I'm using jupyter notebook):

import pandas as pd 
dataair = pd.read_csv("/Users/issakovakamilla/Desktop/2000.csv.bz2")
dataair.dropna(how='all', axis=1, inplace=True)
dataair
import matplotlib.pyplot as plt
df = pd.DataFrame(dataair)
X = list(df.iloc[:, 0])
Y = list(df.iloc[:, 1])
plt.bar(X, Y, color='g')
plt.title("stats")
plt.xlabel("Month")
plt.ylabel("ArrDelay")
plt.show()

Somehow I don't get a plot; it's been executing for 10 minutes now (the cell shows * next to the input). Could anyone help me with this?

Top answer

1 of 1

I fixed your code:

# %%
import pandas as pd
import matplotlib.pyplot as plt

# %%
df = pd.read_csv("../data/2000.csv.bz2").dropna(how="all", axis=1)
print(df.shape)
df.head()

# %%
df = df.groupby("Month").agg({"ArrDelay": "mean"}).reset_index()
print(df.shape)
df.head()

# %%
plt.bar(df["Month"], df["ArrDelay"], color="g")
plt.title("stats")
plt.xlabel("Month")
plt.ylabel("ArrDelay")
plt.show()

# %%

You want 1 value per month, not 50k. Otherwise matplotlib is working on plotting 50k values per x-axis tick which ... will take some time. Also don't use inplace.
