df = df.drop_duplicates(subset=None, keep='first', inplace=False)
raises:
AttributeError: 'NoneType' object has no attribute 'drop_duplicates'
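That traceback means df is already None before drop_duplicates is called. The usual cause (an assumption here, since the preceding code isn't shown) is assigning the None return value of an earlier inplace=True call back to df. A minimal sketch that reproduces the error:
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 2]})
# inplace=True mutates the frame and returns None, so this rebinds df to None
df = df.drop_duplicates(inplace=True)
# the next call then fails with the AttributeError above
df = df.drop_duplicates(subset=None, keep='first', inplace=False)
The fix is to either keep the assignment and drop inplace=True, or keep inplace=True and drop the assignment.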
You don't need groupby to drop duplicates based on a few columns; you can specify a subset instead:
df2 = df.drop_duplicates(["date", "cid"])
df2.groupby('date').cid.size()
Out[99]:
date
2005 3
2006 10
2007 227
2008 52
2009 142
2010 57
2011 219
2012 99
2013 238
2014 146
dtype: int64
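The OP's frame isn't shown, so as a stand-in for running the snippets in this answer, here is a minimal hypothetical df with the same columns (its counts will not match the outputs printed here):
import pandas as pd
df = pd.DataFrame({
    'date':  [2005, 2005, 2005, 2006],
    'cid':   [1, 1, 2, 1],
    'state': ['NV', 'NV', 'CA', 'NV'],
})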
1. groupby.head(1)
The relevant groupby method for dropping duplicates in each group is groupby.head(1). Passing 1 is important: head defaults to the first 5 rows per group, whereas n=1 keeps only the first row of each date-cid pair.
df1 = df.groupby(['date', 'cid']).head(1)
2. duplicated() is more flexible
Another method is to use duplicated() to create a boolean mask and filter.
df3 = df[~df.duplicated(['date', 'cid'])]
An advantage of this method over drop_duplicates() is that it can be combined with other boolean masks to filter the dataframe more flexibly. For example, to select the unique cids in Nevada for each date, use:
df_nv = df[df['state'].eq('NV') & ~df.duplicated(['date', 'cid'])]
3. groupby.sample(1)
Another method to select a unique row from each group is groupby.sample(). Unlike the previous methods, it selects a row from each group at random (whereas the others keep only the first row of each group).
df4 = df.groupby(['date', 'cid']).sample(n=1)
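If you need the random choice to be repeatable, groupby.sample accepts a random_state seed:
df4 = df.groupby(['date', 'cid']).sample(n=1, random_state=0)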
You can verify that df1, df2 (ayhan's output) and df3 all produce the same output, and that df4 produces an output where the size and nunique of cid match for each date (as required in the OP). In short, the following returns True:
w, x, y, z = [d.groupby('date')['cid'].agg(['size', 'nunique']) for d in (df1, df2, df3, df4)]
w.equals(x) and w.equals(y) and w.equals(z) # True
and w, x, y, z all look like the following:
size nunique
date
2005 7 3
2006 237 10
2007 3610 227
2008 1318 52
2009 2664 142
2010 997 57
2011 6390 219
2012 2904 99
2013 7875 238
2014 3979 146
@QuangHoang provided the simplest version in the comments:
df.drop_duplicates(['ticker', 'year'])
Alternatively, you can use .groupby twice, inside two .applys:
df.groupby("ticker", group_keys=False).apply(lambda x:
x.groupby("year", group_keys=False).apply(lambda x: x.drop_duplicates(['year']))
)
Alternatively, you can use the .duplicated method:
df.groupby('ticker', group_keys=False).apply(
    lambda x: x[~x['year'].duplicated(keep='first')]
)
You can sort the values first and then use groupby.tail:
df.sort_values('return').groupby(['ticker','year']).tail(1)
ticker year return
0 aapl 1999 1
1 aapl 2000 3
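The question's frame isn't reproduced in this answer; one hypothetical df consistent with the output above is:
import pandas as pd
df = pd.DataFrame({
    'ticker': ['aapl', 'aapl', 'aapl'],
    'year':   [1999, 2000, 2000],
    'return': [1, 3, 2],
})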
DataFrames do not have that method; columns in DataFrames do:
df['A'].unique()
Or, to get the names with the number of observations (using the DataFrame given by closedloop):
df.groupby('person').person.count()
Out[80]:
person
0 2
1 3
Name: person, dtype: int64
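Equivalently, Series.value_counts gives the same counts in a single call (sorted by count rather than by label):
df['person'].value_counts()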
Rather than removing duplicates during the pivot-table step, use df.drop_duplicates() beforehand to selectively drop duplicates.
For example, if you are pivoting with index='c0' and columns='c1', this one extra step yields the correct counts.
In this example the 5th row is a duplicate of the 4th (ignoring the non-pivoted c_other column):
import pandas as pd
data = {'c0': [0, 1, 0, 1, 1], 'c1': [0, 0, 1, 1, 1], 'person': [0, 0, 1, 1, 1], 'c_other': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
df2 = df.drop_duplicates(subset=['c0', 'c1', 'person'])
pd.pivot_table(df2, index='c0', columns='c1', values='person', aggfunc='count')
This correctly outputs
c1 0 1
c0
0 1 1
1 1 1
You should use the pandas method drop_duplicates.
The following should solve your problem.
Your code:
import pandas as pd
id = [2000, 2001, 2001, 3000, 2000, 3000, 3300, 3300, 3300, 3300]
jtitle = ['job1', 'job2', 'job1', 'job3', 'job3', 'job2', 'job5', 'job5', 'job5', 'job6']
date = ['01/01/2021', '17/02/2018', '17/02/2021', '01/01/2021', '25/03/2011', '11/11/2000', '22/01/2022', '15/12/2021', '11/11/2021', '10/09/2021']
data = pd.DataFrame(data=zip(id, jtitle, date), columns=['id', 'jtitle', 'date'])
# convert to datetime object
data.date = pd.to_datetime(data.date, dayfirst=True)
Solution:
# subset employees by id, sort by date, and drop duplicates
latest = data.sort_values('date', ascending=False).drop_duplicates(subset=['id'], keep='first').copy()
prev_date = data.sort_values('date', ascending=False).drop_duplicates(subset=['id'], keep='last').copy()
# calculate the difference in days; subtracting .values is positional,
# so it assumes both frames list the ids in the same row order
latest['days'] = latest['date'].values - prev_date['date'].values
print(latest)
Output:
id jtitle date days
3300 job5 2022-01-22 134 days
2001 job1 2021-02-17 1096 days
2000 job1 2021-01-01 3570 days
3000 job3 2021-01-01 7356 days
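The positional subtraction works here because both frames happen to list the ids in the same order. If you want the alignment to be explicit, a groupby sketch over the same data computes the per-id span directly:
# earliest and latest date per id, aligned by construction
span = data.groupby('id')['date'].agg(earliest='min', latest='max').reset_index()
span['days'] = span['latest'] - span['earliest']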
Alternative solution with diff and sum.
data['days'] = data.sort_values('date').groupby('id').date.diff()
data = data.groupby(['id', 'jtitle']).agg({'days': 'sum', 'date': 'first'}).reset_index()
# keep only rows with more than 0 days
data[data.days.dt.days > 0]
Result
id jtitle days date
0 2000 job1 3570 days 2021-01-01
1 2001 job1 1096 days 2021-02-17
2 3000 job3 7356 days 2021-01-01
3 3300 job5 134 days 2022-01-22
Try doing this:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')
That'll write the entire dataframe to the file. If you only want those two columns, then:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
Here's a line by line explanation of the original code:
# This creates a "groupby" object (not a dataframe object)
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')
# This instructs pandas to sum up all the numeric type columns in each
# group. This returns a dataframe where each row is the sum of the
# group's numeric columns. You're not storing this dataframe in your
# example.
week_grouped.sum()
# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method.
# So we should store the previous line's result (a dataframe) into a variable
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')
# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')
# Or with less typing simply
week_grouped.sum().to_csv('...')
Iterating over a groupby result yields key, value pairs, where the key is the identifier of the group and the value is the group itself, i.e. the subset of the original df that matched the key.
In your example, week_grouped = df.groupby('week') is a set of groups (a pandas.core.groupby.DataFrameGroupBy object), which you can explore in detail as follows:
for k, gr in week_grouped:
    # do your stuff instead of print
    print(k)
    print(type(gr))  # This will output <class 'pandas.core.frame.DataFrame'>
    print(gr)
    # You can save each 'gr' in a csv as follows
    gr.to_csv('{}.csv'.format(k))
Alternatively, you can compute an aggregation function on the grouped object:
result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv')
In your example you need to assign the result to a variable: the aggregation returns a new DataFrame, and the groupby object itself has no to_csv method.
some_variable = week_grouped.sum()
some_variable.to_csv('week_grouped.csv') # This will work
Note that result.csv and week_grouped.csv end up with the same content.