Use reset_index():
testdf.reset_index().to_csv('CCCC_output_summary.txt', sep='\t', header=True, index=False)
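As a minimal sketch of what reset_index() changes here, assuming testdf is the result of a groupby aggregation (the sample data below is made up, and an in-memory buffer stands in for the output file):

```python
import io
import pandas as pd

# Hypothetical stand-in for testdf: a grouped aggregate whose key lives in the index.
testdf = pd.DataFrame(
    {'key': ['a', 'a', 'b'], 'value': [1, 2, 3]}
).groupby('key').sum()

buf = io.StringIO()  # in-memory target instead of a file on disk
# reset_index() turns the 'key' index back into a regular column,
# so it is still written even though index=False.
testdf.reset_index().to_csv(buf, sep='\t', header=True, index=False)
tsv_text = buf.getvalue()
print(tsv_text)
```

Without reset_index(), index=False would drop the group key from the output entirely.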
Answer from Mike Müller on Stack Overflow
Recently, I had to work with an Excel file that has 2 columns, with headers 'Dog Breed' and 'Dog Name'. I came up with the following code (tested with Python 3.11.0) that uses groupby() and prints the grouped data into a .csv file.
from pathlib import Path
import pandas as pd
p = Path(__file__).with_name('data.xlsx')
q = Path(__file__).with_name('data-grouped.csv')
df = pd.read_excel(p)
groups = df.groupby('Dog Breed', sort=False)
with q.open('w') as foutput:
    for g in groups:  # For each group
        foutput.write(f"{g[0]}, {len(g[1])}")  # Record the number of dogs in each group
        for e, (index, row) in enumerate(g[1].iterrows()):  # Iterating over the group's dataframe
            name = str(row['Dog Name'])
            if e == 0:
                mystr = f",{name}\n"
            else:
                mystr = f",,{name}\n"
            foutput.write(mystr)
data.xlsx:

data-grouped.csv:
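As a rough illustration of the layout the loop above produces, here is a sketch with made-up data (the breeds and names are assumptions, not the contents of the author's files), writing to an in-memory buffer instead of data-grouped.csv:

```python
import io
import pandas as pd

# Hypothetical sample data; not the contents of the author's data.xlsx.
df = pd.DataFrame({
    'Dog Breed': ['Poodle', 'Beagle', 'Poodle'],
    'Dog Name': ['Rex', 'Bella', 'Max'],
})

buf = io.StringIO()
for breed, group in df.groupby('Dog Breed', sort=False):
    # Breed and group size start the first row of each group.
    buf.write(f"{breed}, {len(group)}")
    for e, name in enumerate(group['Dog Name']):
        # First name continues the breed row; later names get two empty leading cells.
        buf.write(f",{name}\n" if e == 0 else f",,{name}\n")

grouped_csv = buf.getvalue()
print(grouped_csv)
```

Each breed appears once with its count, and the remaining names in the group sit on their own rows in the third column.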

Try doing this:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')
That writes the entire summed dataframe to the file. If you only want the week and count columns, select them first:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
Here's a line by line explanation of the original code:
# This creates a "groupby" object (not a dataframe object)
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')
# This instructs pandas to sum up all the numeric type columns in each
# group. This returns a dataframe where each row is the sum of the
# group's numeric columns. You're not storing this dataframe in your
# example.
week_grouped.sum()
# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method.
# So we should store the previous line's result (a dataframe) into a variable
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')
# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')
# Or with less typing simply
week_grouped.sum().to_csv('...')
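The difference between the groupby object and the aggregated dataframe can be checked directly. This is a small sketch with made-up data; the 'week' and 'count' column names follow the answer above, the values are assumptions:

```python
import pandas as pd

# Hypothetical data with the 'week' and 'count' columns mentioned above.
df = pd.DataFrame({'week': [1, 1, 2], 'count': [10, 5, 7]})

week_grouped = df.groupby('week')
print(type(week_grouped).__name__)      # the groupby object's class name
print(hasattr(week_grouped, 'to_csv'))  # whether the groupby object exposes to_csv

summed_weeks = week_grouped.sum()       # a DataFrame; 'week' is now the index
flat = summed_weeks.reset_index()       # 'week' becomes a regular column again
# With no path argument, to_csv returns the CSV text as a string.
csv_text = flat[['week', 'count']].to_csv(index=False)
print(csv_text)
```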
groupby returns key, value pairs, where the key is the identifier of the group and the value is the group itself, i.e. the subset of the original df that matched the key.
In your example, week_grouped = df.groupby('week') is a set of groups (a pandas.core.groupby.DataFrameGroupBy object), which you can explore in detail as follows:
for k, gr in week_grouped:
    # do your stuff instead of print
    print(k)
    print(type(gr))  # This will output <class 'pandas.core.frame.DataFrame'>
    print(gr)
    # You can save each 'gr' in a csv as follows
    gr.to_csv('{}.csv'.format(k))
Alternatively, you can compute an aggregation function on your grouped object:
result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv')
In your example you need to assign the result of the function to a variable, because sum() returns a new dataframe and leaves the groupby object unchanged.
some_variable = week_grouped.sum()
some_variable.to_csv('week_grouped.csv') # This will work
Basically, result.csv and week_grouped.csv are meant to be the same.
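See the docs on apply. Pandas will call the function twice on the first group (to decide between a fast and a slow code path), so the side effects of the function (IO) will happen twice for the first group. This describes older pandas releases; since version 0.25, apply evaluates the first group only once.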
Your best bet here is probably to iterate over the groups directly, like this:
for group_name, group_df in df.head(1000).groupby('iid'):
    item_grouper(group_df)
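A small sketch of why direct iteration is safe for functions with side effects: each group is visited exactly once. The data and the body of item_grouper are assumptions; here the function only records which group it saw.

```python
import pandas as pd

# Hypothetical data; 'iid' matches the grouping column used above.
df = pd.DataFrame({'iid': [1, 1, 2, 3], 'tag': ['a', 'b', 'a', 'c']})

calls = []

def item_grouper(group_df):
    # Stand-in for a function with side effects (e.g. writing to a file).
    calls.append(int(group_df['iid'].iloc[0]))

for group_name, group_df in df.groupby('iid'):
    item_grouper(group_df)

print(calls)  # one entry per group, in sorted key order
```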
I agree with chrisb's diagnosis of the problem. As a cleaner approach, consider having your user_grouper() function return its values instead of saving them, with a structure like this:
def user_grouper(df, ...):
    (...)
    df['max_tag_count'] = some_calculation
    return df

results = df.groupby(...).apply(user_grouper, ...)
for i, row in results.iterrows():
    # calculate raw score
    raw_score = (tag_counts[row['tag']] - 1) / row['max_tag_count']
    # write to file
    out.write('\t'.join(map(str, [row['uid'], row['iid'], row['tag'], raw_score, weight])) + '\n')
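The same "compute first, write afterwards" idea can be sketched in a runnable form. Note that this swaps groupby(...).apply for groupby(...).transform, which is not what the answer above uses; the 'tag_count' column, the sample values, and the in-memory output buffer are all assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical data; column names follow the snippet above, values are made up.
df = pd.DataFrame({
    'uid': [1, 1, 2],
    'iid': [10, 11, 10],
    'tag': ['x', 'y', 'x'],
    'tag_count': [3, 5, 2],  # assumed per-row count, standing in for tag_counts[...]
})

# Per-user maximum, broadcast back to every row of that user. No IO happens here.
df['max_tag_count'] = df.groupby('uid')['tag_count'].transform('max')

out = io.StringIO()  # in-memory stand-in for the output file
for _, row in df.iterrows():
    raw_score = (row['tag_count'] - 1) / row['max_tag_count']
    out.write('\t'.join(map(str, [row['uid'], row['iid'], row['tag'], raw_score])) + '\n')

lines = out.getvalue().splitlines()
print(lines)
```

Because the grouped step only returns data, all writing happens in one ordinary loop, so each row is written exactly once.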
You were almost there:
for name, group in grouped:
    # path_to_disk should include name, otherwise each group overwrites the previous file
    group.to_csv(path_to_disk)
This answer was very helpful to me - thanks @mkln.
I just wanted to add something specific to my own use case, which relates to the original point about file-naming ('Some Text' + name = group).
You can add the group name plus additional text, for example the current date, to each CSV filename. Here I create a function that returns the current date and use it in the filename.
Therefore:
from datetime import datetime

def cur_date():
    return datetime.now().strftime("%Y-%m-%d")

for name, group in grouped:
    group.to_csv('{}_{}.csv'.format(name, cur_date()))
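For a self-contained version of this pattern, here is a sketch that writes one dated CSV per group into a temporary directory. The data, the 'team' grouping column, and the use of a temp directory are assumptions made so the example can run anywhere.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

# Hypothetical data and grouping column.
df = pd.DataFrame({'team': ['red', 'blue', 'red'], 'score': [1, 2, 3]})
grouped = df.groupby('team')

out_dir = Path(tempfile.mkdtemp())  # throwaway directory for the demo
stamp = datetime.now().strftime("%Y-%m-%d")  # computed once, so all files share a date

written = []
for name, group in grouped:
    path = out_dir / f"{name}_{stamp}.csv"
    group.to_csv(path, index=False)
    written.append(path.name)

print(written)
```

Computing the date once before the loop avoids mismatched filenames if the run happens to cross midnight.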
The problem is that you are trying to apply a function, to_csv, that doesn't exist. A groupby object doesn't have a to_csv method either; pd.Series and pd.DataFrame do.
What you should really use is drop_duplicates here and then export the resulting dataframe to csv:
df.drop_duplicates(['AA1','AA2']).to_csv('merged.txt')
PS: If you really wanted a groupby solution, there's this one, which happens to be 12 times slower than drop_duplicates:
df.groupby(['AA1','AA2']).agg(lambda x:x.value_counts().index[0]).to_csv('merged.txt')
You can use groupby with head:
df.groupby(['AA1', 'AA2']).head(1)
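To compare the two "keep the first row per key" approaches, here is a small sketch on made-up data ('AA1' and 'AA2' come from the answers above; the 'val' column and all values are assumptions):

```python
import pandas as pd

# Hypothetical data; 'AA1' and 'AA2' are the key columns from the answers above.
df = pd.DataFrame({
    'AA1': ['a', 'a', 'b', 'b'],
    'AA2': ['x', 'x', 'y', 'z'],
    'val': [1, 2, 3, 4],
})

first_by_dedup = df.drop_duplicates(['AA1', 'AA2'])
first_by_head = df.groupby(['AA1', 'AA2']).head(1)

# Both keep the first row of each (AA1, AA2) pair, with the original index.
print(first_by_dedup.equals(first_by_head))

# Either result is a DataFrame, so to_csv is available.
csv_text = first_by_dedup.to_csv(index=False)
print(csv_text)
```

The value_counts variant shown earlier behaves differently: it keeps the most frequent value per group rather than the first row.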
