You also need to be careful to create a copy of the DataFrame, otherwise the csvdata_old will be updated with csvdata (since it points to the same object):
csvdata_old = csvdata.copy()
To check whether they are equal, you can use assert_frame_equal as in this answer:
from pandas.util.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)
You can wrap this in a function with something like:
try:
assert_frame_equal(csvdata, csvdata_old)
return True
except: # appeantly AssertionError doesn't catch all
return False
There was discussion of a better way...
Answer from Andy Hayden on Stack OverflowYou also need to be careful to create a copy of the DataFrame, otherwise the csvdata_old will be updated with csvdata (since it points to the same object):
csvdata_old = csvdata.copy()
To check whether they are equal, you can use assert_frame_equal as in this answer:
from pandas.util.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)
You can wrap this in a function with something like:
try:
assert_frame_equal(csvdata, csvdata_old)
return True
except: # appeantly AssertionError doesn't catch all
return False
There was discussion of a better way...
Not sure if this is helpful or not, but I whipped together this quick python method for returning just the differences between two dataframes that both have the same columns and shape.
def get_different_rows(source_df, new_df):
"""Returns just the rows from the new dataframe that differ from the source dataframe"""
merged_df = source_df.merge(new_df, indicator=True, how='outer')
changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
return changed_rows_df.drop('_merge', axis=1)
Is this the most efficient way to compare 2 Pandas dataframes? Finding rows unique to one dataframe.
How to compare 2 pandas dataframes that have the same column/on the same column?
Pandas: Compare two dataframes that are different lengths
How to test if all values in pandas dataframe column are equal?
Videos
Still trying to wrap my head around Pandas (and continuing to be blown away by its capabilities every day...
Say I have 2 dataframes (lets call the left_test_df and right_test_df). And, I want to see which rows are only present in one of them. Is something like this the best way to go about doing it?
merged_df=pd.merge(left_test_df, right_test_df, how='right', indicator=True)
final_df=merged_df[merged_df['_merge']=='right_only']