You also need to be careful to create a copy of the DataFrame; otherwise csvdata_old is just another name for the same object and will change whenever csvdata does:
csvdata_old = csvdata.copy()
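To see why the copy matters, here is a minimal sketch. The sample values and the names alias and snapshot are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV contents
csvdata = pd.DataFrame({"a": [1, 2, 3]})

alias = csvdata            # plain assignment: a second name for the same object
snapshot = csvdata.copy()  # independent copy of the data

csvdata.loc[0, "a"] = 99   # modify the original in place

print(alias.loc[0, "a"])     # the alias sees the change
print(snapshot.loc[0, "a"])  # the copy still holds the old value
```

Only the copy can serve as a reliable "before" state to compare against later.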
To check whether they are equal, you can use assert_frame_equal as in this answer:
from pandas.testing import assert_frame_equal  # pandas.util.testing was deprecated in pandas 1.0
assert_frame_equal(csvdata, csvdata_old)
You can wrap this in a function with something like:
def frames_equal(csvdata, csvdata_old):  # function name is illustrative
    try:
        assert_frame_equal(csvdata, csvdata_old)
        return True
    except Exception:  # apparently AssertionError doesn't catch all
        return False
There was discussion of a better way...
Answer from Andy Hayden on Stack Overflow
Not sure if this is helpful or not, but I whipped together this quick Python function that returns just the rows that differ between two DataFrames with the same columns and shape.
def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe"""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)
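A quick usage sketch of the function above, with made-up sample frames (old and new are hypothetical names) in which only one row differs:

```python
import pandas as pd

def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe"""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)

# Hypothetical sample frames: only the second row differs
old = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
new = pd.DataFrame({"a": [1, 2, 3], "b": [4, 50, 6]})

diff = get_different_rows(old, new)
print(diff)  # a single row: a=2, b=50
```

Note that rows present only in the source frame are tagged left_only by the merge and so are not reported; an empty result means the new frame contains no rows that the source lacks.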
This again is a subtle one, well done for spotting it.
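import pandas as pd
from pandas.testing import assert_frame_equal

df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
r = pd.DataFrame({'a': ['x'], 'b': ['y']})
# Append the string row (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
df_1 = pd.concat([df_1, r], ignore_index=True)
# Remove it again: adding r a second time makes it a duplicate, and keep=False drops every copy
df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)
df_1.equals(df_2)  # False
assert_frame_equal(df_1, df_2)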
Now we can see the issue as the assert fails.
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
Attribute "dtype" are different
[left]: object
[right]: int64
Because you added strings to the integer columns, those columns were upcast to object dtype, and they stay that way after the string row is removed. That is why equals fails as well.
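One possible way to undo the upcast, offered as a sketch rather than as part of the original answer, is DataFrame.infer_objects(), which asks pandas to re-infer a tighter dtype for object columns. The variable name restored is made up for illustration:

```python
import pandas as pd

df_1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df_2 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
r = pd.DataFrame({"a": ["x"], "b": ["y"]})

# Add the string row, then drop it again
df_1 = pd.concat([df_1, r], ignore_index=True)
df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)

print(df_1.dtypes)               # both columns are object at this point
restored = df_1.infer_objects()  # re-infer dtypes for the object columns
print(restored.dtypes)           # integer dtype again
print(restored.equals(df_2))
```

If the expected dtypes are known up front, an explicit astype call is a stricter alternative.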
Use pandas.testing.assert_frame_equal(df_1, df_2, check_dtype=True), which also checks that the dtypes are the same (check_dtype=True is the default).
(In this case it picks up that your dtypes changed from int to 'object' (string) when you appended, then deleted, a string row; pandas did not automatically coerce the dtype back down to a less expansive dtype.)
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
Attribute "dtype" are different
[left]: object
[right]: int64
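To isolate the effect of check_dtype, here is a small sketch with two hypothetical frames (as_object and as_int are made-up names) that hold the same values under different dtypes:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frames with equal values but different dtypes
as_object = pd.DataFrame({"a": [1, 2, 3]}, dtype=object)
as_int = pd.DataFrame({"a": [1, 2, 3]})

# The values match, so this passes once the dtype check is switched off
assert_frame_equal(as_object, as_int, check_dtype=False)

# With the default check_dtype=True the dtype mismatch is reported
try:
    assert_frame_equal(as_object, as_int)
    dtypes_match = True
except AssertionError as err:
    dtypes_match = False
    print(err)
```

Switching the check off is only appropriate when a dtype difference genuinely does not matter for the comparison.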