In pandas, the compare() function provides a way to compare two DataFrame objects and generate a DataFrame highlighting the differences between them. This is particularly useful when you have two datasets and want to identify discrepancies or changes between them.
other : This is the first parameter. It takes the DataFrame object to be compared with the calling DataFrame. Both DataFrames must have identical row and column labels.
align_axis : The axis (vertical / horizontal) along which the differences are laid out. The default is 1. 0 or index : the differences are stacked vertically, with rows drawn alternately from self and other. 1 or columns : the differences are aligned horizontally, with columns drawn alternately from self and other.
keep_shape : Whether to display all rows and columns in the output or only the ones that contain differing values. It is of bool type and the default is False, i.e. only the rows and columns with differences are displayed by default.
keep_equal : When set to True, equal values are displayed in the output. When False (the default), equal values are displayed as NaN.
Returns another DataFrame with the differences between the two DataFrames.
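To make the effect of keep_shape and keep_equal concrete, here is a minimal sketch. The frames `left` and `right` and their columns `x` and `y` are hypothetical and used only for illustration; they differ in a single cell.

```python
import pandas as pd

# Two small hypothetical frames that differ in one cell (row 1, column "y").
left = pd.DataFrame({"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]})
right = left.copy()
right.loc[1, "y"] = 99.0

# Default: only the rows and columns that contain a difference are returned.
default_diff = left.compare(right)

# keep_shape=True keeps every row and column; equal cells are shown as NaN.
full_shape = left.compare(right, keep_shape=True)

# Adding keep_equal=True shows the equal values instead of NaN.
full_values = left.compare(right, keep_shape=True, keep_equal=True)

print(default_diff)
print(full_shape)
print(full_values)
```

With the defaults the result has one row and the two sub-columns of `y`, while the keep_shape variants keep all three rows and both original columns.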
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0
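The code that builds df is not shown. As an assumption based only on the printed values, a frame with the same contents could be constructed like this:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame printed above; col2 has a missing value in row 3.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
print(df)
```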
Let's create a copy of the df DataFrame and make some changes in the new DataFrame, df2. Then we will use compare() to see the result.
# Using compare() without making any changes to df or df2
df2 = df.copy()
df.compare(df2)
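The output of this call is not shown. Since the two frames are identical, compare() should return an empty DataFrame. The sketch below checks this, assuming df was built as in the printed table; note that NaN values in the same position on both sides are not reported as a difference.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
df2 = df.copy()

# No cell differs, so nothing is left to report.
result = df.compare(df2)
print(result.empty)
```

Checking `.empty` on the result is a quick way to test whether two identically-labeled frames hold the same data.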
# Making some changes in df2
df2.loc[0, "col1"] = "c"
df2.loc[2, "col3"] = 4.0
print(df2)
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0
# Let's see how compare() works.
df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0
# Let's make changes in all three columns in a new copy, df3, and see the result of compare()
df3 = df.copy()
df3.loc[0, "col1"] = "c"
df3.loc[1, "col2"] = 100
df3.loc[2, "col3"] = 4.0
df3
  col1   col2  col3
0    c    1.0   1.0
1    a  100.0   2.0
2    b    3.0   4.0
3    b    NaN   4.0
4    a    5.0   5.0
Now apply compare() to the two DataFrames df and df3. Because every column contains a difference, the output has an entry for each of col1, col2 and col3. Each of these has two sub-columns in the output, called self and other. When we run df.compare(df3), the values of df go under self and the values of df3 go under other.
df.compare(df3)
  col1       col2        col3
  self other self  other self other
0    a     c  NaN    NaN  NaN   NaN
1  NaN   NaN  2.0  100.0  NaN   NaN
2  NaN   NaN  NaN    NaN  3.0   4.0
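The result above has two-level column labels (the original column name, then self or other), so it can be sliced like any other DataFrame with MultiIndex columns. The following is a sketch of two ways to pull pieces out of it, assuming df was built as in the printed table; the variable names `diff`, `col2_changes` and `self_side` are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame and the modified copy df3.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
df3 = df.copy()
df3.loc[0, "col1"] = "c"
df3.loc[1, "col2"] = 100
df3.loc[2, "col3"] = 4.0

diff = df.compare(df3)

# Differences for a single column: select the top level, then drop the rows
# where that column did not change.
col2_changes = diff["col2"].dropna(how="all")
print(col2_changes)

# Only the values from df (the "self" side), one column per original column.
self_side = diff.xs("self", axis=1, level=1)
print(self_side)
```

Selecting by the top level is convenient when only one column is of interest, while `xs` on the second level gives a side-by-side view of one frame's values across all changed columns.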
## Comparing selected columns instead of the whole DataFrame