col1 col2 col3
0 a 1.0 1.0
1 a 2.0 2.0
2 b 3.0 3.0
3 b NaN 4.0
4 a 5.0 5.0
In pandas, the compare() method compares two DataFrame objects and returns a DataFrame that shows only the differences between them. This is particularly useful when you have two versions of a dataset and want to identify discrepancies or changes between them.
##DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)
So, let’s understand each of its parameters:
other : The DataFrame to compare with the calling DataFrame. It must have the same row and column labels as the caller; otherwise compare() raises a ValueError.
align_axis : The axis along which the self and other values are laid out in the result. The default is 1. With 0 or 'index', the differences are stacked vertically, with rows drawn alternately from self and other. With 1 or 'columns', the differences are placed side by side, with columns drawn alternately from self and other.
keep_shape : A bool, False by default. When False, only the rows and columns that contain at least one difference are kept. When True, every row and column of the original DataFrame is kept.
keep_equal : A bool, False by default. When True, values that are equal in both DataFrames are shown as they are. When False, equal values are shown as NaN.
Returns a new DataFrame containing the differences between the two DataFrames.
Let’s create a DataFrame and see how compare() works.
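The code that built the example frame is not shown here, so the following is a minimal sketch that reproduces the printed df (three columns, one NaN in col2). The constructor call is an assumption; only the values come from the output.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the values shown in the printed output.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
print(df)
```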
Next, let’s make a copy of df called df2, change a few values in it, and use compare() to see the result.
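A minimal sketch of this step, assuming the two edits are the ones visible in the printed df2 (col1 at index 0 becomes "c", col3 at index 2 becomes 4.0):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Work on a copy so the original frame stays unchanged.
df2 = df.copy()
df2.loc[0, "col1"] = "c"
df2.loc[2, "col3"] = 4.0
print(df2)

# Only rows and columns that contain a difference are kept by default.
diff = df.compare(df2)
print(diff)
```

Row 3 does not appear in the result even though col2 is NaN there: compare() treats NaN values in the same position as equal.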
col1 col2 col3
0 c 1.0 1.0
1 a 2.0 2.0
2 b 3.0 4.0
3 b NaN 4.0
4 a 5.0 5.0
| | col1 (self) | col1 (other) | col3 (self) | col3 (other) |
|---|---|---|---|---|
| 0 | a | c | NaN | NaN |
| 2 | NaN | NaN | 3.0 | 4.0 |
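A third frame with one more change is used next. How it was originally built is not shown, so this sketch simply constructs it from the values in the df3 table; the direct constructor call is an assumption.

```python
import numpy as np
import pandas as pd

# df3 differs from df in three places: col1[0], col2[1] and col3[2].
df3 = pd.DataFrame(
    {
        "col1": ["c", "a", "b", "b", "a"],
        "col2": [1.0, 100.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 4.0, 4.0, 5.0],
    }
)
print(df3)
```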
Here is a third DataFrame, df3, which also differs from df in col2:
| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | c | 1.0 | 1.0 |
| 1 | a | 100.0 | 2.0 |
| 2 | b | 3.0 | 4.0 |
| 3 | b | NaN | 4.0 |
| 4 | a | 5.0 | 5.0 |
Now apply compare() to the two DataFrames df and df3. The result has a top-level column for each column that contains a difference, here col1, col2 and col3, and each of these has two sub-columns called self and other. When we run df.compare(df3), the values from df go under self and the values from df3 go under other.
| | col1 (self) | col1 (other) | col2 (self) | col2 (other) | col3 (self) | col3 (other) |
|---|---|---|---|---|---|---|
| 0 | a | c | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | 2.0 | 100.0 | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | 3.0 | 4.0 |
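The table above uses the default options. As a rough illustration of the other parameters described earlier, the sketch below runs the same df and df3 comparison with align_axis, keep_shape and keep_equal changed. The variable names (stacked, full, full_equal) are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
df3 = pd.DataFrame(
    {
        "col1": ["c", "a", "b", "b", "a"],
        "col2": [1.0, 100.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 4.0, 4.0, 5.0],
    }
)

# align_axis=0 stacks self/other as alternating rows instead of columns.
stacked = df.compare(df3, align_axis=0)

# keep_shape=True keeps every row and column; equal cells become NaN.
full = df.compare(df3, keep_shape=True)

# Adding keep_equal=True shows the equal values instead of NaN.
full_equal = df.compare(df3, keep_shape=True, keep_equal=True)

print(stacked)
print(full)
print(full_equal)
```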
##Comparing various columns instead of whole dataframe
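The original code for this section is not available. One way to restrict the comparison to a few columns is to select the same columns from both frames before calling compare(). The sketch below assumes col1 and col3 are the columns of interest.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
df3 = pd.DataFrame(
    {
        "col1": ["c", "a", "b", "b", "a"],
        "col2": [1.0, 100.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 4.0, 4.0, 5.0],
    }
)

# Both selections must carry identical labels, so use the same column list.
cols = ["col1", "col3"]
subset_diff = df[cols].compare(df3[cols])
print(subset_diff)
```

The change in col2 at index 1 is ignored here because col2 is not part of the selection.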
##Comparing elements of two different columns
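The code for this step is missing, so the following is a guess at what produced the output below: an element-wise equality check between col2 of df and col2 of df2. The result keeps the name col2, which pandas does only when both Series share that name, and index 3 is False because NaN never equals NaN under ==.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
df2 = df.copy()
df2.loc[0, "col1"] = "c"
df2.loc[2, "col3"] = 4.0

# Element-wise equality; unlike compare(), NaN == NaN evaluates to False.
same = df["col2"] == df2["col2"]
print(same)
```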
0 True
1 True
2 True
3 False
4 True
Name: col2, dtype: bool