A pivot table is a powerful data summarization tool used in data analysis. It allows you to reorganize and summarize tabular data in a flexible manner, providing a compact representation of complex data relationships. Pivot tables enable you to aggregate and visualize data based on one or more key variables, making it easier to identify patterns and trends within your dataset.
#Creating a Pivot Table in Pandas
In pandas, you can create a pivot table using the pivot_table() function. Let’s go through the basic syntax:
import pandas as pdimport numpy as np# Create a DataFramedata = {'Date': ['2022-01-01', '2022-01-01', '2022-01-02', '2022-01-02'],'Category': ['A', 'B', 'A', 'B'],'Value': [10, 20, 30, 40]}df = pd.DataFrame(data)# Create a pivot tablepivot_table = df.pivot_table(values='Value', index='Date', columns='Category', aggfunc='sum')print(pivot_table)
Category A B
Date
2022-01-01 10 20
2022-01-02 30 40
In this example:
values: The column to aggregate. index: The column to use as the row index in the pivot table. columns: The column to use as the column index in the pivot table. aggfunc: The aggregation function to apply when multiple values are present for the same index/column pair. It defaults to ‘mean’.
data = {}np.random.seed(2)for i in [chr(x) for x inrange(65,70)]: data['col'+i] = np.random.randint(1,100,10)data['orderID'] = np.random.choice(['A', 'B', 'C'], 10)data['product'] = np.random.choice(['Product1', 'Product2', 'Product3'], 10)data['customer'] = np.random.choice(['Customer1', 'Customer2', 'Customer3', 'Customer4'], 10)df = pd.DataFrame(data)
product Product1 Product2 Product3
orderID
A 26.0 NaN NaN
B NaN 57.0 52.0
C 35.0 16.0 76.0
#Operations with Pivot Tables
Aggregation Functions You can specify different aggregation functions when creating a pivot table. Common aggregation functions include ‘sum’, ‘mean’, ‘count’, ‘max’, and ‘min’. For example:
##Handling Missing Values. You can specify how missing values are handled using the fill_value parameter:
# Pivot table with fill value for missing combinationspivot_table5 = df.pivot_table(values='colA', index='orderID', columns='product', aggfunc=np.mean, fill_value=0)pivot_table5