The apply() method is one of the most commonly used tools in data preprocessing. It makes it easy to apply a function to each element of a pandas Series, or to each row or column of a pandas DataFrame. In this tutorial, we'll learn how to use the apply() method in pandas. You'll need to know the fundamentals of Python and lambda functions; if you aren't familiar with these or need to brush up your Python skills, you might like to try our free Python Fundamentals course.
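To make the Series-versus-DataFrame distinction concrete, here is a minimal sketch on a hypothetical toy DataFrame (`df` and its columns `a` and `b` are made up for illustration). It contrasts Series.apply(), which passes one element at a time, with DataFrame.apply(), which passes each column by default (`axis=0`) or each row with `axis=1`.

```python
import pandas as pd

# Hypothetical toy DataFrame, used only to illustrate what apply() receives
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Series.apply: the function receives one element at a time
doubled = df['a'].apply(lambda x: x * 2)

# DataFrame.apply with axis=0 (the default): the function receives each column as a Series
col_ranges = df.apply(lambda col: col.max() - col.min())

# DataFrame.apply with axis=1: the function receives each row as a Series
row_sums = df.apply(lambda row: row['a'] + row['b'], axis=1)

print(doubled.tolist())     # [2, 4, 6]
print(col_ranges.tolist())  # [2, 20]
print(row_sums.tolist())    # [11, 22, 33]
```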
Series form the basis of pandas. A Series is a one-dimensional array with axis labels, which together are called the index.
There are different ways of creating a Series object (e.g., we can initialize a Series with lists or dictionaries). Let's define a Series object with two lists: one containing student names, used as the index, and one containing their heights in centimeters, used as the data:
```python
import pandas as pd
import numpy as np
from IPython.display import display

# Heights in centimeters, indexed by student name
students = pd.Series(data=[180, 175, 168, 190], index=['Vik', 'Mehdi', 'Bella', 'Chriss'])
display(students)
print(type(students))
```

Vik 180
Mehdi 175
Bella 168
Chriss 190
dtype: int64
<class 'pandas.core.series.Series'>
The code above displays the contents of the students object and then prints its type.
The students object is a Series, so we can use its apply() method to apply a function to each of its elements. Let's see how we can convert the students' heights from centimeters to feet:
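The code for this step is missing from the excerpt. Below is a minimal sketch consistent with the output that follows, assuming the conversion divides by 30.48 (centimeters per foot) and rounds with NumPy's `np.round()`; the original implementation may differ. It recreates `students` with name labels so that it runs on its own, and uses `print()` instead of IPython's `display()`.

```python
import numpy as np
import pandas as pd

students = pd.Series(data=[180, 175, 168, 190],
                     index=['Vik', 'Mehdi', 'Bella', 'Chriss'])

def cm_to_feet(h):
    # 1 foot = 30.48 cm; round the result to two decimal places
    return np.round(h / 30.48, 2)

# Pass the function object itself (no parentheses) to apply()
heights_ft = students.apply(cm_to_feet)
print(heights_ft)
```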
Vik 5.91
Mehdi 5.74
Bella 5.51
Chriss 6.23
dtype: float64
The students' heights are now in feet, rounded to two decimal places. To get there, we first defined a function, cm_to_feet(), that performs the conversion, and then passed the function name without parentheses to the apply() method. The apply() method takes each element of the Series and calls cm_to_feet() on it.
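Because apply() accepts any callable, the same conversion can also be written inline as a lambda, and extra arguments can be forwarded to the function through apply()'s `args` tuple and keyword arguments. In this sketch, `convert()` and its `divisor` and `decimals` parameters are hypothetical names chosen for illustration; 2.54 cm per inch is the standard conversion factor.

```python
import numpy as np
import pandas as pd

students = pd.Series(data=[180, 175, 168, 190],
                     index=['Vik', 'Mehdi', 'Bella', 'Chriss'])

# Same centimeters-to-feet conversion written inline as a lambda
lambda_ft = students.apply(lambda h: np.round(h / 30.48, 2))

# Hypothetical helper that takes the divisor as an extra parameter
def convert(h, divisor, decimals=2):
    return np.round(h / divisor, decimals)

# Positional extras go in args=(...); keyword extras are passed straight through to convert()
inches = students.apply(convert, args=(2.54,), decimals=1)
print(inches)
```

A named function is easier to test and reuse, while a lambda keeps a one-off transformation next to the apply() call.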
In this section, we'll work on hypothetical requests from a company's HR team. We'll learn how to use the apply() method by going through different scenarios, each exploring a new use case that we solve with apply().
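The excerpt uses a DataFrame named `data1` without showing how it is loaded. To follow along, here is a sketch that rebuilds it by hand from the values in the Scenario 1 output below, assuming those ten rows are the full dataset and that Birthdate is a day/month/year string (a date such as 14/04/1981 suggests that order).

```python
import pandas as pd

# Rebuilt by hand from the ten rows shown in the Scenario 1 output;
# Birthdate is kept as a day/month/year string, as displayed.
data1 = pd.DataFrame({
    'EmployeeName': ['Callen Dunkley', 'Sarah Rayner', 'Jeanette Sloan', 'Kaycee Acosta',
                     'Henri Conroy', 'Emma Peralta', 'Martin Butt', 'Alex Jensen',
                     'Kim Howarth', 'Jane Burnett'],
    'Department': ['Accounting', 'Engineering', 'Engineering', 'HR', 'HR', 'HR',
                   'Data Science', 'Data Science', 'Accounting', 'Data Science'],
    'HireDate': [2010, 2018, 2012, 2014, 2014, 2018, 2020, 2018, 2020, 2012],
    'Sex': ['M', 'F', 'F', 'F', 'M', 'F', 'M', 'M', 'M', 'F'],
    'Birthdate': ['04/09/1982', '14/04/1981', '06/05/1997', '08/01/1986', '10/10/1988',
                  '12/11/1992', '10/04/1991', '16/07/1995', '08/10/1992', '11/10/1979'],
    'Weight': [78, 80, 66, 67, 90, 57, 115, 87, 95, 57],
    'Height': [176, 160, 169, 157, 185, 164, 195, 180, 174, 165],
    'Kids': [2, 1, 0, 1, 1, 0, 2, 0, 3, 1],
})
print(data1.shape)  # (10, 8)
```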
Scenario 1: Let's assume that the HR team wants to send an invitation email that starts with a friendly greeting to all the employees (e.g., Hey, Sarah!). They've asked you to create two columns that store the employees' first and last names separately, so that referring to an employee by first name is easy. To do so, we can apply a lambda function that calls the string split() method, which breaks a string into a list of substrings at the specified separator; by default, split() separates on any whitespace. Let's look at the code:
```python
# Split each full name on whitespace and keep the first and second parts
data1['FirstName'] = data1['EmployeeName'].apply(lambda x: x.split()[0])
data1['LastName'] = data1['EmployeeName'].apply(lambda x: x.split()[1])
display(data1)
```
| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | FirstName | LastName |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 | Callen | Dunkley |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 | Sarah | Rayner |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 | Jeanette | Sloan |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 | Kaycee | Acosta |
| 4 | Henri Conroy | HR | 2014 | M | 10/10/1988 | 90 | 185 | 1 | Henri | Conroy |
| 5 | Emma Peralta | HR | 2018 | F | 12/11/1992 | 57 | 164 | 0 | Emma | Peralta |
| 6 | Martin Butt | Data Science | 2020 | M | 10/04/1991 | 115 | 195 | 2 | Martin | Butt |
| 7 | Alex Jensen | Data Science | 2018 | M | 16/07/1995 | 87 | 180 | 0 | Alex | Jensen |
| 8 | Kim Howarth | Accounting | 2020 | M | 08/10/1992 | 95 | 174 | 3 | Kim | Howarth |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 | Jane | Burnett |
In the code above, we applied a lambda function to the EmployeeName column, which is a Series object. The lambda function splits each employee's full name into first and last names, so the code creates two new columns, FirstName and LastName.
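One caveat of `split()[1]` is that it returns only the second word, so a surname containing a space would be cut short, and a single-word name would raise an `IndexError`. A hedged alternative sketch splits each name only once with `maxsplit=1` in a single apply() pass and then picks the pieces with the `.str` accessor; the DataFrame `names` and the name 'Anna de la Cruz' are hypothetical.

```python
import pandas as pd

# Two names from data1 plus one hypothetical multi-word surname
names = pd.DataFrame({'EmployeeName': ['Callen Dunkley', 'Sarah Rayner', 'Anna de la Cruz']})

# Split each name once, at the first run of whitespace
parts = names['EmployeeName'].apply(lambda x: x.split(maxsplit=1))

# .str[i] picks the i-th item from each list (NaN if the list is too short)
names['FirstName'] = parts.str[0]
names['LastName'] = parts.str[1]
print(names[['FirstName', 'LastName']])
```

With this approach, a one-word name yields NaN in LastName instead of raising an error.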
Another example: transform the values in the 'Departure Time' and 'Arrival Time' columns so that each one is reduced to its hour component (for instance, 10:05 becomes 10), and then convert the hour into one of four categories:

- 5 <= hour < 12: Morning
- 12 <= hour < 17: Afternoon
- 17 <= hour < 20: Evening
- hour >= 20 or hour < 5: Night
```python
def hr(time):
    # Extract the hour component from an 'HH:MM' string, e.g. '10:05' -> 10
    hour = int(time.split(':')[0])
    if 5 <= hour < 12:
        return 'Morning'
    if 12 <= hour < 17:
        return 'Afternoon'
    if 17 <= hour < 20:
        return 'Evening'
    # Remaining hours: 20-23 and 0-4
    return 'Night'
```
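The excerpt defines hr() but does not show it being applied. Here is a sketch that applies it to both time columns of a hypothetical `flights` DataFrame (the times are made up): DataFrame.apply() passes each selected column to the lambda, and Series.apply() then calls hr() on every value in that column. hr() is repeated so the block runs on its own.

```python
import pandas as pd

def hr(time):
    # Same categorization as hr() above: extract the hour from 'HH:MM'
    hour = int(time.split(':')[0])
    if 5 <= hour < 12:
        return 'Morning'
    if 12 <= hour < 17:
        return 'Afternoon'
    if 17 <= hour < 20:
        return 'Evening'
    return 'Night'  # 20-23 and 0-4

# Hypothetical flight times; the real dataset is not part of this excerpt
flights = pd.DataFrame({
    'Departure Time': ['10:05', '22:40', '13:15'],
    'Arrival Time': ['13:30', '01:10', '18:45'],
})

time_cols = ['Departure Time', 'Arrival Time']
# DataFrame.apply hands each column to the lambda; Series.apply maps hr over its values
flights[time_cols] = flights[time_cols].apply(lambda col: col.apply(hr))
print(flights)
```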
Mapping values using a function: In this example, we'll pass a function to the Series map() method to map the existing values in a Series to new values.
```python
# Define a function that maps an age to an age group
def map_age(age):
    if age < 30:
        return 'Young'
    elif age < 40:  # 30 <= age < 40
        return 'Middle-aged'
    else:
        return 'Senior'

# Map the values in the 'Age' column using the function
df_map['Age Group'] = df_map['Age'].map(map_age)

# Display the DataFrame
df_map
```
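For a plain function applied element by element, Series.map() and Series.apply() should produce the same result, so apply() could be used here as well. A minimal sketch, with hypothetical ages chosen to hit the 30 and 40 boundaries (`df_map` is not defined in this excerpt):

```python
import pandas as pd

def map_age(age):
    if age < 30:
        return 'Young'
    elif age < 40:
        return 'Middle-aged'
    else:
        return 'Senior'

# Hypothetical ages, including the boundary values 30 and 40
df_map = pd.DataFrame({'Age': [25, 30, 39, 40, 52]})

via_map = df_map['Age'].map(map_age)
via_apply = df_map['Age'].apply(map_age)

print(via_map.tolist())           # ['Young', 'Middle-aged', 'Middle-aged', 'Senior', 'Senior']
print(via_map.equals(via_apply))  # True
```

The difference lies elsewhere: map() also accepts a dictionary or Series as the mapping, while apply() can forward extra arguments to the function.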