Pandas Introduction
This is an introduction to pandas, a Python library for loading, analyzing, and transforming tabular data.
Anatomy of a dataframe
Columns, index and data
import pandas as pd
# Create a sample DataFrame with explicit row labels
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])
df
      col1  col2  col3
row1     1     4     7
row2     2     5     8
row3     3     6     9
# Without index=..., pandas assigns a default RangeIndex (0, 1, 2)
df_default = pd.DataFrame(data)
df_default
# Display columns
df.columns
Index(['col1', 'col2', 'col3'], dtype='object')
# Display row labels (index)
df.index
Index(['row1', 'row2', 'row3'], dtype='object')
# Display data (values)
df.values
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
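The pandas documentation recommends DataFrame.to_numpy() over .values in new code. The sketch below rebuilds the same sample frame so that it runs on its own. It checks that the two calls return the same array and shows the columns, the index, and the data side by side.

```python
import pandas as pd

# Same sample data as above, rebuilt so this cell is self-contained
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])

arr = df.to_numpy()          # NumPy array of the values, same content as df.values
print(arr.shape)             # (3, 3): rows x columns
print(list(df.columns))      # ['col1', 'col2', 'col3']
print(list(df.index))        # ['row1', 'row2', 'row3']
```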
Creating DataFrames
From lists of lists
From series objects
From dictionaries
# From lists of lists: each inner list becomes one row
list_of_lists_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df_from_list = pd.DataFrame(list_of_lists_data, columns=['A', 'B', 'C'])
df_from_list
# From series objects
series_data = {'col1': pd.Series([1, 2, 3]),
               'col2': pd.Series([4, 5, 6])}
df_from_series = pd.DataFrame(series_data)
print("\nDataFrame from series objects:")
df_from_series
DataFrame from series objects:
# From dictionaries
dictionary_data = {
    'col1': [1, 2, 3],
    'col2': [4, 5, 6]
}
df_from_dict = pd.DataFrame(dictionary_data)
print("\nDataFrame from dictionaries:")
df_from_dict
DataFrame from dictionaries:
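When a DataFrame is built from Series objects, pandas lines the Series up by their index labels, not by position. This sketch uses two made-up Series whose labels only partly overlap. The label missing from s2 becomes NaN, which also turns that column into float64.

```python
import pandas as pd

# Hypothetical Series with partly overlapping index labels
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

# Rows are the union of the labels; 'a' has no value in s2, so it is NaN there
df_aligned = pd.DataFrame({'col1': s1, 'col2': s2})
print(df_aligned)
```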
Loading Datasets
load data from a CSV file
load data from a JSON file
load data from HTML tables (e.g. a web page)
df_csv = pd.read_csv("/content/sample_data/california_housing_test.csv")
df = df_csv.copy()  # later cells work with the housing data under the name df
df_csv
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0
1       -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0
3       -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0
4       -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...
2995    -119.86     34.42                23.0       1450.0           642.0      1258.0       607.0         1.1790            225000.0
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0
2997    -119.70     36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0
2998    -117.12     34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0
2999    -119.63     34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0
3000 rows × 9 columns
df_json = pd.read_json("/content/sample_data/anscombe.json")
df_json
0     I  10   8.04
1     I   8   6.95
2     I  13   7.58
3     I   9   8.81
4     I  11   8.33
5     I  14   9.96
6     I   6   7.24
7     I   4   4.26
8     I  12  10.84
9     I   7   4.81
10    I   5   5.68
11   II  10   9.14
12   II   8   8.14
13   II  13   8.74
14   II   9   8.77
15   II  11   9.26
16   II  14   8.10
17   II   6   6.13
18   II   4   3.10
19   II  12   9.13
20   II   7   7.26
21   II   5   4.74
22  III  10   7.46
23  III   8   6.77
24  III  13  12.74
25  III   9   7.11
26  III  11   7.81
27  III  14   8.84
28  III   6   6.08
29  III   4   5.39
30  III  12   8.15
31  III   7   6.42
32  III   5   5.73
33   IV   8   6.58
34   IV   8   5.76
35   IV   8   7.71
36   IV   8   8.84
37   IV   8   8.47
38   IV   8   7.04
39   IV   8   5.25
40   IV  19  12.50
41   IV   8   5.56
42   IV   8   7.91
43   IV   8   6.89
import pandas as pd
# URL of the Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)'
# Read the tables from the URL
tables = pd.read_html(url)
# read_html returns a list of DataFrames, one per <table> on the page;
# assume the first one is the population table
print("DataFrame loaded from Wikipedia table:")
tables[0]
DataFrame loaded from Wikipedia table:
     Country or territory               Population (1 July 2022)  Population (1 July 2023)  Change (%)  UN continental region[1]  UN statistical subregion[1]
0    World                              8021407192                8091734930                +0.88%      –                         –
1    India                              1425423212                1438069596                +0.89%      Asia                      Southern Asia
2    China[a]                           1425179569                1422584933                −0.18%      Asia                      Eastern Asia
3    United States                      341534046                 343477335                 +0.57%      Americas                  Northern America
4    Indonesia                          278830529                 281190067                 +0.85%      Asia                      South-eastern Asia
...  ...                                ...                       ...                       ...         ...                       ...
233  Montserrat (United Kingdom)        4453                      4420                      −0.74%      Americas                  Caribbean
234  Falkland Islands (United Kingdom)  3490                      3477                      −0.37%      Americas                  South America
235  Tokelau (New Zealand)              2290                      2397                      +4.67%      Oceania                   Polynesia
236  Niue (New Zealand)                 1821                      1817                      −0.22%      Oceania                   Polynesia
237  Vatican City[w]                    505                       496                       −1.78%      Europe                    Southern Europe
238 rows × 6 columns
0   Global                    Current population United Nations Demographics...
1   Continents/subregions     Africa Antarctica Asia Europe North America Ca...
2   Intercontinental          Americas Arab world Commonwealth of Nations Eu...
3   Cities/urban areas        World cities National capitals Megacities Mega...
4   Past and future           Past and future population Estimates of histor...
5   Population density        Current density Past and future population den...
6   Growth indicators         Population growth rate Natural increase Net re...
7   Life expectancy           World Africa Asia Europe North America Oceania...
8   Other demographics        Age at childbearing Age at first marriage Age ...
9   Health                    Antidepressant consumption Antiviral medicatio...
10  Education and innovation  Bloomberg Innovation Index Education Index Glo...
11  Economic                  Access to financial services Development aid d...
12  List of international rankings Lists by country  List of international rankings Lists by country
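read_csv and read_json accept a file path, a URL, or a file-like object. The sketch below uses small made-up CSV and JSON strings wrapped in io.StringIO, so it runs without the Colab sample files. read_html is left out because it needs an extra HTML parser, such as lxml, installed.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for files (not the Colab sample data)
csv_text = "city,population\nA,100\nB,250\n"
json_text = '[{"city": "A", "population": 100}, {"city": "B", "population": 250}]'

# Wrapping the text in StringIO makes it behave like an open file
df_from_csv = pd.read_csv(io.StringIO(csv_text))
df_from_json = pd.read_json(io.StringIO(json_text))  # a JSON list of records -> one row per record

print(df_from_csv)
print(df_from_json)
```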
Datatypes of columns
int
float
category
object
df_csv.dtypes
longitude           float64
latitude            float64
housing_median_age  float64
total_rooms         float64
total_bedrooms      float64
population          float64
households          float64
median_income       float64
median_house_value  float64
dtype: object
tables[0].dtypes
Country or territory           object
Population (1 July 2022)        int64
Population (1 July 2023)        int64
Change (%)                     object
UN continental region[1]       object
UN statistical subregion[1]    object
dtype: object
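The list above includes category, but neither loaded dataset uses it. This sketch shows how astype changes column types, using a small hypothetical frame. A text column with few distinct labels, such as a region name, is a typical candidate for category.

```python
import pandas as pd

# Hypothetical frame: one integer column, one column of repeated text labels
df_types = pd.DataFrame({
    'rooms': [3, 4, 2, 4],
    'region': ['Asia', 'Europe', 'Asia', 'Asia'],
})
print(df_types.dtypes)

# Convert column types explicitly with astype
df_types['rooms'] = df_types['rooms'].astype('float64')
df_types['region'] = df_types['region'].astype('category')
print(df_types.dtypes)                            # rooms: float64, region: category
print(df_types['region'].cat.categories.tolist())  # ['Asia', 'Europe']
```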
Summarizing dataframes
describe
missing values
value_counts() for categorical columns
df_csv.describe()
         longitude    latitude  housing_median_age   total_rooms  total_bedrooms    population  households  median_income  median_house_value
count  3000.000000  3000.00000         3000.000000   3000.000000     3000.000000   3000.000000  3000.00000    3000.000000          3000.00000
mean   -119.589200    35.63539           28.845333   2599.578667      529.950667   1402.798667   489.91200       3.807272        205846.27500
std       1.994936     2.12967           12.555396   2155.593332      415.654368   1030.543012   365.42271       1.854512        113119.68747
min    -124.180000    32.56000            1.000000      6.000000        2.000000      5.000000     2.00000       0.499900         22500.00000
25%    -121.810000    33.93000           18.000000   1401.000000      291.000000    780.000000   273.00000       2.544000        121200.00000
50%    -118.485000    34.27000           29.000000   2106.000000      437.000000   1155.000000   409.50000       3.487150        177650.00000
75%    -118.020000    37.69000           37.000000   3129.000000      636.000000   1742.750000   597.25000       4.656475        263975.00000
max    -114.490000    41.92000           52.000000  30450.000000     5419.000000  11935.000000  4930.00000      15.000100        500001.00000
df_csv.isnull().sum()
longitude            0
latitude             0
housing_median_age   0
total_rooms          0
total_bedrooms       0
population           0
households           0
median_income        0
median_house_value   0
dtype: int64
tables[0]["UN continental region[1]"].value_counts()
UN continental region[1]
Africa      58
Americas    55
Asia        51
Europe      50
Oceania     23
–            1
dtype: int64
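The three summaries above (describe, missing-value counts, value_counts) can be checked together on a small frame. This sketch uses hypothetical data with one missing value. describe() covers only numeric columns by default and skips NaN when it computes count and mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
df_small = pd.DataFrame({
    'income': [2.5, 3.0, np.nan, 4.5],
    'region': ['Asia', 'Europe', 'Asia', 'Asia'],
})

summary = df_small.describe()                # numeric columns only, by default
missing = df_small.isna().sum()              # number of missing values per column
counts = df_small['region'].value_counts()   # frequency of each label, most common first

print(summary)
print(missing)
print(counts)
```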
Selecting data
Select Column(s)
Select row(s)
Select columns and rows
Conditional selection
Special selection methods
Order columns
# Select column(s)
df[['longitude', 'latitude']]
      longitude  latitude
0       -122.05     37.37
1       -118.30     34.26
2       -117.81     33.78
3       -118.36     33.82
4       -119.67     36.33
...         ...       ...
2995    -119.86     34.42
2996    -118.14     34.06
2997    -119.70     36.30
2998    -117.12     34.10
2999    -119.63     34.42
3000 rows × 2 columns
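# Select row(s)
df.iloc[0:3]
   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0    -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0
1    -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0
2    -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0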
# Select columns and rows
df.iloc[0:3, 0:2]  # rows, columns
   longitude  latitude
0    -122.05     37.37
1    -118.30     34.26
2    -117.81     33.78
# Select columns and rows by label: unlike iloc, a loc slice includes its end label, so 0:3 returns four rows
df.loc[0:3, ['longitude', 'latitude']]
   longitude  latitude
0    -122.05     37.37
1    -118.30     34.26
2    -117.81     33.78
3    -118.36     33.82
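The two outputs above differ in length because iloc slices by position and excludes the end, like a Python slice, while loc slices by label and includes the end label. A minimal sketch on a hypothetical five-row frame:

```python
import pandas as pd

df_demo = pd.DataFrame({'x': [10, 20, 30, 40, 50]})

by_position = df_demo.iloc[0:3]   # positions 0, 1, 2 (end excluded)
by_label = df_demo.loc[0:3]       # labels 0, 1, 2, 3 (end label included)

print(len(by_position), len(by_label))  # 3 4
```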
# Boolean indexing: a comparison returns one True/False value per row
df['population'] > 1000
0        True
1       False
2        True
3       False
4       False
...       ...
2995     True
2996     True
2997    False
2998    False
2999    False
3000 rows × 1 columns
dtype: bool
# Conditional selection
df[df['population'] > 1000]
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0
7       -120.65     35.48                19.0       2310.0           471.0      1341.0       441.0         3.2250            166900.0
8       -122.84     38.40                15.0       3080.0           617.0      1446.0       599.0         3.6696            194400.0
9       -118.02     34.08                31.0       2402.0           632.0      2830.0       603.0         2.3333            164200.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...
2988    -122.01     36.97                43.0       2162.0           509.0      1208.0       464.0         2.5417            260900.0
2989    -122.02     37.60                32.0       1295.0           295.0      1097.0       328.0         3.2386            149600.0
2990    -118.23     34.09                49.0       1638.0           456.0      1500.0       430.0         2.6923            150000.0
2995    -119.86     34.42                23.0       1450.0           642.0      1258.0       607.0         1.1790            225000.0
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0
1798 rows × 9 columns
# Special selection methods
# query
df.query('population > 1000')
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0
7       -120.65     35.48                19.0       2310.0           471.0      1341.0       441.0         3.2250            166900.0
8       -122.84     38.40                15.0       3080.0           617.0      1446.0       599.0         3.6696            194400.0
9       -118.02     34.08                31.0       2402.0           632.0      2830.0       603.0         2.3333            164200.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...
2988    -122.01     36.97                43.0       2162.0           509.0      1208.0       464.0         2.5417            260900.0
2989    -122.02     37.60                32.0       1295.0           295.0      1097.0       328.0         3.2386            149600.0
2990    -118.23     34.09                49.0       1638.0           456.0      1500.0       430.0         2.6923            150000.0
2995    -119.86     34.42                23.0       1450.0           642.0      1258.0       607.0         1.1790            225000.0
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0
1798 rows × 9 columns
df.query('population > 1000 and housing_median_age < 10')
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
33      -118.08     34.55                 5.0      16181.0          2971.0      8152.0      2651.0         4.5237            141800.0
45      -117.24     33.17                 4.0       9998.0          1874.0      3925.0      1672.0         4.2826            237500.0
93      -117.50     33.87                 4.0       6755.0          1017.0      2866.0       850.0         5.0493            239800.0
153     -118.38     34.27                 8.0       3248.0           847.0      2608.0       731.0         2.8214            158300.0
182     -122.24     37.55                 3.0       6164.0          1175.0      2198.0       975.0         6.7413            435900.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...
2899    -121.92     38.02                 8.0       2750.0           479.0      1526.0       484.0         5.1020            156500.0
2913    -122.39     37.78                 3.0       3464.0          1179.0      1441.0       919.0         4.7105            275000.0
2930    -121.84     37.29                 4.0       2937.0           648.0      1780.0       665.0         4.3851            160400.0
2936    -119.75     36.87                 3.0      13802.0          2244.0      5226.0      1972.0         5.0941            143700.0
2969    -118.11     34.68                 6.0       7430.0          1184.0      3489.0      1115.0         5.3267            140100.0
139 rows × 9 columns
df.query('population > 1000 or housing_median_age < 10')
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0
7       -120.65     35.48                19.0       2310.0           471.0      1341.0       441.0         3.2250            166900.0
8       -122.84     38.40                15.0       3080.0           617.0      1446.0       599.0         3.6696            194400.0
9       -118.02     34.08                31.0       2402.0           632.0      2830.0       603.0         2.3333            164200.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...
2988    -122.01     36.97                43.0       2162.0           509.0      1208.0       464.0         2.5417            260900.0
2989    -122.02     37.60                32.0       1295.0           295.0      1097.0       328.0         3.2386            149600.0
2990    -118.23     34.09                49.0       1638.0           456.0      1500.0       430.0         2.6923            150000.0
2995    -119.86     34.42                23.0       1450.0           642.0      1258.0       607.0         1.1790            225000.0
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0
1843 rows × 9 columns
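The and/or strings passed to query() above can also be written as boolean masks combined with & and |. Each comparison needs its own parentheses, because & and | bind more tightly than > and <. This sketch uses a small hypothetical stand-in for the two housing columns and checks that both forms select the same rows.

```python
import pandas as pd

# Hypothetical stand-in for the two housing columns used above
df_demo = pd.DataFrame({
    'population': [1500, 800, 2200, 950],
    'housing_median_age': [5, 3, 30, 40],
})

# & and | combine masks element-wise; parenthesize each comparison
mask_and = (df_demo['population'] > 1000) & (df_demo['housing_median_age'] < 10)
mask_or = (df_demo['population'] > 1000) | (df_demo['housing_median_age'] < 10)

both = df_demo[mask_and]
either = df_demo[mask_or]

# The masks select the same rows as the equivalent query() strings
assert both.equals(df_demo.query('population > 1000 and housing_median_age < 10'))
assert either.equals(df_demo.query('population > 1000 or housing_median_age < 10'))
print(both.index.tolist(), either.index.tolist())  # [0] [0, 1, 2]
```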
df.select_dtypes(include=['number'])  # Selects all numeric columns
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0
1       -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0
3       -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0
4       -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...
2995    -119.86     34.42                23.0       1450.0           642.0      1258.0       607.0         1.1790            225000.0
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0
2997    -119.70     36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0
2998    -117.12     34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0
2999    -119.63     34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0
3000 rows × 9 columns
df.select_dtypes(include=['int64', 'float64'])  # Selects specific numeric types
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0
1       -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0
3       -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0
4       -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...
2995    -119.86     34.42                23.0       1450.0           642.0      1258.0       607.0         1.1790            225000.0
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0
2997    -119.70     36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0
2998    -117.12     34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0
2999    -119.63     34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0
3000 rows × 9 columns
tables[0].select_dtypes(exclude=["number"]).dtypes
Country or territory           object
Change (%)                     object
UN continental region[1]       object
UN statistical subregion[1]    object
dtype: object
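df.sample(5)  # five randomly chosen rows; a different set appears on each run
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
334     -118.13     34.01                45.0       1179.0           268.0       736.0       252.0         2.7083            161800.0
649     -122.27     37.80                39.0       1715.0           623.0      1327.0       467.0         1.8477            179200.0
604     -121.93     38.01                 9.0       2294.0           389.0      1142.0       365.0         5.3363            160800.0
716     -121.17     37.97                28.0       1374.0           248.0       769.0       229.0         3.6389            130400.0
2981    -120.66     35.49                17.0       4422.0           945.0      2307.0       885.0         2.8285            171300.0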
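"Order columns" is listed at the start of this section but has no example. Three common approaches, shown on a hypothetical three-column frame: index with an explicit list of names, sort the labels with sort_index(axis=1), or move one column to the front and keep the rest in their current order.

```python
import pandas as pd

df_demo = pd.DataFrame({'b': [1, 2], 'c': [3, 4], 'a': [5, 6]})

# 1. List the columns in the order you want
explicit = df_demo[['a', 'b', 'c']]

# 2. Sort the column labels alphabetically
alphabetical = df_demo.sort_index(axis=1)

# 3. Move 'c' to the front and keep the others in their existing order
front = df_demo[['c'] + [col for col in df_demo.columns if col != 'c']]

print(list(explicit.columns), list(alphabetical.columns), list(front.columns))
```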
Mathematical operations on series
addition, subtraction, multiplication and division
by constant
by another series
df['housing_median_age'].head()
0    27.0
1    43.0
2    27.0
3    28.0
4    19.0
dtype: float64
df['housing_median_age'] * 2
0       54.0
1       86.0
2       54.0
3       56.0
4       38.0
...      ...
2995    46.0
2996    54.0
2997    20.0
2998    80.0
2999    84.0
3000 rows × 1 columns
dtype: float64
df["total_rooms"] + df["total_bedrooms"]
0       4546.0
1       1820.0
2       4096.0
3         82.0
4       1485.0
...        ...
2995    2092.0
2996    6339.0
2997    1157.0
2998     110.0
2999    2028.0
3000 rows × 1 columns
dtype: float64
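Arithmetic between two Series pairs values by index label, not by position. A label that exists in only one Series produces NaN unless a fill_value is supplied through the method form, such as .add(). Operations with a constant apply to every element. The Series below are hypothetical.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
s2 = pd.Series([10.0, 20.0], index=[1, 2])

added = s1 + s2                          # label 0 has no partner in s2 -> NaN
added_filled = s1.add(s2, fill_value=0)  # treat the missing partner as 0
scaled = (s1 - 1) / 2                    # constant applied element-wise

print(added.tolist())         # [nan, 12.0, 23.0]
print(added_filled.tolist())  # [1.0, 12.0, 23.0]
print(scaled.tolist())        # [0.0, 0.5, 1.0]
```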
Creating, renaming
create rows and columns
rename columns
df["extra column"] = df["total_rooms"] + df["total_bedrooms"]
df.head()
   longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  extra column
0    -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0        4546.0
1    -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0        1820.0
2    -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0        4096.0
3    -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0          82.0
4    -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0        1485.0
# Create rows in an empty DataFrame: each new index label passed to loc adds a row
df_new = pd.DataFrame()
for i in range(10):
    df_new.loc[i, "new_col"] = (i + 2) / 2
df_new
   new_col
0      1.0
1      1.5
2      2.0
3      2.5
4      3.0
5      3.5
6      4.0
7      4.5
8      5.0
9      5.5
df
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  extra column
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0        4546.0
1       -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0        1820.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0        4096.0
3       -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0          82.0
4       -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0        1485.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...           ...
2995    -119.86     34.42                23.0       1450.0           642.0      1258.0       607.0         1.1790            225000.0        2092.0
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0        6339.0
2997    -119.70     36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0        1157.0
2998    -117.12     34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0         110.0
2999    -119.63     34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0        2028.0
3000 rows × 10 columns
# Assigning to a new index label with loc adds a row; columns that are not set become NaN
df.loc[3000, "longitude"] = 30
df
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  extra column
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0        4546.0
1       -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0        1820.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0        4096.0
3       -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0          82.0
4       -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0        1485.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...           ...
2996    -118.14     34.06                27.0       5257.0          1082.0      3496.0      1036.0         3.3906            237200.0        6339.0
2997    -119.70     36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0        1157.0
2998    -117.12     34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0         110.0
2999    -119.63     34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0        2028.0
3000      30.00       NaN                 NaN          NaN             NaN         NaN         NaN            NaN                 NaN           NaN
3001 rows × 10 columns
df.isnull().sum()
longitude            0
latitude             1
housing_median_age   1
total_rooms          1
total_bedrooms       1
population           1
households           1
median_income        1
median_house_value   1
extra column         1
dtype: int64
a_row = df.loc[0]  # take row 0 as a Series to reuse as a new row
a_row
longitude             -122.0500
latitude                37.3700
housing_median_age      27.0000
total_rooms           3885.0000
total_bedrooms         661.0000
population            1537.0000
households             606.0000
median_income            6.6085
median_house_value  344700.0000
extra column          4546.0000
dtype: float64
# Assign a whole Series to a new label to add a complete row
df.loc[3001] = a_row
df
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  extra column
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0        4546.0
1       -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0        1820.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0        4096.0
3       -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0          82.0
4       -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0        1485.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...           ...
2997    -119.70     36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0        1157.0
2998    -117.12     34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0         110.0
2999    -119.63     34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0        2028.0
3000      30.00       NaN                 NaN          NaN             NaN         NaN         NaN            NaN                 NaN           NaN
3001    -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0        4546.0
3002 rows × 10 columns
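Adding rows one loc assignment at a time works, as shown above, but it tends to be slow for many rows. The pandas concat documentation advises building a list of rows and creating the DataFrame in one step instead. This sketch rebuilds the same new_col values from the earlier loop that way, then appends two hypothetical extra rows with pd.concat.

```python
import pandas as pd

# Collect plain records first, then build the DataFrame once
records = [{'new_col': (i + 2) / 2} for i in range(10)]
df_built = pd.DataFrame(records)

# Append a block of rows in one step; ignore_index renumbers the result 0..n-1
extra = pd.DataFrame({'new_col': [6.0, 6.5]})
df_combined = pd.concat([df_built, extra], ignore_index=True)

print(df_built['new_col'].tolist())
print(len(df_combined))  # 12
```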
# Rename columns (rename returns a new DataFrame; df itself is unchanged unless the result is assigned)
df.rename(columns={'extra column': 'sum_column'})
      longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  sum_column
0       -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0      4546.0
1       -118.30     34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0      1820.0
2       -117.81     33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0      4096.0
3       -118.36     33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0        82.0
4       -119.67     36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0      1485.0
...         ...       ...                 ...          ...             ...         ...         ...            ...                 ...         ...
2997    -119.70     36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0      1157.0
2998    -117.12     34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0       110.0
2999    -119.63     34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0      2028.0
3000      30.00       NaN                 NaN          NaN             NaN         NaN         NaN            NaN                 NaN         NaN
3001    -122.05     37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0      4546.0
3002 rows × 10 columns
# Rename columns
df.rename(columns={'longitude': 'long', 'latitude': 'lat'})
         long    lat  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  extra column
0     -122.05  37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0        4546.0
1     -118.30  34.26                43.0       1510.0           310.0       809.0       277.0         3.5990            176500.0        1820.0
2     -117.81  33.78                27.0       3589.0           507.0      1484.0       495.0         5.7934            270500.0        4096.0
3     -118.36  33.82                28.0         67.0            15.0        49.0        11.0         6.1359            330000.0          82.0
4     -119.67  36.33                19.0       1241.0           244.0       850.0       237.0         2.9375             81700.0        1485.0
...       ...    ...                 ...          ...             ...         ...         ...            ...                 ...           ...
2997  -119.70  36.30                10.0        956.0           201.0       693.0       220.0         2.2895             62000.0        1157.0
2998  -117.12  34.10                40.0         96.0            14.0        46.0        14.0         3.2708            162500.0         110.0
2999  -119.63  34.42                42.0       1765.0           263.0       753.0       260.0         8.5608            500001.0        2028.0
3000    30.00    NaN                 NaN          NaN             NaN         NaN         NaN            NaN                 NaN           NaN
3001  -122.05  37.37                27.0       3885.0           661.0      1537.0       606.0         6.6085            344700.0        4546.0
3002 rows × 10 columns
df.columns
Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
       'total_bedrooms', 'population', 'households', 'median_income',
       'median_house_value', 'extra column'],
      dtype='object')
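The column list above still contains 'extra column', long and lat are absent, and sum_column is absent. That is because rename returns a new DataFrame and leaves the original untouched. A minimal sketch on a hypothetical one-row frame shows that reassigning the result keeps the new names.

```python
import pandas as pd

df_demo = pd.DataFrame({'longitude': [-122.05], 'latitude': [37.37]})

renamed = df_demo.rename(columns={'longitude': 'long', 'latitude': 'lat'})
print(list(df_demo.columns))  # ['longitude', 'latitude'] (unchanged)
print(list(renamed.columns))  # ['long', 'lat']

# Reassign to keep the new name on the original variable
df_demo = df_demo.rename(columns={'longitude': 'long'})
print(list(df_demo.columns))  # ['long', 'latitude']
```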
# groupby
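A minimal groupby sketch, using hypothetical sales records rather than the housing data. groupby splits the rows by the values of one column, and sum() or agg() then computes one result per group.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West', 'East'],
    'units': [10, 4, 6, 8, 2],
})

# Split rows by region, then aggregate each group
totals = sales.groupby('region')['units'].sum()
stats = sales.groupby('region')['units'].agg(['sum', 'mean', 'count'])

print(totals)
print(stats)
```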
# pivot
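A minimal pivot sketch on hypothetical long-format data with one row per (region, quarter) pair. pivot turns the unique values of one column into new columns. pivot_table does the same but also aggregates when an (index, column) pair appears more than once; here the data is duplicated on purpose to show that.

```python
import pandas as pd

# Hypothetical long-format data: one row per (region, quarter)
sales = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'units': [10, 6, 4, 8],
})

# Region values become row labels, quarter values become columns
wide = sales.pivot(index='region', columns='quarter', values='units')

# With duplicated pairs, pivot_table aggregates them (here: sum)
summed = pd.concat([sales, sales]).pivot_table(
    index='region', columns='quarter', values='units', aggfunc='sum'
)
print(wide)
print(summed)
```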
# melting
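A minimal melt sketch, the reverse direction of pivot, on hypothetical wide-format data with one column per quarter. melt keeps the id_vars columns and turns every other column into (variable, value) rows, stacking them column by column.

```python
import pandas as pd

# Hypothetical wide-format data: one column per quarter
wide = pd.DataFrame({
    'region': ['East', 'West'],
    'Q1': [10, 4],
    'Q2': [6, 8],
})

# Each quarter column becomes rows of (quarter, units)
long_df = wide.melt(id_vars='region', var_name='quarter', value_name='units')
print(long_df)
```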