Introduction to Pandas
🐼 🐼 🐼
import pandas as pd
import matplotlib.pyplot as plt
The pandas
package gives us several useful tools for managing data.
! head world-population.csv
LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
900,World,2,Medium,1950,1950.5,1266259.556,1270171.462,2536431.018,19.497
900,World,2,Medium,1951,1951.5,1290237.638,1293796.589,2584034.227,19.863
900,World,2,Medium,1952,1952.5,1313854.565,1317007.125,2630861.69,20.223
900,World,2,Medium,1953,1953.5,1337452.786,1340156.275,2677609.061,20.582
900,World,2,Medium,1954,1954.5,1361313.834,1363532.92,2724846.754,20.945
900,World,2,Medium,1955,1955.5,1385658.299,1387361.616,2773019.915,21.316
900,World,2,Medium,1956,1956.5,1410643.208,1411800.046,2822443.254,21.695
900,World,2,Medium,1957,1957.5,1436364.62,1436941.438,2873306.058,22.086
900,World,2,Medium,1958,1958.5,1462864.322,1462822.358,2925686.68,22.489
world_pop = pd.read_csv("world-population.csv")
world_pop
LocID | Location | VarID | Variant | Time | MidPeriod | PopMale | PopFemale | PopTotal | PopDensity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 900 | World | 2 | Medium | 1950 | 1950.5 | 1266259.556 | 1270171.462 | 2.536431e+06 | 19.497 |
1 | 900 | World | 2 | Medium | 1951 | 1951.5 | 1290237.638 | 1293796.589 | 2.584034e+06 | 19.863 |
2 | 900 | World | 2 | Medium | 1952 | 1952.5 | 1313854.565 | 1317007.125 | 2.630862e+06 | 20.223 |
3 | 900 | World | 2 | Medium | 1953 | 1953.5 | 1337452.786 | 1340156.275 | 2.677609e+06 | 20.582 |
4 | 900 | World | 2 | Medium | 1954 | 1954.5 | 1361313.834 | 1363532.920 | 2.724847e+06 | 20.945 |
… | … | … | … | … | … | … | … | … | … | … |
146 | 900 | World | 2 | Medium | 2096 | 2096.5 | 5439132.293 | 5418979.294 | 1.085811e+07 | 83.464 |
147 | 900 | World | 2 | Medium | 2097 | 2097.5 | 5441365.752 | 5422249.024 | 1.086361e+07 | 83.506 |
148 | 900 | World | 2 | Medium | 2098 | 2098.5 | 5443228.989 | 5425118.647 | 1.086835e+07 | 83.542 |
149 | 900 | World | 2 | Medium | 2099 | 2099.5 | 5444712.816 | 5427571.318 | 1.087228e+07 | 83.573 |
150 | 900 | World | 2 | Medium | 2100 | 2100.5 | 5445805.463 | 5429588.256 | 1.087539e+07 | 83.596 |
151 rows × 10 columns
type(world_pop)
pandas.core.frame.DataFrame
pd.read_csv
returns a DataFrame object.
It stores the data as a dictionary of columns.
world_pop.head()
LocID | Location | VarID | Variant | Time | MidPeriod | PopMale | PopFemale | PopTotal | PopDensity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 900 | World | 2 | Medium | 1950 | 1950.5 | 1266259.556 | 1270171.462 | 2536431.018 | 19.497 |
1 | 900 | World | 2 | Medium | 1951 | 1951.5 | 1290237.638 | 1293796.589 | 2584034.227 | 19.863 |
2 | 900 | World | 2 | Medium | 1952 | 1952.5 | 1313854.565 | 1317007.125 | 2630861.690 | 20.223 |
3 | 900 | World | 2 | Medium | 1953 | 1953.5 | 1337452.786 | 1340156.275 | 2677609.061 | 20.582 |
4 | 900 | World | 2 | Medium | 1954 | 1954.5 | 1361313.834 | 1363532.920 | 2724846.754 | 20.945 |
world_pop.tail()
LocID | Location | VarID | Variant | Time | MidPeriod | PopMale | PopFemale | PopTotal | PopDensity | |
---|---|---|---|---|---|---|---|---|---|---|
146 | 900 | World | 2 | Medium | 2096 | 2096.5 | 5439132.293 | 5418979.294 | 1.085811e+07 | 83.464 |
147 | 900 | World | 2 | Medium | 2097 | 2097.5 | 5441365.752 | 5422249.024 | 1.086361e+07 | 83.506 |
148 | 900 | World | 2 | Medium | 2098 | 2098.5 | 5443228.989 | 5425118.647 | 1.086835e+07 | 83.542 |
149 | 900 | World | 2 | Medium | 2099 | 2099.5 | 5444712.816 | 5427571.318 | 1.087228e+07 | 83.573 |
150 | 900 | World | 2 | Medium | 2100 | 2100.5 | 5445805.463 | 5429588.256 | 1.087539e+07 | 83.596 |
You can get a whole column:
world_pop['PopTotal']
0 2.536431e+06
1 2.584034e+06
2 2.630862e+06
3 2.677609e+06
4 2.724847e+06
...
146 1.085811e+07
147 1.086361e+07
148 1.086835e+07
149 1.087228e+07
150 1.087539e+07
Name: PopTotal, Length: 151, dtype: float64
You can get individual records:
world_pop.loc[1]
LocID 900
Location World
VarID 2
Variant Medium
Time 1951
MidPeriod 1951.5
PopMale 1290237.638
PopFemale 1293796.589
PopTotal 2584034.227
PopDensity 19.863
Name: 1, dtype: object
You can look at just the columns you want:
world_pop[['Time', 'PopTotal', 'PopMale', 'PopFemale']].head()
Time | PopTotal | PopMale | PopFemale | |
---|---|---|---|---|
0 | 1950 | 2536431.018 | 1266259.556 | 1270171.462 |
1 | 1951 | 2584034.227 | 1290237.638 | 1293796.589 |
2 | 1952 | 2630861.690 | 1313854.565 | 1317007.125 |
3 | 1953 | 2677609.061 | 1337452.786 | 1340156.275 |
4 | 1954 | 2724846.754 | 1361313.834 | 1363532.920 |
You can plot the data!
plt.plot(world_pop['Time'], world_pop['PopTotal'])
[<matplotlib.lines.Line2D at 0x12a2c9130>]
world_pop.plot(x="Time", y="PopTotal")
<AxesSubplot:xlabel='Time'>
world_pop.plot(x="Time", y=["PopMale","PopFemale","PopTotal"])
<AxesSubplot:xlabel='Time'>
You can create new columns:
world_pop["PopTotalB"] = world_pop["PopTotal"] / 1e6
world_pop.head()
LocID | Location | VarID | Variant | Time | MidPeriod | PopMale | PopFemale | PopTotal | PopDensity | PopTotalB | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 900 | World | 2 | Medium | 1950 | 1950.5 | 1266259.556 | 1270171.462 | 2536431.018 | 19.497 | 2.536431 |
1 | 900 | World | 2 | Medium | 1951 | 1951.5 | 1290237.638 | 1293796.589 | 2584034.227 | 19.863 | 2.584034 |
2 | 900 | World | 2 | Medium | 1952 | 1952.5 | 1313854.565 | 1317007.125 | 2630861.690 | 20.223 | 2.630862 |
3 | 900 | World | 2 | Medium | 1953 | 1953.5 | 1337452.786 | 1340156.275 | 2677609.061 | 20.582 | 2.677609 |
4 | 900 | World | 2 | Medium | 1954 | 1954.5 | 1361313.834 | 1363532.920 | 2724846.754 | 20.945 | 2.724847 |
world_pop.plot(
x='Time',
y="PopTotalB",
ylabel="Population (Billions)",
title="Predicted Population for World"
)
<AxesSubplot:title={'center':'Predicted Population for World'}, xlabel='Time', ylabel='Population (Billions)'>
🌎 🌍 🌏
world_pop = pd.read_csv("WPP2019_TotalPopulationBySex.csv")
world_pop.head()
LocID | Location | VarID | Variant | Time | MidPeriod | PopMale | PopFemale | PopTotal | PopDensity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | Afghanistan | 2 | Medium | 1950 | 1950.5 | 4099.243 | 3652.874 | 7752.117 | 11.874 |
1 | 4 | Afghanistan | 2 | Medium | 1951 | 1951.5 | 4134.756 | 3705.395 | 7840.151 | 12.009 |
2 | 4 | Afghanistan | 2 | Medium | 1952 | 1952.5 | 4174.450 | 3761.546 | 7935.996 | 12.156 |
3 | 4 | Afghanistan | 2 | Medium | 1953 | 1953.5 | 4218.336 | 3821.348 | 8039.684 | 12.315 |
4 | 4 | Afghanistan | 2 | Medium | 1954 | 1954.5 | 4266.484 | 3884.832 | 8151.316 | 12.486 |
world_pop["Location"].unique()
array(['Afghanistan', 'Africa', 'African Group', 'African Union',
'African Union: Central Africa', 'African Union: Eastern Africa',
'African Union: Northern Africa', 'African Union: Southern Africa',
'African Union: Western Africa',
'African, Caribbean and Pacific (ACP) Group of States', 'Albania',
'Algeria', 'American Samoa', 'Andean Community', 'Andorra',
'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina',
'Armenia', 'Aruba', 'Asia',
'Asia-Pacific Economic Cooperation (APEC)', 'Asia-Pacific Group',
'Association of Southeast Asian Nations (ASEAN)', 'Australia',
'Australia/New Zealand', 'Austria', 'Azerbaijan', 'BRIC', 'BRICS',
'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
'Belgium', 'Belize', 'Belt-Road Initiative (BRI)',
'Belt-Road Initiative: Africa', 'Belt-Road Initiative: Asia',
'Belt-Road Initiative: Europe',
'Belt-Road Initiative: Latin America and the Caribbean',
'Belt-Road Initiative: Pacific', 'Benin', 'Bermuda', 'Bhutan',
'Black Sea Economic Cooperation (BSEC)',
'Bolivarian Alliance for the Americas (ALBA)',
'Bolivia (Plurinational State of)',
'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina',
'Botswana', 'Brazil', 'British Virgin Islands',
'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi',
'Cabo Verde', 'Cambodia', 'Cameroon', 'Canada', 'Caribbean',
'Caribbean Community and Common Market (CARICOM)',
'Cayman Islands', 'Central African Republic', 'Central America',
'Central Asia', 'Central European Free Trade Agreement (CEFTA)',
'Central and Southern Asia', 'Chad', 'Channel Islands', 'Chile',
'China', 'China (and dependencies)', 'China, Hong Kong SAR',
'China, Macao SAR', 'China, Taiwan Province of China', 'Colombia',
'Commonwealth of Independent States (CIS)',
'Commonwealth of Nations', 'Commonwealth: Africa',
'Commonwealth: Asia', 'Commonwealth: Caribbean and Americas',
'Commonwealth: Europe', 'Commonwealth: Pacific', 'Comoros',
'Congo', 'Cook Islands', 'Costa Rica',
'Countries with Access to the Sea',
'Countries with Access to the Sea: Africa',
'Countries with Access to the Sea: Asia',
'Countries with Access to the Sea: Europe',
'Countries with Access to the Sea: Latin America and the Caribbean',
'Countries with Access to the Sea: Northern America',
'Countries with Access to the Sea: Oceania', 'Croatia', 'Cuba',
'Curaçao', 'Cyprus', 'Czechia', "Côte d'Ivoire",
"Dem. People's Republic of Korea",
'Democratic Republic of the Congo', 'Denmark',
'Denmark (and dependencies)', 'Djibouti', 'Dominica',
'Dominican Republic', 'ECE: North America-2', 'ECE: UNECE-52',
'ECLAC: Latin America', 'ECLAC: The Caribbean',
'ESCAP region: East and North-East Asia',
'ESCAP region: North and Central Asia', 'ESCAP region: Pacific',
'ESCAP region: South and South-West Asia',
'ESCAP region: South-East Asia',
'ESCAP: ADB Developing member countries (DMCs)',
'ESCAP: ADB Group A (Concessional assistance\xa0only)',
'ESCAP: ADB Group B\xa0(OCR blend)',
'ESCAP: ADB Group C (Regular OCR only)', 'ESCAP: ASEAN',
'ESCAP: Central Asia', 'ESCAP: ECO', 'ESCAP: HDI groups',
'ESCAP: Landlocked countries (LLDCs)',
'ESCAP: Least Developed Countries (LDCs)',
'ESCAP: Pacific island dev. econ.', 'ESCAP: SAARC',
'ESCAP: WB High income econ.', 'ESCAP: WB Low income econ.',
'ESCAP: WB Lower middle income econ.',
'ESCAP: WB Upper middle income econ.', 'ESCAP: WB income groups',
'ESCAP: high HDI', 'ESCAP: high income', 'ESCAP: income groups',
'ESCAP: low HDI', 'ESCAP: low income', 'ESCAP: lower middle HDI',
'ESCAP: lower middle income',
'ESCAP: other Asia-Pacific countries/areas',
'ESCAP: upper middle HDI', 'ESCAP: upper middle income',
'ESCWA: Arab countries', 'ESCWA: Arab least developed countries',
'ESCWA: Gulf Cooperation Council countries',
'ESCWA: Maghreb countries', 'ESCWA: Mashreq countries',
'ESCWA: member countries', 'East African Community (EAC)',
'Eastern Africa', 'Eastern Asia', 'Eastern Europe',
'Eastern European Group', 'Eastern and South-Eastern Asia',
'Economic Community of Central African States (ECCAS)',
'Economic Community of West African States (ECOWAS)',
'Economic Cooperation Organization (ECO)', 'Ecuador', 'Egypt',
'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
'Eswatini', 'Ethiopia', 'Eurasian Economic Community (Eurasec)',
'Europe', 'Europe (48)', 'Europe and Northern America',
'European Community (EC: 12)',
'European Free Trade Agreement (EFTA)', 'European Union (EU: 15)',
'European Union (EU: 28)', 'Falkland Islands (Malvinas)',
'Faroe Islands', 'Fiji', 'Finland', 'France',
'France (and dependencies)', 'French Guiana', 'French Polynesia',
'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar',
'Greater Arab Free Trade Area (GAFTA)', 'Greece', 'Greenland',
'Grenada', 'Group of 77 (G77)', 'Group of Eight (G8)',
'Group of Seven (G7)', 'Group of Twenty (G20) - member states',
'Guadeloupe', 'Guam', 'Guatemala', 'Guinea', 'Guinea-Bissau',
'Gulf Cooperation Council (GCC)', 'Guyana', 'Haiti',
'High-income countries', 'Holy See', 'Honduras', 'Hungary',
'Iceland', 'India', 'Indonesia', 'Iran (Islamic Republic of)',
'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica',
'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Kuwait',
'Kyrgyzstan', 'LLDC: Africa', 'LLDC: Asia', 'LLDC: Europe',
'LLDC: Latin America', 'Land-locked Countries',
'Land-locked Countries (Others)',
'Land-locked Developing Countries (LLDC)',
"Lao People's Democratic Republic",
'Latin America and the Caribbean',
'Latin American Integration Association (ALADI)',
'Latin American and Caribbean Group (GRULAC)', 'Latvia',
'League of Arab States (LAS, informal name: Arab League)',
'Least developed countries', 'Least developed: Africa',
'Least developed: Asia',
'Least developed: Latin America and the Caribbean',
'Least developed: Oceania', 'Lebanon', 'Lesotho',
'Less developed regions',
'Less developed regions, excluding China',
'Less developed regions, excluding least developed countries',
'Less developed: Africa', 'Less developed: Asia',
'Less developed: Latin America and the Caribbean',
'Less developed: Oceania', 'Liberia', 'Libya', 'Liechtenstein',
'Lithuania', 'Low-income countries',
'Lower-middle-income countries', 'Luxembourg', 'Madagascar',
'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta',
'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius',
'Mayotte', 'Melanesia', 'Mexico', 'Micronesia',
'Micronesia (Fed. States of)', 'Middle Africa',
'Middle-income countries', 'Monaco', 'Mongolia', 'Montenegro',
'Montserrat', 'More developed regions', 'More developed: Asia',
'More developed: Europe', 'More developed: Northern America',
'More developed: Oceania', 'Morocco', 'Mozambique', 'Myanmar',
'Namibia', 'Nauru', 'Nepal', 'Netherlands',
'Netherlands (and dependencies)', 'New Caledonia',
'New EU member states (joined since 2004)', 'New Zealand',
'New Zealand (and dependencies)', 'Nicaragua', 'Niger', 'Nigeria',
'Niue', 'No income group available',
'Non-Self-Governing Territories',
'North American Free Trade Agreement (NAFTA)',
'North Atlantic Treaty Organization (NATO)', 'North Macedonia',
'Northern Africa', 'Northern Africa and Western Asia',
'Northern America', 'Northern Europe', 'Northern Mariana Islands',
'Norway', 'Oceania',
'Oceania (excluding Australia and New Zealand)', 'Oman',
'Organisation for Economic Co-operation and Development (OECD)',
'Organization for Security and Co-operation in Europe (OSCE)',
'Organization of American States (OAS)',
'Organization of Petroleum Exporting countries (OPEC)',
'Organization of the Islamic Conference (OIC)', 'Pakistan',
'Palau', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
'Philippines', 'Poland', 'Polynesia', 'Portugal', 'Puerto Rico',
'Qatar', 'Republic of Korea', 'Republic of Moldova', 'Romania',
'Russian Federation', 'Rwanda', 'Réunion',
'SIDS Atlantic, and Indian Ocean, Mediterranean and South China Sea (AIMS)',
'SIDS Caribbean', 'SIDS Pacific', 'Saint Barthélemy',
'Saint Helena', 'Saint Kitts and Nevis', 'Saint Lucia',
'Saint Martin (French part)', 'Saint Pierre and Miquelon',
'Saint Vincent and the Grenadines', 'Samoa', 'San Marino',
'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia',
'Seychelles', 'Shanghai Cooperation Organization (SCO)',
'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)',
'Slovakia', 'Slovenia', 'Small Island Developing States (SIDS)',
'Solomon Islands', 'Somalia', 'South Africa', 'South America',
'South Asian Association for Regional Cooperation (SAARC)',
'South Sudan', 'South-Eastern Asia', 'Southern Africa',
'Southern African Development Community (SADC)', 'Southern Asia',
'Southern Common Market (MERCOSUR)', 'Southern Europe', 'Spain',
'Sri Lanka', 'State of Palestine', 'Sub-Saharan Africa', 'Sudan',
'Suriname', 'Sweden', 'Switzerland', 'Syrian Arab Republic',
'Tajikistan', 'Thailand', 'Timor-Leste', 'Togo', 'Tokelau',
'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey',
'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu',
'UN-ECE: member countries', 'UNFPA Regions',
'UNFPA: Arab States (AS)', 'UNFPA: Asia and the Pacific (AP)',
'UNFPA: East and Southern Africa (ESA)',
'UNFPA: Eastern Europe and Central Asia (EECA)',
'UNFPA: Latin America and the Caribbean (LAC)',
'UNFPA: West and Central Africa (WCA)', 'UNICEF PROGRAMME REGIONS',
'UNICEF Programme Regions: East Asia and Pacific (EAPRO)',
'UNICEF Programme Regions: Eastern Caribbean',
'UNICEF Programme Regions: Eastern and Southern Africa (ESARO)',
'UNICEF Programme Regions: Europe and Central Asia (CEECIS)',
'UNICEF Programme Regions: Latin America',
'UNICEF Programme Regions: Latin America and Caribbean (LACRO)',
'UNICEF Programme Regions: Middle East and North Africa (MENARO)',
'UNICEF Programme Regions: South Asia (ROSA)',
'UNICEF Programme Regions: West and Central Africa (WCARO)',
'UNICEF REGIONS', 'UNICEF Regions: East Asia and Pacific',
'UNICEF Regions: Eastern Europe and Central Asia',
'UNICEF Regions: Eastern and Southern Africa',
'UNICEF Regions: Europe and Central Asia',
'UNICEF Regions: Latin America and Caribbean',
'UNICEF Regions: Middle East and North Africa',
'UNICEF Regions: North America', 'UNICEF Regions: South Asia',
'UNICEF Regions: Sub-Saharan Africa',
'UNICEF Regions: West and Central Africa',
'UNICEF Regions: Western Europe',
'UNITED NATIONS Regional Groups of Member States', 'Uganda',
'Ukraine', 'United Arab Emirates', 'United Kingdom',
'United Kingdom (and dependencies)',
'United Nations Economic Commission for Africa (UN-ECA)',
'United Nations Economic Commission for Latin America and the Caribbean (UN-ECLAC)',
'United Nations Economic and Social Commission for Asia and the Pacific (UN-ESCAP) Regions',
'United Nations Member States', 'United Republic of Tanzania',
'United States Virgin Islands', 'United States of America',
'United States of America (and dependencies)',
'Upper-middle-income countries', 'Uruguay', 'Uzbekistan',
'Vanuatu', 'Venezuela (Bolivarian Republic of)', 'Viet Nam',
'WB region: East Asia and Pacific (excluding high income)',
'WB region: Europe and Central Asia (excluding high income)',
'WB region: Latin America and Caribbean (excluding high income)',
'WB region: Middle East and North Africa (excluding high income)',
'WB region: South Asia (excluding high income)',
'WB region: Sub-Saharan Africa (excluding high income)',
'WHO Regions', 'WHO: African region (AFRO)',
'WHO: Americas (AMRO)', 'WHO: Eastern Mediterranean Region (EMRO)',
'WHO: European Region (EURO)',
'WHO: South-East Asia region (SEARO)',
'WHO: Western Pacific region (WPRO)', 'Wallis and Futuna Islands',
'West African Economic and Monetary Union (UEMOA)',
'Western Africa', 'Western Asia', 'Western Europe',
'Western European and Others Group (WEOG)', 'Western Sahara',
'World', 'World Bank Regional Groups (developing only)', 'Yemen',
'Zambia', 'Zimbabwe'], dtype=object)
place = "Costa Rica"
place_pop = world_pop[ world_pop["Location"].eq(place) & world_pop["Variant"].eq("Medium") ]
place_pop.plot(x="Time", y="PopTotal", title=place)
<AxesSubplot:title={'center':'Costa Rica'}, xlabel='Time'>
world_pop["Location"].eq(place).head()
0 False
1 False
2 False
3 False
4 False
Name: Location, dtype: bool
world_pop["Location"].eq(place).sum()
884
import random
places = random.sample(list(world_pop["Location"].unique()), 5)
for place in places:
place_pop = world_pop[ world_pop["Location"].eq(place) & world_pop["Variant"].eq("Medium") ]
place_pop.plot(x="Time", y="PopTotal", title=place)
🎥
movie_data = pd.read_csv("all-time-worldwide-box-office.csv")
movie_data.head()
Rank | Year | Movie | WorldwideBox Office | DomesticBox Office | InternationalBox Office | |
---|---|---|---|---|---|---|
0 | 1 | 2009 | Avatar | 760,507,625 | $2,085,391,916 | |
1 | 2 | 2019 | Avengers: Endgame | 858,373,000 | $1,939,427,564 | |
2 | 3 | 1997 | Titanic | 659,363,944 | $1,548,622,601 | |
3 | 4 | 2015 | Star Wars Ep. VII: The Force Awakens | 936,662,225 | $1,127,953,592 | |
4 | 5 | 2018 | Avengers: Infinity War | 678,815,482 | $1,365,725,041 |
def clean_money(entry):
if isinstance(entry, float):
return 0
return int(entry.strip('$').replace(',', ''))
movie_data['Worldwide'] = movie_data['WorldwideBox Office'].apply(clean_money)
movie_data['Domestic'] = movie_data['DomesticBox Office'].apply(clean_money)
movie_data['International'] = movie_data['InternationalBox Office'].apply(clean_money)
movie_data['Worldwide'].describe()
count 5.950000e+02
mean 5.249021e+08
std 3.130981e+08
min 2.578052e+08
25% 3.220422e+08
50% 4.110029e+08
75% 6.308023e+08
max 2.845900e+09
Name: Worldwide, dtype: float64
movie_data[ movie_data['Year'].eq(2019) ]
Rank | Year | Movie | WorldwideBox Office | DomesticBox Office | InternationalBox Office | Worldwide | Domestic | International | |
---|---|---|---|---|---|---|---|---|---|
1 | 2 | 2019 | Avengers: Endgame | 858,373,000 | $1,939,427,564 | 2797800564 | 858373000 | 1939427564 | |
6 | 7 | 2019 | The Lion King | 543,638,043 | $1,110,729,382 | 1654367425 | 543638043 | 1110729382 | |
9 | 10 | 2019 | Frozen II | 477,373,578 | $969,551,818 | 1446925396 | 477373578 | 969551818 | |
23 | 24 | 2019 | Spider-Man: Far From Home | 390,532,085 | $740,580,981 | 1131113066 | 390532085 | 740580981 | |
24 | 25 | 2019 | Captain Marvel | 426,829,839 | $702,897,549 | 1129727388 | 426829839 | 702897549 | |
30 | 31 | 2019 | Toy Story 4 | 434,038,008 | $639,042,321 | 1073080329 | 434038008 | 639042321 | |
31 | 32 | 2019 | Star Wars: The Rise of Skywalker | 515,202,542 | $557,645,945 | 1072848487 | 515202542 | 557645945 | |
32 | 33 | 2019 | Joker | 335,451,311 | $737,056,206 | 1072507517 | 335451311 | 737056206 | |
36 | 37 | 2019 | Aladdin | 355,559,216 | $691,090,490 | 1046649706 | 355559216 | 691090490 | |
86 | 87 | 2019 | Jumanji: The Next Level | 316,831,246 | $483,297,391 | 800128637 | 316831246 | 483297391 | |
100 | 102 | 2019 | Fast & Furious Presents: Hobbs & Shaw | 173,956,935 | $586,424,771 | 760381706 | 173956935 | 586424771 | |
109 | 111 | 2019 | Ne Zha Zhi Mo Tong Jiang Shi | 3,695,533 | $739,015,718 | 742711251 | 3695533 | 739015718 | |
113 | 115 | 2019 | Liu Lang Di Qiu | 5,875,487 | $721,970,963 | 727846450 | 5875487 | 721970963 | |
203 | 206 | 2019 | How to Train Your Dragon: The Hidden … | 160,799,505 | $361,946,858 | 522746363 | 160799505 | 361946858 | |
223 | 226 | 2019 | Maleficent: Mistress of Evil | 113,929,605 | $375,416,890 | 489346495 | 113929605 | 375416890 | |
239 | 242 | 2019 | It: Chapter Two | 211,593,228 | $257,973,578 | 469566806 | 211593228 | 257973578 | |
245 | 248 | 2019 | Wo he wo de zu guo | 2,323,409 | $463,094,610 | 465418019 | 2323409 | 463094610 | |
263 | 266 | 2019 | The Secret Life of Pets 2 | 158,874,395 | $287,398,620 | 446273015 | 158874395 | 287398620 | |
278 | 281 | 2019 | Pokémon: Detective Pikachu | 144,105,346 | $287,842,212 | 431947558 | 144105346 | 287842212 | |
289 | 292 | 2019 | Zhongguo jizhang | 706,572 | $417,143,798 | 417850370 | 706572 | 417143798 | |
308 | 312 | 2019 | Alita: Battle Angel | 85,838,210 | $316,061,830 | 401900040 | 85838210 | 316061830 | |
328 | 332 | 2019 | Godzilla: King of the Monsters | 110,500,138 | $272,799,777 | 383299915 | 110500138 | 272799777 | |
334 | 338 | 2019 | Once Upon a Timeâ¦in Hollywood | 142,502,728 | $235,059,311 | 377562039 | 142502728 | 235059311 | |
354 | 358 | 2019 | 1917 | 159,227,644 | $207,387,077 | 366614721 | 159227644 | 207387077 | |
361 | 365 | 2019 | Shazam! | 140,371,656 | $223,292,877 | 363664533 | 140371656 | 223292877 | |
387 | 391 | 2019 | Dumbo | 114,766,307 | $238,400,000 | 353166307 | 114766307 | 238400000 | |
440 | 445 | 2019 | Feng Kuang De Wai Xing Ren | 326,150,303 | 326150303 | 0 | 326150303 | ||
442 | 447 | 2019 | John Wick: Chapter 3 â Parabellum | 171,016,727 | $154,658,268 | 325674995 | 171016727 | 154658268 | |
462 | 467 | 2019 | Knives Out | 165,364,060 | $147,588,118 | 312952178 | 165364060 | 147588118 |
columns_of_interest = ["Rank","Year","Movie","DomesticBox Office","InternationalBox Office"]
movie_data[ movie_data['Year'].eq(2019) ][columns_of_interest]
Rank | Year | Movie | DomesticBox Office | InternationalBox Office | |
---|---|---|---|---|---|
1 | 2 | 2019 | Avengers: Endgame | 1,939,427,564 | |
6 | 7 | 2019 | The Lion King | 1,110,729,382 | |
9 | 10 | 2019 | Frozen II | 969,551,818 | |
23 | 24 | 2019 | Spider-Man: Far From Home | 740,580,981 | |
24 | 25 | 2019 | Captain Marvel | 702,897,549 | |
30 | 31 | 2019 | Toy Story 4 | 639,042,321 | |
31 | 32 | 2019 | Star Wars: The Rise of Skywalker | 557,645,945 | |
32 | 33 | 2019 | Joker | 737,056,206 | |
36 | 37 | 2019 | Aladdin | 691,090,490 | |
86 | 87 | 2019 | Jumanji: The Next Level | 483,297,391 | |
100 | 102 | 2019 | Fast & Furious Presents: Hobbs & Shaw | 586,424,771 | |
109 | 111 | 2019 | Ne Zha Zhi Mo Tong Jiang Shi | 739,015,718 | |
113 | 115 | 2019 | Liu Lang Di Qiu | 721,970,963 | |
203 | 206 | 2019 | How to Train Your Dragon: The Hidden … | 361,946,858 | |
223 | 226 | 2019 | Maleficent: Mistress of Evil | 375,416,890 | |
239 | 242 | 2019 | It: Chapter Two | 257,973,578 | |
245 | 248 | 2019 | Wo he wo de zu guo | 463,094,610 | |
263 | 266 | 2019 | The Secret Life of Pets 2 | 287,398,620 | |
278 | 281 | 2019 | Pokémon: Detective Pikachu | 287,842,212 | |
289 | 292 | 2019 | Zhongguo jizhang | 417,143,798 | |
308 | 312 | 2019 | Alita: Battle Angel | 316,061,830 | |
328 | 332 | 2019 | Godzilla: King of the Monsters | 272,799,777 | |
334 | 338 | 2019 | Once Upon a Timeâ¦in Hollywood | 235,059,311 | |
354 | 358 | 2019 | 1917 | 207,387,077 | |
361 | 365 | 2019 | Shazam! | 223,292,877 | |
387 | 391 | 2019 | Dumbo | 238,400,000 | |
440 | 445 | 2019 | Feng Kuang De Wai Xing Ren | NaN | $326,150,303 |
442 | 447 | 2019 | John Wick: Chapter 3 â Parabellum | 154,658,268 | |
462 | 467 | 2019 | Knives Out | 147,588,118 |
internationals = movie_data[ movie_data["International"] / movie_data["Domestic"] > 5 ]
internationals[columns_of_interest]
Rank | Year | Movie | DomesticBox Office | InternationalBox Office | |
---|---|---|---|---|---|
77 | 78 | 2021 | Ni Hao, Li Huan Ying | NaN | $841,716,512 |
79 | 80 | 2017 | Zhan Lang 2 | 832,753,071 | |
109 | 111 | 2019 | Ne Zha Zhi Mo Tong Jiang Shi | 739,015,718 | |
113 | 115 | 2019 | Liu Lang Di Qiu | 721,970,963 | |
121 | 123 | 2021 | Tang Ren Jie Tan An 3 | NaN | $698,994,069 |
188 | 190 | 2018 | Tang Ren Jie Tan An 2 | 542,084,590 | |
193 | 195 | 2018 | Hong Hai Xing Dong | 532,057,988 | |
202 | 205 | 2016 | Mei Ren Yu | 521,789,022 | |
228 | 231 | 2012 | Les Intouchables | 471,448,627 | |
231 | 234 | 2020 | Kimetsu no Yaiba: Mugen Ressha-Hen | 430,226,595 | |
236 | 239 | 2020 | Ba bai | NaN | $472,643,830 |
245 | 248 | 2019 | Wo he wo de zu guo | 463,094,610 | |
257 | 260 | 2018 | Wo Bi Shi Yao Shen | 451,176,640 | |
269 | 272 | 2016 | Warcraft | 391,534,534 | |
274 | 277 | 2020 | Wo He Wo De Jia Xiang | NaN | $433,241,288 |
289 | 292 | 2019 | Zhongguo jizhang | 417,143,798 | |
307 | 311 | 2016 | Ice Age: Collision Course | 338,093,674 | |
326 | 330 | 2001 | Sen to Chihiro no Kamikakushi | 373,636,205 | |
351 | 355 | 2018 | Xi Hong Shi Shou Fu | NaN | $366,961,920 |
368 | 372 | 2020 | Tenet | 303,946,852 | |
370 | 374 | 2018 | Zhuo yao ji 2 | 360,977,662 | |
372 | 376 | 2015 | Zhuo yao ji | 359,262,825 | |
377 | 381 | 2016 | Kimi no na wa | 352,065,557 | |
408 | 413 | 2017 | xXx: Return of Xander Cage | 300,134,946 | |
422 | 427 | 2017 | Never Say Die | NaN | $334,536,622 |
423 | 428 | 2016 | The Great Wall | 289,329,747 | |
440 | 445 | 2019 | Feng Kuang De Wai Xing Ren | NaN | $326,150,303 |
460 | 465 | 2016 | Resident Evil: The Final Chapter | 287,256,498 | |
477 | 482 | 2017 | Qian Ren 3: Yan Zhi Da Zuo Zhan | 306,710,033 | |
485 | 490 | 2016 | Dangal | 292,757,700 | |
577 | 583 | 2004 | Bridget Jones: The Edge Of Reason | 223,691,531 | |
580 | 586 | 2017 | Baahubali 2: The Conclusion | 244,517,120 |
domestics = movie_data[ movie_data["International"] / movie_data["Domestic"] < 0.7 ]
domestics[columns_of_interest]
Rank | Year | Movie | DomesticBox Office | InternationalBox Office | |
---|---|---|---|---|---|
97 | 98 | 1977 | Star Wars Ep. IV: A New Hope | 314,400,000 | |
128 | 130 | 2012 | The Hunger Games | 269,912,687 | |
153 | 155 | 2004 | The Passion of the Christ | 251,530,705 | |
187 | 189 | 2014 | American Sniper | 197,200,000 | |
232 | 235 | 1990 | Home Alone | 190,923,432 | |
234 | 237 | 1983 | Star Wars Ep. VI: Return of the Jedi | 165,901,098 | |
244 | 247 | 2009 | The Hangover | 188,165,080 | |
296 | 299 | 1989 | Batman | 160,160,000 | |
319 | 323 | 2009 | Star Trek | 129,109,595 | |
338 | 342 | 2002 | My Big Fat Greek Wedding | 133,451,826 | |
347 | 351 | 1981 | Raiders of the Lost Ark | 141,766,000 | |
365 | 369 | 2007 | Alvin and the Chipmunks | 145,278,059 | |
396 | 400 | 2012 | Dr. Seussâ The Lorax | 136,946,253 | |
404 | 409 | 2001 | Rush Hour 2 | 121,261,546 | |
407 | 412 | 2000 | How the Grinch Stole Christmas | 85,096,578 | |
455 | 460 | 1984 | Beverly Hills Cop | 81,539,522 | |
464 | 469 | 2004 | The Polar Express | 124,140,582 | |
465 | 470 | 1999 | Austin Powers: The Spy Who Shagged Me | 106,343,402 | |
482 | 487 | 2009 | The Blind Side | 49,746,319 | |
491 | 496 | 1997 | Liar Liar | 121,300,000 | |
505 | 511 | 2002 | Austin Powers in Goldmember | 83,220,874 | |
508 | 514 | 1984 | Ghostbusters | 52,999,344 | |
524 | 530 | 2015 | Pitch Perfect 2 | 102,918,966 | |
530 | 536 | 2005 | Wedding Crashers | 74,000,000 | |
552 | 558 | 2012 | Lincoln | 91,138,308 | |
555 | 561 | 2010 | Grown Ups | 110,221,814 | |
561 | 567 | 2021 | A Quiet Place: Part II | 109,047,080 | |
569 | 575 | 1992 | Batman Returns | 103,990,656 |
movie_data['Year'].value_counts().sort_index().plot(
figsize=(10,4),
marker='.',
title="Number of movies released each year",
xlabel="Year",
ylabel="Number of movies"
)
<AxesSubplot:title={'center':'Number of movies released each year'}, xlabel='Year', ylabel='Number of movies'>
movie_data['Year'].value_counts().sort_index().plot(
kind='bar',
figsize=(10,4),
title="Number of movies released each year",
xlabel="Year",
ylabel="Number of movies"
)
<AxesSubplot:title={'center':'Number of movies released each year'}, xlabel='Year', ylabel='Number of movies'>
def about_star_wars(name):
return "Star Wars" in name
star_wars = movie_data[ movie_data['Movie'].apply(about_star_wars)].sort_values(by='Year')
star_wars[columns_of_interest]
Rank | Year | Movie | DomesticBox Office | InternationalBox Office | |
---|---|---|---|---|---|
97 | 98 | 1977 | Star Wars Ep. IV: A New Hope | 314,400,000 | |
186 | 188 | 1980 | Star Wars Ep. V: The Empire Strikes Back | 257,262,282 | |
234 | 237 | 1983 | Star Wars Ep. VI: Return of the Jedi | 165,901,098 | |
40 | 41 | 1999 | Star Wars Ep. I: The Phantom Menace | 552,500,000 | |
134 | 136 | 2002 | Star Wars Ep. II: Attack of the Clones | 346,018,875 | |
76 | 77 | 2005 | Star Wars Ep. III: Revenge of the Sith | 468,728,300 | |
3 | 4 | 2015 | Star Wars Ep. VII: The Force Awakens | 1,127,953,592 | |
35 | 36 | 2016 | Rogue One: A Star Wars Story | 522,958,274 | |
13 | 14 | 2017 | Star Wars Ep. VIII: The Last Jedi | 711,453,759 | |
313 | 317 | 2018 | Solo: A Star Wars Story | 179,383,835 | |
31 | 32 | 2019 | Star Wars: The Rise of Skywalker | 557,645,945 |
star_wars.plot(
x='Year',
y=['Domestic','International'],
title='Star Wars Movie Revenue',
marker='.'
)
<AxesSubplot:title={'center':'Star Wars Movie Revenue'}, xlabel='Year'>