BYU logo Computer Science

Introduction to Pandas

🐼 🐼 🐼

import pandas as pd
import matplotlib.pyplot as plt

The pandas package gives us several useful tools for managing data.

! head world-population.csv
    LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
    900,World,2,Medium,1950,1950.5,1266259.556,1270171.462,2536431.018,19.497
    900,World,2,Medium,1951,1951.5,1290237.638,1293796.589,2584034.227,19.863
    900,World,2,Medium,1952,1952.5,1313854.565,1317007.125,2630861.69,20.223
    900,World,2,Medium,1953,1953.5,1337452.786,1340156.275,2677609.061,20.582
    900,World,2,Medium,1954,1954.5,1361313.834,1363532.92,2724846.754,20.945
    900,World,2,Medium,1955,1955.5,1385658.299,1387361.616,2773019.915,21.316
    900,World,2,Medium,1956,1956.5,1410643.208,1411800.046,2822443.254,21.695
    900,World,2,Medium,1957,1957.5,1436364.62,1436941.438,2873306.058,22.086
    900,World,2,Medium,1958,1958.5,1462864.322,1462822.358,2925686.68,22.489
world_pop = pd.read_csv("world-population.csv")
world_pop
LocID Location VarID Variant Time MidPeriod PopMale PopFemale PopTotal PopDensity
0 900 World 2 Medium 1950 1950.5 1266259.556 1270171.462 2.536431e+06 19.497
1 900 World 2 Medium 1951 1951.5 1290237.638 1293796.589 2.584034e+06 19.863
2 900 World 2 Medium 1952 1952.5 1313854.565 1317007.125 2.630862e+06 20.223
3 900 World 2 Medium 1953 1953.5 1337452.786 1340156.275 2.677609e+06 20.582
4 900 World 2 Medium 1954 1954.5 1361313.834 1363532.920 2.724847e+06 20.945
146 900 World 2 Medium 2096 2096.5 5439132.293 5418979.294 1.085811e+07 83.464
147 900 World 2 Medium 2097 2097.5 5441365.752 5422249.024 1.086361e+07 83.506
148 900 World 2 Medium 2098 2098.5 5443228.989 5425118.647 1.086835e+07 83.542
149 900 World 2 Medium 2099 2099.5 5444712.816 5427571.318 1.087228e+07 83.573
150 900 World 2 Medium 2100 2100.5 5445805.463 5429588.256 1.087539e+07 83.596

151 rows × 10 columns

type(world_pop)
    pandas.core.frame.DataFrame

pd.read_csv returns a DataFrame object.

It stores the data as a dictionary of columns.

world_pop.head()
LocID Location VarID Variant Time MidPeriod PopMale PopFemale PopTotal PopDensity
0 900 World 2 Medium 1950 1950.5 1266259.556 1270171.462 2536431.018 19.497
1 900 World 2 Medium 1951 1951.5 1290237.638 1293796.589 2584034.227 19.863
2 900 World 2 Medium 1952 1952.5 1313854.565 1317007.125 2630861.690 20.223
3 900 World 2 Medium 1953 1953.5 1337452.786 1340156.275 2677609.061 20.582
4 900 World 2 Medium 1954 1954.5 1361313.834 1363532.920 2724846.754 20.945
world_pop.tail()
LocID Location VarID Variant Time MidPeriod PopMale PopFemale PopTotal PopDensity
146 900 World 2 Medium 2096 2096.5 5439132.293 5418979.294 1.085811e+07 83.464
147 900 World 2 Medium 2097 2097.5 5441365.752 5422249.024 1.086361e+07 83.506
148 900 World 2 Medium 2098 2098.5 5443228.989 5425118.647 1.086835e+07 83.542
149 900 World 2 Medium 2099 2099.5 5444712.816 5427571.318 1.087228e+07 83.573
150 900 World 2 Medium 2100 2100.5 5445805.463 5429588.256 1.087539e+07 83.596

You can get a whole column:

world_pop['PopTotal']
    0      2.536431e+06
    1      2.584034e+06
    2      2.630862e+06
    3      2.677609e+06
    4      2.724847e+06
               ...
    146    1.085811e+07
    147    1.086361e+07
    148    1.086835e+07
    149    1.087228e+07
    150    1.087539e+07
    Name: PopTotal, Length: 151, dtype: float64

You can get individual records:

world_pop.loc[1]
    LocID                 900
    Location            World
    VarID                   2
    Variant            Medium
    Time                 1951
    MidPeriod          1951.5
    PopMale       1290237.638
    PopFemale     1293796.589
    PopTotal      2584034.227
    PopDensity         19.863
    Name: 1, dtype: object

You can look at just the columns you want:

world_pop[['Time', 'PopTotal', 'PopMale', 'PopFemale']].head()
Time PopTotal PopMale PopFemale
0 1950 2536431.018 1266259.556 1270171.462
1 1951 2584034.227 1290237.638 1293796.589
2 1952 2630861.690 1313854.565 1317007.125
3 1953 2677609.061 1337452.786 1340156.275
4 1954 2724846.754 1361313.834 1363532.920

You can plot the data!

plt.plot(world_pop['Time'], world_pop['PopTotal'])
    [<matplotlib.lines.Line2D at 0x12a2c9130>]

png

world_pop.plot(x="Time", y="PopTotal")
    <AxesSubplot:xlabel='Time'>

png

world_pop.plot(x="Time", y=["PopMale","PopFemale","PopTotal"])
    <AxesSubplot:xlabel='Time'>

png

You can create new columns:

world_pop["PopTotalB"] = world_pop["PopTotal"] / 1e6
world_pop.head()
LocID Location VarID Variant Time MidPeriod PopMale PopFemale PopTotal PopDensity PopTotalB
0 900 World 2 Medium 1950 1950.5 1266259.556 1270171.462 2536431.018 19.497 2.536431
1 900 World 2 Medium 1951 1951.5 1290237.638 1293796.589 2584034.227 19.863 2.584034
2 900 World 2 Medium 1952 1952.5 1313854.565 1317007.125 2630861.690 20.223 2.630862
3 900 World 2 Medium 1953 1953.5 1337452.786 1340156.275 2677609.061 20.582 2.677609
4 900 World 2 Medium 1954 1954.5 1361313.834 1363532.920 2724846.754 20.945 2.724847
world_pop.plot(
    x='Time',
    y="PopTotalB",
    ylabel="Population (Billions)",
    title="Predicted Population for World"
)
    <AxesSubplot:title={'center':'Predicted Population for World'}, xlabel='Time', ylabel='Population (Billions)'>

png

🌎 🌍 🌏

world_pop = pd.read_csv("WPP2019_TotalPopulationBySex.csv")
world_pop.head()
LocID Location VarID Variant Time MidPeriod PopMale PopFemale PopTotal PopDensity
0 4 Afghanistan 2 Medium 1950 1950.5 4099.243 3652.874 7752.117 11.874
1 4 Afghanistan 2 Medium 1951 1951.5 4134.756 3705.395 7840.151 12.009
2 4 Afghanistan 2 Medium 1952 1952.5 4174.450 3761.546 7935.996 12.156
3 4 Afghanistan 2 Medium 1953 1953.5 4218.336 3821.348 8039.684 12.315
4 4 Afghanistan 2 Medium 1954 1954.5 4266.484 3884.832 8151.316 12.486
world_pop["Location"].unique()
    array(['Afghanistan', 'Africa', 'African Group', 'African Union',
           'African Union: Central Africa', 'African Union: Eastern Africa',
           'African Union: Northern Africa', 'African Union: Southern Africa',
           'African Union: Western Africa',
           'African, Caribbean and Pacific (ACP) Group of States', 'Albania',
           'Algeria', 'American Samoa', 'Andean Community', 'Andorra',
           'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina',
           'Armenia', 'Aruba', 'Asia',
           'Asia-Pacific Economic Cooperation (APEC)', 'Asia-Pacific Group',
           'Association of Southeast Asian Nations (ASEAN)', 'Australia',
           'Australia/New Zealand', 'Austria', 'Azerbaijan', 'BRIC', 'BRICS',
           'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus',
           'Belgium', 'Belize', 'Belt-Road Initiative (BRI)',
           'Belt-Road Initiative: Africa', 'Belt-Road Initiative: Asia',
           'Belt-Road Initiative: Europe',
           'Belt-Road Initiative: Latin America and the Caribbean',
           'Belt-Road Initiative: Pacific', 'Benin', 'Bermuda', 'Bhutan',
           'Black Sea Economic Cooperation (BSEC)',
           'Bolivarian Alliance for the Americas (ALBA)',
           'Bolivia (Plurinational State of)',
           'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina',
           'Botswana', 'Brazil', 'British Virgin Islands',
           'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi',
           'Cabo Verde', 'Cambodia', 'Cameroon', 'Canada', 'Caribbean',
           'Caribbean Community and Common Market (CARICOM)',
           'Cayman Islands', 'Central African Republic', 'Central America',
           'Central Asia', 'Central European Free Trade Agreement (CEFTA)',
           'Central and Southern Asia', 'Chad', 'Channel Islands', 'Chile',
           'China', 'China (and dependencies)', 'China, Hong Kong SAR',
           'China, Macao SAR', 'China, Taiwan Province of China', 'Colombia',
           'Commonwealth of Independent States (CIS)',
           'Commonwealth of Nations', 'Commonwealth: Africa',
           'Commonwealth: Asia', 'Commonwealth: Caribbean and Americas',
           'Commonwealth: Europe', 'Commonwealth: Pacific', 'Comoros',
           'Congo', 'Cook Islands', 'Costa Rica',
           'Countries with Access to the Sea',
           'Countries with Access to the Sea: Africa',
           'Countries with Access to the Sea: Asia',
           'Countries with Access to the Sea: Europe',
           'Countries with Access to the Sea: Latin America and the Caribbean',
           'Countries with Access to the Sea: Northern America',
           'Countries with Access to the Sea: Oceania', 'Croatia', 'Cuba',
           'Curaçao', 'Cyprus', 'Czechia', "Côte d'Ivoire",
           "Dem. People's Republic of Korea",
           'Democratic Republic of the Congo', 'Denmark',
           'Denmark (and dependencies)', 'Djibouti', 'Dominica',
           'Dominican Republic', 'ECE: North America-2', 'ECE: UNECE-52',
           'ECLAC: Latin America', 'ECLAC: The Caribbean',
           'ESCAP region: East and North-East Asia',
           'ESCAP region: North and Central Asia', 'ESCAP region: Pacific',
           'ESCAP region: South and South-West Asia',
           'ESCAP region: South-East Asia',
           'ESCAP: ADB Developing member countries (DMCs)',
           'ESCAP: ADB Group A (Concessional assistance\xa0only)',
           'ESCAP: ADB Group B\xa0(OCR blend)',
           'ESCAP: ADB Group C (Regular OCR only)', 'ESCAP: ASEAN',
           'ESCAP: Central Asia', 'ESCAP: ECO', 'ESCAP: HDI groups',
           'ESCAP: Landlocked countries (LLDCs)',
           'ESCAP: Least Developed Countries (LDCs)',
           'ESCAP: Pacific island dev. econ.', 'ESCAP: SAARC',
           'ESCAP: WB High income econ.', 'ESCAP: WB Low income econ.',
           'ESCAP: WB Lower middle income econ.',
           'ESCAP: WB Upper middle income econ.', 'ESCAP: WB income groups',
           'ESCAP: high HDI', 'ESCAP: high income', 'ESCAP: income groups',
           'ESCAP: low HDI', 'ESCAP: low income', 'ESCAP: lower middle HDI',
           'ESCAP: lower middle income',
           'ESCAP: other Asia-Pacific countries/areas',
           'ESCAP: upper middle HDI', 'ESCAP: upper middle income',
           'ESCWA: Arab countries', 'ESCWA: Arab least developed countries',
           'ESCWA: Gulf Cooperation Council countries',
           'ESCWA: Maghreb countries', 'ESCWA: Mashreq countries',
           'ESCWA: member countries', 'East African Community (EAC)',
           'Eastern Africa', 'Eastern Asia', 'Eastern Europe',
           'Eastern European Group', 'Eastern and South-Eastern Asia',
           'Economic Community of Central African States (ECCAS)',
           'Economic Community of West African States (ECOWAS)',
           'Economic Cooperation Organization (ECO)', 'Ecuador', 'Egypt',
           'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
           'Eswatini', 'Ethiopia', 'Eurasian Economic Community (Eurasec)',
           'Europe', 'Europe (48)', 'Europe and Northern America',
           'European Community (EC: 12)',
           'European Free Trade Agreement (EFTA)', 'European Union (EU: 15)',
           'European Union (EU: 28)', 'Falkland Islands (Malvinas)',
           'Faroe Islands', 'Fiji', 'Finland', 'France',
           'France (and dependencies)', 'French Guiana', 'French Polynesia',
           'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar',
           'Greater Arab Free Trade Area (GAFTA)', 'Greece', 'Greenland',
           'Grenada', 'Group of 77 (G77)', 'Group of Eight (G8)',
           'Group of Seven (G7)', 'Group of Twenty (G20) - member states',
           'Guadeloupe', 'Guam', 'Guatemala', 'Guinea', 'Guinea-Bissau',
           'Gulf Cooperation Council (GCC)', 'Guyana', 'Haiti',
           'High-income countries', 'Holy See', 'Honduras', 'Hungary',
           'Iceland', 'India', 'Indonesia', 'Iran (Islamic Republic of)',
           'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica',
           'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Kuwait',
           'Kyrgyzstan', 'LLDC: Africa', 'LLDC: Asia', 'LLDC: Europe',
           'LLDC: Latin America', 'Land-locked Countries',
           'Land-locked Countries (Others)',
           'Land-locked Developing Countries (LLDC)',
           "Lao People's Democratic Republic",
           'Latin America and the Caribbean',
           'Latin American Integration Association (ALADI)',
           'Latin American and Caribbean Group (GRULAC)', 'Latvia',
           'League of Arab States (LAS, informal name: Arab League)',
           'Least developed countries', 'Least developed: Africa',
           'Least developed: Asia',
           'Least developed: Latin America and the Caribbean',
           'Least developed: Oceania', 'Lebanon', 'Lesotho',
           'Less developed regions',
           'Less developed regions, excluding China',
           'Less developed regions, excluding least developed countries',
           'Less developed: Africa', 'Less developed: Asia',
           'Less developed: Latin America and the Caribbean',
           'Less developed: Oceania', 'Liberia', 'Libya', 'Liechtenstein',
           'Lithuania', 'Low-income countries',
           'Lower-middle-income countries', 'Luxembourg', 'Madagascar',
           'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta',
           'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius',
           'Mayotte', 'Melanesia', 'Mexico', 'Micronesia',
           'Micronesia (Fed. States of)', 'Middle Africa',
           'Middle-income countries', 'Monaco', 'Mongolia', 'Montenegro',
           'Montserrat', 'More developed regions', 'More developed: Asia',
           'More developed: Europe', 'More developed: Northern America',
           'More developed: Oceania', 'Morocco', 'Mozambique', 'Myanmar',
           'Namibia', 'Nauru', 'Nepal', 'Netherlands',
           'Netherlands (and dependencies)', 'New Caledonia',
           'New EU member states (joined since 2004)', 'New Zealand',
           'New Zealand (and dependencies)', 'Nicaragua', 'Niger', 'Nigeria',
           'Niue', 'No income group available',
           'Non-Self-Governing Territories',
           'North American Free Trade Agreement (NAFTA)',
           'North Atlantic Treaty Organization (NATO)', 'North Macedonia',
           'Northern Africa', 'Northern Africa and Western Asia',
           'Northern America', 'Northern Europe', 'Northern Mariana Islands',
           'Norway', 'Oceania',
           'Oceania (excluding Australia and New Zealand)', 'Oman',
           'Organisation for Economic Co-operation and Development (OECD)',
           'Organization for Security and Co-operation in Europe (OSCE)',
           'Organization of American States (OAS)',
           'Organization of Petroleum Exporting countries (OPEC)',
           'Organization of the Islamic Conference (OIC)', 'Pakistan',
           'Palau', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
           'Philippines', 'Poland', 'Polynesia', 'Portugal', 'Puerto Rico',
           'Qatar', 'Republic of Korea', 'Republic of Moldova', 'Romania',
           'Russian Federation', 'Rwanda', 'Réunion',
           'SIDS Atlantic, and Indian Ocean, Mediterranean and South China Sea (AIMS)',
           'SIDS Caribbean', 'SIDS Pacific', 'Saint Barthélemy',
           'Saint Helena', 'Saint Kitts and Nevis', 'Saint Lucia',
           'Saint Martin (French part)', 'Saint Pierre and Miquelon',
           'Saint Vincent and the Grenadines', 'Samoa', 'San Marino',
           'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia',
           'Seychelles', 'Shanghai Cooperation Organization (SCO)',
           'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)',
           'Slovakia', 'Slovenia', 'Small Island Developing States (SIDS)',
           'Solomon Islands', 'Somalia', 'South Africa', 'South America',
           'South Asian Association for Regional Cooperation (SAARC)',
           'South Sudan', 'South-Eastern Asia', 'Southern Africa',
           'Southern African Development Community (SADC)', 'Southern Asia',
           'Southern Common Market (MERCOSUR)', 'Southern Europe', 'Spain',
           'Sri Lanka', 'State of Palestine', 'Sub-Saharan Africa', 'Sudan',
           'Suriname', 'Sweden', 'Switzerland', 'Syrian Arab Republic',
           'Tajikistan', 'Thailand', 'Timor-Leste', 'Togo', 'Tokelau',
           'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey',
           'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu',
           'UN-ECE: member countries', 'UNFPA Regions',
           'UNFPA: Arab States (AS)', 'UNFPA: Asia and the Pacific (AP)',
           'UNFPA: East and Southern Africa (ESA)',
           'UNFPA: Eastern Europe and Central Asia (EECA)',
           'UNFPA: Latin America and the Caribbean (LAC)',
           'UNFPA: West and Central Africa (WCA)', 'UNICEF PROGRAMME REGIONS',
           'UNICEF Programme Regions: East Asia and Pacific (EAPRO)',
           'UNICEF Programme Regions: Eastern Caribbean',
           'UNICEF Programme Regions: Eastern and Southern Africa (ESARO)',
           'UNICEF Programme Regions: Europe and Central Asia (CEECIS)',
           'UNICEF Programme Regions: Latin America',
           'UNICEF Programme Regions: Latin America and Caribbean (LACRO)',
           'UNICEF Programme Regions: Middle East and North Africa (MENARO)',
           'UNICEF Programme Regions: South Asia (ROSA)',
           'UNICEF Programme Regions: West and Central Africa (WCARO)',
           'UNICEF REGIONS', 'UNICEF Regions: East Asia and Pacific',
           'UNICEF Regions: Eastern Europe and Central Asia',
           'UNICEF Regions: Eastern and Southern Africa',
           'UNICEF Regions: Europe and Central Asia',
           'UNICEF Regions: Latin America and Caribbean',
           'UNICEF Regions: Middle East and North Africa',
           'UNICEF Regions: North America', 'UNICEF Regions: South Asia',
           'UNICEF Regions: Sub-Saharan Africa',
           'UNICEF Regions: West and Central Africa',
           'UNICEF Regions: Western Europe',
           'UNITED NATIONS Regional Groups of Member States', 'Uganda',
           'Ukraine', 'United Arab Emirates', 'United Kingdom',
           'United Kingdom (and dependencies)',
           'United Nations Economic Commission for Africa (UN-ECA)',
           'United Nations Economic Commission for Latin America and the Caribbean (UN-ECLAC)',
           'United Nations Economic and Social Commission for Asia and the Pacific (UN-ESCAP) Regions',
           'United Nations Member States', 'United Republic of Tanzania',
           'United States Virgin Islands', 'United States of America',
           'United States of America (and dependencies)',
           'Upper-middle-income countries', 'Uruguay', 'Uzbekistan',
           'Vanuatu', 'Venezuela (Bolivarian Republic of)', 'Viet Nam',
           'WB region: East Asia and Pacific (excluding high income)',
           'WB region: Europe and Central Asia (excluding high income)',
           'WB region: Latin America and Caribbean (excluding high income)',
           'WB region: Middle East and North Africa (excluding high income)',
           'WB region: South Asia (excluding high income)',
           'WB region: Sub-Saharan Africa (excluding high income)',
           'WHO Regions', 'WHO: African region (AFRO)',
           'WHO: Americas (AMRO)', 'WHO: Eastern Mediterranean Region (EMRO)',
           'WHO: European Region (EURO)',
           'WHO: South-East Asia region (SEARO)',
           'WHO: Western Pacific region (WPRO)', 'Wallis and Futuna Islands',
           'West African Economic and Monetary Union (UEMOA)',
           'Western Africa', 'Western Asia', 'Western Europe',
           'Western European and Others Group (WEOG)', 'Western Sahara',
           'World', 'World Bank Regional Groups (developing only)', 'Yemen',
           'Zambia', 'Zimbabwe'], dtype=object)
place = "Costa Rica"
place_pop = world_pop[ world_pop["Location"].eq(place) & world_pop["Variant"].eq("Medium") ]
place_pop.plot(x="Time", y="PopTotal", title=place)
    <AxesSubplot:title={'center':'Costa Rica'}, xlabel='Time'>

png

world_pop["Location"].eq(place).head()
    0    False
    1    False
    2    False
    3    False
    4    False
    Name: Location, dtype: bool
world_pop["Location"].eq(place).sum()
    884
import random
places = random.sample(list(world_pop["Location"].unique()), 5)
for place in places:
    place_pop = world_pop[ world_pop["Location"].eq(place) & world_pop["Variant"].eq("Medium") ]
    place_pop.plot(x="Time", y="PopTotal", title=place)

png

png

png

png

png

🎥

movie_data = pd.read_csv("all-time-worldwide-box-office.csv")
movie_data.head()
Rank Year Movie WorldwideBox Office DomesticBox Office InternationalBox Office
0 1 2009 Avatar 2,845,899,541</td><td>2,845,899,541</td> <td>760,507,625 $2,085,391,916
1 2 2019 Avengers: Endgame 2,797,800,564</td><td>2,797,800,564</td> <td>858,373,000 $1,939,427,564
2 3 1997 Titanic 2,207,986,545</td><td>2,207,986,545</td> <td>659,363,944 $1,548,622,601
3 4 2015 Star Wars Ep. VII: The Force Awakens 2,064,615,817</td><td>2,064,615,817</td> <td>936,662,225 $1,127,953,592
4 5 2018 Avengers: Infinity War 2,044,540,523</td><td>2,044,540,523</td> <td>678,815,482 $1,365,725,041
def clean_money(entry):
    if isinstance(entry, float):
        return 0

    return int(entry.strip('$').replace(',', ''))
movie_data['Worldwide'] = movie_data['WorldwideBox Office'].apply(clean_money)
movie_data['Domestic'] = movie_data['DomesticBox Office'].apply(clean_money)
movie_data['International'] = movie_data['InternationalBox Office'].apply(clean_money)
movie_data['Worldwide'].describe()
    count    5.950000e+02
    mean     5.249021e+08
    std      3.130981e+08
    min      2.578052e+08
    25%      3.220422e+08
    50%      4.110029e+08
    75%      6.308023e+08
    max      2.845900e+09
    Name: Worldwide, dtype: float64
movie_data[ movie_data['Year'].eq(2019) ]
Rank Year Movie WorldwideBox Office DomesticBox Office InternationalBox Office Worldwide Domestic International
1 2 2019 Avengers: Endgame 2,797,800,564</td><td>2,797,800,564</td> <td>858,373,000 $1,939,427,564 2797800564 858373000 1939427564
6 7 2019 The Lion King 1,654,367,425</td><td>1,654,367,425</td> <td>543,638,043 $1,110,729,382 1654367425 543638043 1110729382
9 10 2019 Frozen II 1,446,925,396</td><td>1,446,925,396</td> <td>477,373,578 $969,551,818 1446925396 477373578 969551818
23 24 2019 Spider-Man: Far From Home 1,131,113,066</td><td>1,131,113,066</td> <td>390,532,085 $740,580,981 1131113066 390532085 740580981
24 25 2019 Captain Marvel 1,129,727,388</td><td>1,129,727,388</td> <td>426,829,839 $702,897,549 1129727388 426829839 702897549
30 31 2019 Toy Story 4 1,073,080,329</td><td>1,073,080,329</td> <td>434,038,008 $639,042,321 1073080329 434038008 639042321
31 32 2019 Star Wars: The Rise of Skywalker 1,072,848,487</td><td>1,072,848,487</td> <td>515,202,542 $557,645,945 1072848487 515202542 557645945
32 33 2019 Joker 1,072,507,517</td><td>1,072,507,517</td> <td>335,451,311 $737,056,206 1072507517 335451311 737056206
36 37 2019 Aladdin 1,046,649,706</td><td>1,046,649,706</td> <td>355,559,216 $691,090,490 1046649706 355559216 691090490
86 87 2019 Jumanji: The Next Level 800,128,637</td><td>800,128,637</td> <td>316,831,246 $483,297,391 800128637 316831246 483297391
100 102 2019 Fast & Furious Presents: Hobbs & Shaw 760,381,706</td><td>760,381,706</td> <td>173,956,935 $586,424,771 760381706 173956935 586424771
109 111 2019 Ne Zha Zhi Mo Tong Jiang Shi 742,711,251</td><td>742,711,251</td> <td>3,695,533 $739,015,718 742711251 3695533 739015718
113 115 2019 Liu Lang Di Qiu 727,846,450</td><td>727,846,450</td> <td>5,875,487 $721,970,963 727846450 5875487 721970963
203 206 2019 How to Train Your Dragon: The Hidden … 522,746,363</td><td>522,746,363</td> <td>160,799,505 $361,946,858 522746363 160799505 361946858
223 226 2019 Maleficent: Mistress of Evil 489,346,495</td><td>489,346,495</td> <td>113,929,605 $375,416,890 489346495 113929605 375416890
239 242 2019 It: Chapter Two 469,566,806</td><td>469,566,806</td> <td>211,593,228 $257,973,578 469566806 211593228 257973578
245 248 2019 Wo he wo de zu guo 465,418,019</td><td>465,418,019</td> <td>2,323,409 $463,094,610 465418019 2323409 463094610
263 266 2019 The Secret Life of Pets 2 446,273,015</td><td>446,273,015</td> <td>158,874,395 $287,398,620 446273015 158874395 287398620
278 281 2019 Pokémon: Detective Pikachu 431,947,558</td><td>431,947,558</td> <td>144,105,346 $287,842,212 431947558 144105346 287842212
289 292 2019 Zhongguo jizhang 417,850,370</td><td>417,850,370</td> <td>706,572 $417,143,798 417850370 706572 417143798
308 312 2019 Alita: Battle Angel 401,900,040</td><td>401,900,040</td> <td>85,838,210 $316,061,830 401900040 85838210 316061830
328 332 2019 Godzilla: King of the Monsters 383,299,915</td><td>383,299,915</td> <td>110,500,138 $272,799,777 383299915 110500138 272799777
334 338 2019 Once Upon a Time…in Hollywood 377,562,039</td><td>377,562,039</td> <td>142,502,728 $235,059,311 377562039 142502728 235059311
354 358 2019 1917 366,614,721</td><td>366,614,721</td> <td>159,227,644 $207,387,077 366614721 159227644 207387077
361 365 2019 Shazam! 363,664,533</td><td>363,664,533</td> <td>140,371,656 $223,292,877 363664533 140371656 223292877
387 391 2019 Dumbo 353,166,307</td><td>353,166,307</td> <td>114,766,307 $238,400,000 353166307 114766307 238400000
440 445 2019 Feng Kuang De Wai Xing Ren 326,150,303</td><td>NaN</td><td>326,150,303</td> <td>NaN</td> <td>326,150,303 326150303 0 326150303
442 447 2019 John Wick: Chapter 3 — Parabellum 325,674,995</td><td>325,674,995</td> <td>171,016,727 $154,658,268 325674995 171016727 154658268
462 467 2019 Knives Out 312,952,178</td><td>312,952,178</td> <td>165,364,060 $147,588,118 312952178 165364060 147588118
columns_of_interest = ["Rank","Year","Movie","DomesticBox Office","InternationalBox Office"]
movie_data[ movie_data['Year'].eq(2019) ][columns_of_interest]
Rank Year Movie DomesticBox Office InternationalBox Office
1 2 2019 Avengers: Endgame 858,373,000</td><td>858,373,000</td> <td>1,939,427,564
6 7 2019 The Lion King 543,638,043</td><td>543,638,043</td> <td>1,110,729,382
9 10 2019 Frozen II 477,373,578</td><td>477,373,578</td> <td>969,551,818
23 24 2019 Spider-Man: Far From Home 390,532,085</td><td>390,532,085</td> <td>740,580,981
24 25 2019 Captain Marvel 426,829,839</td><td>426,829,839</td> <td>702,897,549
30 31 2019 Toy Story 4 434,038,008</td><td>434,038,008</td> <td>639,042,321
31 32 2019 Star Wars: The Rise of Skywalker 515,202,542</td><td>515,202,542</td> <td>557,645,945
32 33 2019 Joker 335,451,311</td><td>335,451,311</td> <td>737,056,206
36 37 2019 Aladdin 355,559,216</td><td>355,559,216</td> <td>691,090,490
86 87 2019 Jumanji: The Next Level 316,831,246</td><td>316,831,246</td> <td>483,297,391
100 102 2019 Fast & Furious Presents: Hobbs & Shaw 173,956,935</td><td>173,956,935</td> <td>586,424,771
109 111 2019 Ne Zha Zhi Mo Tong Jiang Shi 3,695,533</td><td>3,695,533</td> <td>739,015,718
113 115 2019 Liu Lang Di Qiu 5,875,487</td><td>5,875,487</td> <td>721,970,963
203 206 2019 How to Train Your Dragon: The Hidden … 160,799,505</td><td>160,799,505</td> <td>361,946,858
223 226 2019 Maleficent: Mistress of Evil 113,929,605</td><td>113,929,605</td> <td>375,416,890
239 242 2019 It: Chapter Two 211,593,228</td><td>211,593,228</td> <td>257,973,578
245 248 2019 Wo he wo de zu guo 2,323,409</td><td>2,323,409</td> <td>463,094,610
263 266 2019 The Secret Life of Pets 2 158,874,395</td><td>158,874,395</td> <td>287,398,620
278 281 2019 Pokémon: Detective Pikachu 144,105,346</td><td>144,105,346</td> <td>287,842,212
289 292 2019 Zhongguo jizhang 706,572</td><td>706,572</td> <td>417,143,798
308 312 2019 Alita: Battle Angel 85,838,210</td><td>85,838,210</td> <td>316,061,830
328 332 2019 Godzilla: King of the Monsters 110,500,138</td><td>110,500,138</td> <td>272,799,777
334 338 2019 Once Upon a Time…in Hollywood 142,502,728</td><td>142,502,728</td> <td>235,059,311
354 358 2019 1917 159,227,644</td><td>159,227,644</td> <td>207,387,077
361 365 2019 Shazam! 140,371,656</td><td>140,371,656</td> <td>223,292,877
387 391 2019 Dumbo 114,766,307</td><td>114,766,307</td> <td>238,400,000
440 445 2019 Feng Kuang De Wai Xing Ren NaN $326,150,303
442 447 2019 John Wick: Chapter 3 — Parabellum 171,016,727</td><td>171,016,727</td> <td>154,658,268
462 467 2019 Knives Out 165,364,060</td><td>165,364,060</td> <td>147,588,118
internationals = movie_data[ movie_data["International"] / movie_data["Domestic"] > 5 ]
internationals[columns_of_interest]
Rank Year Movie DomesticBox Office InternationalBox Office
77 78 2021 Ni Hao, Li Huan Ying NaN $841,716,512
79 80 2017 Zhan Lang 2 2,721,100</td><td>2,721,100</td> <td>832,753,071
109 111 2019 Ne Zha Zhi Mo Tong Jiang Shi 3,695,533</td><td>3,695,533</td> <td>739,015,718
113 115 2019 Liu Lang Di Qiu 5,875,487</td><td>5,875,487</td> <td>721,970,963
121 123 2021 Tang Ren Jie Tan An 3 NaN $698,994,069
188 190 2018 Tang Ren Jie Tan An 2 1,983,984</td><td>1,983,984</td> <td>542,084,590
193 195 2018 Hong Hai Xing Dong 1,543,547</td><td>1,543,547</td> <td>532,057,988
202 205 2016 Mei Ren Yu 3,229,457</td><td>3,229,457</td> <td>521,789,022
228 231 2012 Les Intouchables 13,182,281</td><td>13,182,281</td> <td>471,448,627
231 234 2020 Kimetsu no Yaiba: Mugen Ressha-Hen 47,700,000</td><td>47,700,000</td> <td>430,226,595
236 239 2020 Ba bai NaN $472,643,830
245 248 2019 Wo he wo de zu guo 2,323,409</td><td>2,323,409</td> <td>463,094,610
257 260 2018 Wo Bi Shi Yao Shen 6,752</td><td>6,752</td> <td>451,176,640
269 272 2016 Warcraft 47,365,290</td><td>47,365,290</td> <td>391,534,534
274 277 2020 Wo He Wo De Jia Xiang NaN $433,241,288
289 292 2019 Zhongguo jizhang 706,572</td><td>706,572</td> <td>417,143,798
307 311 2016 Ice Age: Collision Course 64,063,008</td><td>64,063,008</td> <td>338,093,674
326 330 2001 Sen to Chihiro no Kamikakushi 10,049,886</td><td>10,049,886</td> <td>373,636,205
351 355 2018 Xi Hong Shi Shou Fu NaN $366,961,920
368 372 2020 Tenet 57,929,000</td><td>57,929,000</td> <td>303,946,852
370 374 2018 Zhuo yao ji 2 706,153</td><td>706,153</td> <td>360,977,662
372 376 2015 Zhuo yao ji 32,766</td><td>32,766</td> <td>359,262,825
377 381 2016 Kimi no na wa 5,000,181</td><td>5,000,181</td> <td>352,065,557
408 413 2017 xXx: Return of Xander Cage 44,898,413</td><td>44,898,413</td> <td>300,134,946
422 427 2017 Never Say Die NaN $334,536,622
423 428 2016 The Great Wall 45,157,105</td><td>45,157,105</td> <td>289,329,747
440 445 2019 Feng Kuang De Wai Xing Ren NaN $326,150,303
460 465 2016 Resident Evil: The Final Chapter 26,844,692</td><td>26,844,692</td> <td>287,256,498
477 482 2017 Qian Ren 3: Yan Zhi Da Zuo Zhan 837,791</td><td>837,791</td> <td>306,710,033
485 490 2016 Dangal 12,391,761</td><td>12,391,761</td> <td>292,757,700
577 583 2004 Bridget Jones: The Edge Of Reason 40,203,020</td><td>40,203,020</td> <td>223,691,531
580 586 2017 Baahubali 2: The Conclusion 18,985,794</td><td>18,985,794</td> <td>244,517,120
domestics = movie_data[ movie_data["International"] / movie_data["Domestic"] < 0.7 ]
domestics[columns_of_interest]
Rank Year Movie DomesticBox Office InternationalBox Office
97 98 1977 Star Wars Ep. IV: A New Hope 460,998,007</td><td>460,998,007</td> <td>314,400,000
128 130 2012 The Hunger Games 408,010,692</td><td>408,010,692</td> <td>269,912,687
153 155 2004 The Passion of the Christ 370,782,930</td><td>370,782,930</td> <td>251,530,705
187 189 2014 American Sniper 350,126,372</td><td>350,126,372</td> <td>197,200,000
232 235 1990 Home Alone 285,761,243</td><td>285,761,243</td> <td>190,923,432
234 237 1983 Star Wars Ep. VI: Return of the Jedi 309,205,079</td><td>309,205,079</td> <td>165,901,098
244 247 2009 The Hangover 277,322,503</td><td>277,322,503</td> <td>188,165,080
296 299 1989 Batman 251,188,924</td><td>251,188,924</td> <td>160,160,000
319 323 2009 Star Trek 257,730,019</td><td>257,730,019</td> <td>129,109,595
338 342 2002 My Big Fat Greek Wedding 241,438,208</td><td>241,438,208</td> <td>133,451,826
347 351 1981 Raiders of the Lost Ark 225,686,079</td><td>225,686,079</td> <td>141,766,000
365 369 2007 Alvin and the Chipmunks 217,326,974</td><td>217,326,974</td> <td>145,278,059
396 400 2012 Dr. Seuss’ The Lorax 214,030,500</td><td>214,030,500</td> <td>136,946,253
404 409 2001 Rush Hour 2 226,164,286</td><td>226,164,286</td> <td>121,261,546
407 412 2000 How the Grinch Stole Christmas 260,348,825</td><td>260,348,825</td> <td>85,096,578
455 460 1984 Beverly Hills Cop 234,760,478</td><td>234,760,478</td> <td>81,539,522
464 469 2004 The Polar Express 188,578,855</td><td>188,578,855</td> <td>124,140,582
465 470 1999 Austin Powers: The Spy Who Shagged Me 206,040,085</td><td>206,040,085</td> <td>106,343,402
482 487 2009 The Blind Side 255,959,475</td><td>255,959,475</td> <td>49,746,319
491 496 1997 Liar Liar 181,410,615</td><td>181,410,615</td> <td>121,300,000
505 511 2002 Austin Powers in Goldmember 213,117,789</td><td>213,117,789</td> <td>83,220,874
508 514 1984 Ghostbusters 242,212,467</td><td>242,212,467</td> <td>52,999,344
524 530 2015 Pitch Perfect 2 184,296,230</td><td>184,296,230</td> <td>102,918,966
530 536 2005 Wedding Crashers 209,218,368</td><td>209,218,368</td> <td>74,000,000
552 558 2012 Lincoln 182,207,973</td><td>182,207,973</td> <td>91,138,308
555 561 2010 Grown Ups 162,001,186</td><td>162,001,186</td> <td>110,221,814
561 567 2021 A Quiet Place: Part II 159,578,799</td><td>159,578,799</td> <td>109,047,080
569 575 1992 Batman Returns 162,833,635</td><td>162,833,635</td> <td>103,990,656
movie_data['Year'].value_counts().sort_index().plot(
    figsize=(10,4),
    marker='.',
    title="Number of movies released each year",
    xlabel="Year",
    ylabel="Number of movies"
)
    <AxesSubplot:title={'center':'Number of movies released each year'}, xlabel='Year', ylabel='Number of movies'>

png

movie_data['Year'].value_counts().sort_index().plot(
    kind='bar',
    figsize=(10,4),
    title="Number of movies released each year",
    xlabel="Year",
    ylabel="Number of movies"
)
    <AxesSubplot:title={'center':'Number of movies released each year'}, xlabel='Year', ylabel='Number of movies'>

png

def about_star_wars(name):
    return "Star Wars" in name
star_wars = movie_data[ movie_data['Movie'].apply(about_star_wars)].sort_values(by='Year')
star_wars[columns_of_interest]
Rank Year Movie DomesticBox Office InternationalBox Office
97 98 1977 Star Wars Ep. IV: A New Hope 460,998,007</td><td>460,998,007</td> <td>314,400,000
186 188 1980 Star Wars Ep. V: The Empire Strikes Back 291,738,960</td><td>291,738,960</td> <td>257,262,282
234 237 1983 Star Wars Ep. VI: Return of the Jedi 309,205,079</td><td>309,205,079</td> <td>165,901,098
40 41 1999 Star Wars Ep. I: The Phantom Menace 474,544,677</td><td>474,544,677</td> <td>552,500,000
134 136 2002 Star Wars Ep. II: Attack of the Clones 310,676,740</td><td>310,676,740</td> <td>346,018,875
76 77 2005 Star Wars Ep. III: Revenge of the Sith 380,270,577</td><td>380,270,577</td> <td>468,728,300
3 4 2015 Star Wars Ep. VII: The Force Awakens 936,662,225</td><td>936,662,225</td> <td>1,127,953,592
35 36 2016 Rogue One: A Star Wars Story 532,177,324</td><td>532,177,324</td> <td>522,958,274
13 14 2017 Star Wars Ep. VIII: The Last Jedi 620,181,382</td><td>620,181,382</td> <td>711,453,759
313 317 2018 Solo: A Star Wars Story 213,767,512</td><td>213,767,512</td> <td>179,383,835
31 32 2019 Star Wars: The Rise of Skywalker 515,202,542</td><td>515,202,542</td> <td>557,645,945
star_wars.plot(
    x='Year',
    y=['Domestic','International'],
    title='Star Wars Movie Revenue',
    marker='.'
)
    <AxesSubplot:title={'center':'Star Wars Movie Revenue'}, xlabel='Year'>

png