BYU Computer Science

Dictionaries, Part 1

  • Let’s imagine you have some census data from 1940:

1940 census

  • You get this data in comma-separated values (CSV) format:
# last, first, relationship, gender, race, age, marital status, ...
Baer,William,head,M,W,51,M
Baer,Ruth,wife,F,W,38,M
Baer,Robert,son,M,W,12,S
Baer,William,son,M,W,10,S
Sposato,Carolina,head,F,W,53,Wd
Sposato,Albert,son,M,W,23,S
Sposato,Carlo,son,M,W,21,S
Sposato,Antonio,son,M,W,18,S
Sposato,Ralph,son,M,W,10,S
Sposato,Frances,daughter,F,W,28,S
Zappala,Mariano,head,M,W,27,M
Zappala,Anna,wife,F,W,25,M
  • You would like to calculate the number of people in the census who are:

    • ages 0-9
    • ages 10-19
    • ages 20-29
    • ages 30-39
    • ages 40-49
  • and so forth

  • You could create separate variables for each age range:

ages0to9 = 0
ages10to19 = 0
ages20to29 = 0
ages30to39 = 0
ages40to49 = 0
  • and then use the accumulator pattern, right?

don't do that

  • What you want instead is a dictionary:
    • maps keys to values

dictionary-example

Creating dictionaries

# create a blank dictionary
age_count = {}
age_count['age0to9'] = 0
age_count['age10to19'] = 0
age_count['age20to29'] = 0
age_count['age30to39'] = 0
age_count['age40to49'] = 0
age_count['age50to59'] = 0
print(age_count)
    {'age0to9': 0, 'age10to19': 0, 'age20to29': 0, 'age30to39': 0, 'age40to49': 0, 'age50to59': 0}
# shorter version
age_count = {'age0to9': 0, 'age10to19': 0, 'age20to29': 0, 'age30to39': 0, 'age40to49': 0, 'age50to59': 0}
print(age_count)
    {'age0to9': 0, 'age10to19': 0, 'age20to29': 0, 'age30to39': 0, 'age40to49': 0, 'age50to59': 0}

Getting and setting values

# get a value
result = age_count['age20to29']
print(result)

# set a value
age_count['age20to29'] = 5
result = age_count['age20to29']
print(result)
    0
    5
  • each value acts like any other variable
    • each key holds only one value at a time
    • if you change it, you overwrite the old value
result1 = age_count['age20to29']
age_count['age20to29'] = 6
result2 = age_count['age20to29']
print(f"count was {result1} now it is {result2}")
    count was 5 now it is 6

Example: Census Count

def census_age_count(filename):
  • We will do this in class in PyCharm
def census_age_count(filename):
    """
    Count ages in the census
    :param filename: the name of a file with 1940 census data
    :return: a dictionary with counts accumulated by age
    >>> census_age_count('census.txt')
    {'age0to9': 0, 'age10to19': 4, 'age20to29': 5, 'age30to39': 1, 'age40to49': 0, 'age50to59': 2}
    """
    age_count = {'age0to9': 0, 'age10to19': 0, 'age20to29': 0, 'age30to39': 0, 'age40to49': 0, 'age50to59': 0}
    with open(filename) as file:
        for line in file:
            last, first, relationship, gender, race, age, marital_status = line.strip().split(',')
            age = int(age)
            if age < 10:
                age_count['age0to9'] += 1
            elif age < 20:
                age_count['age10to19'] += 1
            elif age < 30:
                age_count['age20to29'] += 1
            elif age < 40:
                age_count['age30to39'] += 1
            elif age < 50:
                age_count['age40to49'] += 1
            elif age < 60:
                age_count['age50to59'] += 1
    return age_count

Checking if a key is in a dictionary

result = age_count['age90to99']
    ---------------------------------------------------------------------------

    KeyError                                  Traceback (most recent call last)

    /var/folders/9x/cb134v3d2nb22_rksynbspqm0000gn/T/ipykernel_54731/704241586.py in <module>
    ----> 1 result = age_count['age90to99']


    KeyError: 'age90to99'
result = 0
if 'age90to99' in age_count:
    result = age_count['age90to99']
print(result)
    0
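  • The same "use 0 if the key is missing" lookup can also be written with the built-in dictionary method get(). This is a minimal sketch of that alternative, not something the examples here depend on; the counts below are made up for illustration.

```python
# sample counts, invented for this sketch
age_count = {'age20to29': 5, 'age30to39': 1}

# get() returns its second argument when the key is missing,
# so no KeyError is raised and no separate 'in' check is needed
missing = age_count.get('age90to99', 0)
present = age_count.get('age20to29', 0)
print(missing)
print(present)

# get() only looks up; it does not add the missing key
print('age90to99' in age_count)
```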

Example: Bad Start

  • given a dictionary meals where each key is ‘breakfast’, ‘lunch’, or ‘dinner’
  • the values are foods
  • bad start if you didn’t have breakfast or if you had candy for breakfast
def bad_start(meals):
    if 'breakfast' not in meals:
        return True
    if meals['breakfast'] == 'candy':
        return True
    return False

bad_start({'dinner': 'pizza', 'lunch': 'sandwich'})
bad_start({'dinner': 'pizza', 'breakfast': 'sandwich'})
bad_start({'dinner': 'pizza', 'breakfast': 'candy'})
    True
def bad_start(meals):
    if 'breakfast' not in meals or meals['breakfast'] == 'candy':
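    # order matters: the reversed test below would raise KeyError when
    # 'breakfast' is missing, because the lookup would run first
    # if meals['breakfast'] == 'candy' or 'breakfast' not in meals: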
        return True
    return False

def bad_start2(meals):
    return 'breakfast' not in meals or meals['breakfast'] == 'candy'

bad_start({'dinner': 'pizza', 'lunch': 'sandwich'})
bad_start({'dinner': 'pizza', 'breakfast': 'sandwich'})
bad_start({'dinner': 'pizza', 'breakfast': 'candy'})
bad_start2({'dinner': 'pizza', 'breakfast': 'sandwich'})

    False
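  • The order of the two conditions matters because or stops evaluating as soon as its left side is True. A small sketch of the difference, assuming a meals dictionary with no breakfast; bad_start_reversed is a hypothetical name used only here.

```python
def bad_start(meals):
    # 'or' short-circuits: the lookup only runs when 'breakfast' is present
    return 'breakfast' not in meals or meals['breakfast'] == 'candy'


def bad_start_reversed(meals):
    # hypothetical variant: the lookup runs first, so a missing key raises KeyError
    return meals['breakfast'] == 'candy' or 'breakfast' not in meals


no_breakfast = {'dinner': 'pizza', 'lunch': 'sandwich'}
print(bad_start(no_breakfast))

try:
    bad_start_reversed(no_breakfast)
    raised = False
except KeyError:
    raised = True
print(raised)
```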

Example: Enkale

  • given a dictionary meals where each key is ‘breakfast’, ‘lunch’, or ‘dinner’
  • the values are foods
  • if ‘dinner’ has ‘candy’ as a value, change it to ‘kale’
  • return the dictionary
def enkale(meals):
    if 'dinner' in meals and meals['dinner'] == 'candy':
        meals['dinner'] = 'kale'
    return meals

enkale({'dinner': 'candy'})
    {'dinner': 'kale'}
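  • Note that enkale() changes the dictionary it is given and returns that same dictionary, not a copy. A short sketch that makes this visible; my_meals and result are names invented for the illustration.

```python
def enkale(meals):
    if 'dinner' in meals and meals['dinner'] == 'candy':
        meals['dinner'] = 'kale'
    return meals


my_meals = {'breakfast': 'toast', 'dinner': 'candy'}
result = enkale(my_meals)

# the caller's dictionary was modified in place
print(my_meals)
# the returned value is the very same dictionary object
print(result is my_meals)
```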

Example: Is Boring

  • given a dictionary meals where each key is ‘breakfast’, ‘lunch’, or ‘dinner’
  • the values are foods
  • if lunch and dinner are both present and are the same food, return True
def is_boring(meals):
    if 'lunch' in meals and 'dinner' in meals and meals['lunch'] == meals['dinner']:
        return True
    return False

def is_boring2(meals):
    return 'lunch' in meals and 'dinner' in meals and meals['lunch'] == meals['dinner']


is_boring({'dinner': 'pizza', 'lunch': 'pizza'})
is_boring2({'dinner': 'pizza', 'lunch': 'pizza'})
    True

Computing keys

  • we have been creating dictionaries like this:
meals = {}
meals['breakfast'] = 'candy'
meals['dinner'] = 'pizza'
  • or this:
meals = {'breakfast': 'candy', 'dinner': 'pizza'}
  • but what if we want to compute the keys?
  • given a list of words, find a count of all the words starting with each letter
def count_words_by_starting_letter(words):
  • we will do this in class using PyCharm
def count_words_by_starting_letter(words):
    """
    count all the words starting with each letter
    :param words: a list of words
    :return: a dictionary that counts all the words starting with each letter
    >>> result = count_words_by_starting_letter(['rock', 'paper', 'scissors', 'stone', 'parchment'])
    >>> from pprint import pprint
    >>> pprint(result)
    {'p': 2, 'r': 1, 's': 2}
    """
    starting_letters = {}
    for word in words:
        letter = word[0]
        if letter not in starting_letters:
            starting_letters[letter] = 0
        starting_letters[letter] += 1
    return starting_letters
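  • The "check for the key, start at 0, add 1" pattern is common enough that Python's standard library provides it as collections.Counter, a dictionary subclass. A sketch of the same count written that way, as an alternative to the hand-written loop rather than a replacement for it; count_with_counter is a hypothetical name.

```python
from collections import Counter


def count_with_counter(words):
    # Counter tallies each first letter, starting missing keys at 0 automatically
    return Counter(word[0] for word in words)


counts = count_with_counter(['rock', 'paper', 'scissors', 'stone', 'parchment'])

# a Counter can be used like a regular dictionary
print(counts['s'])
print(dict(counts))
```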

Let’s revisit census_age_count()

  • this was our dictionary:
age_count = {'age0to9': 0, 'age10to19': 0, 'age20to29': 0, 'age30to39': 0, 'age40to49': 0, 'age50to59': 0}
  • what if we want to calculate these keys automatically instead?

  • we need a function that turns an age into a key:

def round_to_nearest_10(number):
  • we will write this in class using PyCharm
def round_to_nearest_10(number):
    """
    Round a number down to the nearest 10
    :param number: a number
    :return: a number rounded down to the nearest 10
    >>> round_to_nearest_10(10)
    10
    >>> round_to_nearest_10(18)
    10
    """
    remainder = number % 10
    return number - remainder
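  • Despite its name, this function always rounds down. The same result can be had with floor division; the sketch below compares the two on a few ages. round_down_to_10 is a hypothetical name used only for the comparison.

```python
def round_to_nearest_10(number):
    remainder = number % 10
    return number - remainder


def round_down_to_10(number):
    # hypothetical alternative: // drops the ones digit, * 10 restores the scale
    return (number // 10) * 10


# each row shows: age, remainder version, floor-division version
for age in [0, 9, 10, 18, 51]:
    print(age, round_to_nearest_10(age), round_down_to_10(age))
```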
  • now we can rewrite census_age_count() to calculate our keys instead of having to pre-determine what they are

  • we will write this in class using PyCharm

def census_age_count2(filename):
    """
    Count ages in the census
    :param filename: the name of a file with 1940 census data
    :return: a dictionary with counts accumulated by age
    >>> result = census_age_count2('census.txt')
    >>> from pprint import pprint
    >>> pprint(result)
    {10: 4, 20: 5, 30: 1, 50: 2}
    """
    age_count = {}
    with open(filename) as file:
        for line in file:
            last, first, relationship, gender, race, age, marital_status = line.strip().split(',')
            age = int(age)
            # round down to the nearest 10
            age_group = round_to_nearest_10(age)
            if age_group not in age_count:
                age_count[age_group] = 0
            age_count[age_group] += 1
    return age_count
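  • To try this without a census.txt on hand, the twelve records from the top of these notes can be written to a temporary file first. A runnable sketch under that assumption: the '#' header line is left out because the function expects exactly seven comma-separated fields on every line, and the temporary file stands in for census.txt.

```python
import os
import tempfile
from pprint import pprint


def round_to_nearest_10(number):
    return number - number % 10


def census_age_count2(filename):
    age_count = {}
    with open(filename) as file:
        for line in file:
            last, first, relationship, gender, race, age, marital_status = line.strip().split(',')
            age_group = round_to_nearest_10(int(age))
            if age_group not in age_count:
                age_count[age_group] = 0
            age_count[age_group] += 1
    return age_count


# the twelve sample records, without the '#' header line
records = """Baer,William,head,M,W,51,M
Baer,Ruth,wife,F,W,38,M
Baer,Robert,son,M,W,12,S
Baer,William,son,M,W,10,S
Sposato,Carolina,head,F,W,53,Wd
Sposato,Albert,son,M,W,23,S
Sposato,Carlo,son,M,W,21,S
Sposato,Antonio,son,M,W,18,S
Sposato,Ralph,son,M,W,10,S
Sposato,Frances,daughter,F,W,28,S
Zappala,Mariano,head,M,W,27,M
Zappala,Anna,wife,F,W,25,M
"""

# write the records to a temporary file so the example does not depend on census.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(records)
    path = tmp.name

result = census_age_count2(path)
os.remove(path)
pprint(result)
```

  • only the age groups that actually occur become keys, so there is no entry for 0 or 40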