Dictionaries, Part 2

Remember letter count

def letter_count(word):
    # create an empty dictionary
    result = {}
    for letter in word:
        # if this letter is not there, initialize a new dictionary entry
        if letter not in result:
            result[letter] = 0
        # now we can be sure the entry is there, so increment it
        result[letter] += 1
    return result

letter_count('supply')

    {'s': 1, 'u': 1, 'p': 2, 'l': 1, 'y': 1}

Important pattern:

create an empty dictionary
loop through all the keys you want to create
- if a key is not in the dictionary, initialize a new entry
- increment the value for this key
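
The same pattern can be sketched on a different problem: counting how often each word appears in a list. The function name `word_count` and the sample words are made up for illustration.

```python
def word_count(words):
    # create an empty dictionary
    counts = {}
    # loop through all the keys we want to create
    for word in words:
        # if the key is not in the dictionary, initialize a new entry
        if word not in counts:
            counts[word] = 0
        # increment the value for this key
        counts[word] += 1
    return counts


print(word_count(['red', 'blue', 'red', 'green', 'red']))
# prints {'red': 3, 'blue': 1, 'green': 1}
```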

Let’s revisit census counting

create an empty dictionary
loop through all the lines in the file
- split and unpack each line
- convert age to an integer
- round age down to the nearest 10
- if an age range is not in the dictionary, initialize a new entry for this key
- increment the number of people in that age range

def round_to_nearest_10(number):
    # despite the name, this rounds down: 38 becomes 30, not 40
    remainder = number % 10
    return number - remainder
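
Because the function subtracts the remainder, it always rounds down to a multiple of 10. A quick check, with the function repeated so the snippet runs on its own (the sample ages are arbitrary):

```python
def round_to_nearest_10(number):
    remainder = number % 10
    return number - remainder


for age in [10, 18, 25, 38, 53]:
    print(age, '->', round_to_nearest_10(age))
# prints:
# 10 -> 10
# 18 -> 10
# 25 -> 20
# 38 -> 30
# 53 -> 50
```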

def census_age_count(filename):
    age_count = {}
    with open(filename) as file:
        for line in file:
            last, first, relationship, gender, race, age, marital_status = line.strip().split(',')
            age = int(age)
            age_group = round_to_nearest_10(age)
            if age_group not in age_count:
                age_count[age_group] = 0
            age_count[age_group] += 1
    return age_count

census_age_count('census.txt')

    {50: 2, 30: 1, 10: 4, 20: 5}
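
These notes do not show the contents of census.txt, but the unpacking in the loop suggests seven comma-separated fields per line. The sketch below assumes that layout and writes three made-up rows to a temporary file so the function can be tried without the real data. The relationship, race, and marital status values are invented placeholders.

```python
import os
import tempfile


def round_to_nearest_10(number):
    return number - number % 10


def census_age_count(filename):
    age_count = {}
    with open(filename) as file:
        for line in file:
            last, first, relationship, gender, race, age, marital_status = line.strip().split(',')
            age_group = round_to_nearest_10(int(age))
            if age_group not in age_count:
                age_count[age_group] = 0
            age_count[age_group] += 1
    return age_count


# hypothetical rows in the assumed seven-field format
rows = [
    'Baer,William,Head,M,W,51,M',
    'Baer,Ruth,Wife,F,W,38,M',
    'Baer,Robert,Son,M,W,12,S',
]

# write the rows to a temporary file, count, then clean up
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(rows) + '\n')
    path = tmp.name

counts = census_age_count(path)
os.remove(path)
print(counts)
# prints {50: 1, 30: 1, 10: 1}
```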

Dictionaries can store anything for values

keys are typically integers or strings
but values can be anything
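
As a small sketch of this idea, the dictionary below mixes several kinds of values under string keys. The name `profile` and everything stored in it are invented for illustration.

```python
profile = {
    'name': 'Anna',                                # a string value
    'age': 25,                                     # an integer value
    'hobbies': ['chess', 'art'],                   # a list value
    'address': {'city': 'Provo', 'state': 'UT'},   # a dictionary value
}

# look up a value, then index into it like any other list or dictionary
print(profile['hobbies'][0])
print(profile['address']['city'])
# prints:
# chess
# Provo
```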

Example — parsing email addresses

we have a list of email addresses:

['abby@gmail.com', 'kumar@yahoo.com', 'rachel@gmail.com']

build a dictionary that lists all the users with the same email provider

{'gmail.com': ['abby', 'rachel'],
 'yahoo.com': ['kumar']
}

keep in mind the types
- keys will be a string
- values will be a list of strings

def email_hosts(emails):
    hosts = {}
    for email in emails:
        # parse the email address to find the username part and the host part
        at = email.find('@')
        username = email[:at]
        host = email[at + 1:]
        # rest of code here
        pass
    return hosts

def email_hosts(emails):
    hosts = {}
    for email in emails:
        # parse the email address to find the username part and the host part
        at = email.find('@')
        username = email[:at]
        host = email[at + 1:]
        # initialize entry
        if host not in hosts:
            hosts[host] = []
        # increment/append
        users = hosts[host]
        users.append(username)
    return hosts

let's look at this portion carefully:

# increment/append
users = hosts[host]
users.append(username)

we could also do this in one step

hosts[host].append(username)
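
Both forms work because `users = hosts[host]` does not copy the list: `users` is just another name for the same list stored in the dictionary. A short check of that behavior, using a made-up starting dictionary and usernames:

```python
hosts = {'gmail.com': ['abby']}

# two steps: look up the list, then append to it
users = hosts['gmail.com']
users.append('rachel')

# users and hosts['gmail.com'] are the same list object
print(users is hosts['gmail.com'])
print(hosts)

# one step: append directly to the list stored in the dictionary
hosts['gmail.com'].append('sam')
print(hosts)
# prints:
# True
# {'gmail.com': ['abby', 'rachel']}
# {'gmail.com': ['abby', 'rachel', 'sam']}
```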

email_hosts(['abby@gmail.com', 'kumar@yahoo.com', 'rachel@gmail.com', 'zappala@byu.edu'])

    {'gmail.com': ['abby', 'rachel'],
     'yahoo.com': ['kumar'],
     'byu.edu': ['zappala']}

a sequence that shows all the steps for adding a new username to the dictionary, starting with an empty dictionary and ending with a key for 'gmail.com' that maps to a list that contains 'abby'
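
That sequence can also be traced in code. This sketch handles a single address, assumed here to be 'abby@gmail.com', and prints the dictionary after each step.

```python
hosts = {}
email = 'abby@gmail.com'
print(hosts)                  # {}

# parse the address into the username part and the host part
at = email.find('@')
username = email[:at]         # 'abby'
host = email[at + 1:]         # 'gmail.com'

# initialize the entry, because 'gmail.com' is not a key yet
if host not in hosts:
    hosts[host] = []
print(hosts)                  # {'gmail.com': []}

# append the username to the list for this host
hosts[host].append(username)
print(hosts)                  # {'gmail.com': ['abby']}
```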

Example — food ratings

we have a list of anonymous food ratings:

['donut:10', 'apple:8', 'donut:9', 'apple:6', 'donut:7']

build a dictionary that lists all the ratings for the same food

{
   'donut': [10, 9, 7],
   'apple': [8, 6]
}

def food_ratings(ratings):
    foods = {}
    for food_rating in ratings:
        # find the colon that separates the food from its rating
        colon = food_rating.find(':')
        food = food_rating[:colon]
        rating = food_rating[colon + 1:]
        # convert to integer
        rating = int(rating)
        # initialize entry
        if food not in foods:
            foods[food] = []
        # increment/append
        foods[food].append(rating)
    return foods

food_ratings(['donut:10', 'apple:8', 'donut:9', 'apple:6', 'donut:7', 'dr. zappalas lasagna:100'])

    {'donut': [10, 9, 7], 'apple': [8, 6], 'dr. zappalas lasagna': [100]}
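
One possible variation, not the version used in these notes, is to parse each entry with `split(':')` and unpack the two pieces, the same way the census code unpacks a line. The function name `food_ratings_split` is made up for this sketch, and it assumes every entry contains exactly one colon.

```python
def food_ratings_split(ratings):
    foods = {}
    for food_rating in ratings:
        # split 'donut:10' into ['donut', '10'] and unpack the two parts
        food, rating = food_rating.split(':')
        # convert to integer
        rating = int(rating)
        # initialize entry
        if food not in foods:
            foods[food] = []
        # append the rating to the list for this food
        foods[food].append(rating)
    return foods


print(food_ratings_split(['donut:10', 'apple:8', 'donut:9', 'apple:6', 'donut:7']))
# prints {'donut': [10, 9, 7], 'apple': [8, 6]}
```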

Example — census names

we want to store both the last name and first name in the dictionary
a single person:

['Zappala', 'Anna']

a list of people:

[['Zappala', 'Mariano'], ['Zappala', 'Anna']]

a list of lists!
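
A brief sketch of how to get data back out of a list of lists: the first index picks a person and the second picks a name within that person.

```python
people = [['Zappala', 'Mariano'], ['Zappala', 'Anna']]

# the first index picks a person
print(people[1])

# the second index picks the last name (0) or the first name (1)
print(people[1][1])

# loop over the people and unpack each inner list
for last, first in people:
    print(first, last)
# prints:
# ['Zappala', 'Anna']
# Anna
# Mariano Zappala
# Anna Zappala
```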

a dictionary that uses a census to create a mapping from each age group to a list of people who are that age

def people_by_age(filename):
    people = {}
    with open(filename) as file:
        for line in file:
            last, first, relationship, gender, race, age, marital_status = line.strip().split(',')
            age = int(age)
            # round down to the nearest 10
            age_group = round_to_nearest_10(age)
            # initialize a new entry
            if age_group not in people:
                people[age_group] = []
            # append a new person
            people[age_group].append([last, first])
    return people

people_by_age('census.txt')

    {50: [['Baer', 'William'], ['Sposato', 'Carolina']],
     30: [['Baer', 'Ruth']],
     10: [['Baer', 'Robert'],
      ['Baer', 'William'],
      ['Sposato', 'Antonio'],
      ['Sposato', 'Ralph']],
     20: [['Sposato', 'Albert'],
      ['Sposato', 'Carlo'],
      ['Sposato', 'Frances'],
      ['Zappala', 'Mariano'],
      ['Zappala', 'Anna']]}
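
To show how the result might be used, the sketch below copies two of the age groups from the output above into a literal and reads values back out of it.

```python
# two age groups copied from the output above
people = {
    50: [['Baer', 'William'], ['Sposato', 'Carolina']],
    30: [['Baer', 'Ruth']],
}

# look up everyone in one age group
print(people[50])

# first person in that group, then that person's first name
print(people[50][0][1])

# how many people are in the group
print(len(people[50]))

# visit every person in every group
for age_group in people:
    for last, first in people[age_group]:
        print(age_group, first, last)
# prints:
# [['Baer', 'William'], ['Sposato', 'Carolina']]
# William
# 2
# 50 William Baer
# 50 Carolina Sposato
# 30 Ruth Baer
```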

Dictionaries vs lists

lists are for when you want to store an ordered collection of things
you can directly access each item with an index, which is always an integer starting at 0
often want to access all of them (e.g. with a for loop)

a list of names

dictionaries are for when you want to map a key to a value
you can directly access each item with a key, which can be any integer or string you choose
often want to access one at a time (e.g. look up the total number of people in their 20s in the census)

a dictionary that uses a census to create a mapping from each age group to the total number of people who are that age

can combine these!
a dictionary whose values are lists of lists

a dictionary that uses a census to create a mapping from each age group to a list of people who are that age
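
The contrast can be sketched side by side. The list of names is made up; the dictionary reuses the age counts shown earlier in these notes.

```python
# a list: access by position, usually visit every item
names = ['Anna', 'Mariano', 'Ruth']
print(names[0])
for name in names:
    print(name)

# a dictionary: access by key, usually look up one item
age_count = {50: 2, 30: 1, 10: 4, 20: 5}
print(age_count[20])
# prints:
# Anna
# Anna
# Mariano
# Ruth
# 5
```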

A dictionary of dictionaries

We spent some time in class talking about how the entries in a dictionary can be … a dictionary. See the code below, which creates a dictionary that maps each age group to a list of people, where each person is a dictionary.

def dictionary_of_people_by_age(filename):
    """
    Create a dictionary of people by age. The keys are age groups, and the values
    are lists of dictionaries, each containing the last name, first name, gender,
    and age of one person.

    :param filename: a file that contains census data
    :return: a dictionary as described above
    >>> dictionary_of_people_by_age('census.txt')
    {50: [{'last': 'Baer', 'first': 'William', 'gender': 'M', 'age': 51}, {'last': 'Sposato', 'first': 'Carolina', 'gender': 'F', 'age': 53}], 30: [{'last': 'Baer', 'first': 'Ruth', 'gender': 'F', 'age': 38}], 10: [{'last': 'Baer', 'first': 'Robert', 'gender': 'M', 'age': 12}, {'last': 'Baer', 'first': 'William', 'gender': 'M', 'age': 10}, {'last': 'Sposato', 'first': 'Antonio', 'gender': 'M', 'age': 18}, {'last': 'Sposato', 'first': 'Ralph', 'gender': 'M', 'age': 10}], 20: [{'last': 'Sposato', 'first': 'Albert', 'gender': 'M', 'age': 23}, {'last': 'Sposato', 'first': 'Carlo', 'gender': 'M', 'age': 21}, {'last': 'Sposato', 'first': 'Frances', 'gender': 'F', 'age': 28}, {'last': 'Zappala', 'first': 'Mariano', 'gender': 'M', 'age': 27}, {'last': 'Zappala', 'first': 'Anna', 'gender': 'F', 'age': 25}]}
    """
    people = {}
    with open(filename) as file:
        for line in file:
            last, first, relationship, gender, race, age, marital_status = line.strip().split(',')
            age = int(age)
            age_group = round_to_nearest_10(age)
            # initialize a new entry
            if age_group not in people:
                people[age_group] = []
            # append a new person
            people[age_group].append({'last': last, 'first': first, 'gender': gender, 'age': age})
    return people
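
Reading from this structure takes one lookup per level: age group, then position in the list, then field name. The sketch below copies two age groups from the docstring output into a literal so it runs without the census file.

```python
# two age groups copied from the docstring output above
people = {
    50: [{'last': 'Baer', 'first': 'William', 'gender': 'M', 'age': 51},
         {'last': 'Sposato', 'first': 'Carolina', 'gender': 'F', 'age': 53}],
    30: [{'last': 'Baer', 'first': 'Ruth', 'gender': 'F', 'age': 38}],
}

# age group -> position in the list -> field name
first_person = people[50][0]
print(first_person['first'], first_person['age'])

# collect the first names of all the women, across every age group
women = []
for age_group in people:
    for person in people[age_group]:
        if person['gender'] == 'F':
            women.append(person['first'])
print(women)
# prints:
# William 51
# ['Carolina', 'Ruth']
```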