BYU Computer Science

More strings

Concepts from last time

  • strings indexed starting from zero
  • get a character with square brackets s[0] or s[i]
  • len(s) — length of a string
  • various string functions like s.isalpha()

str_dx(s) and the accumulator pattern

def str_dx(s):
    result = ''
    for i in range(len(s)):
        if s[i].isdigit():
            result += 'd'
        else:
            result += 'x'
    return result

str_dx("I'm 91. You call me old??! I'm wise and experienced!")
    'xxxxddxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

count_e(s) problem and the accumulator pattern

  • count the number of ‘e’ characters in a string
  • accumulator pattern using an integer variable
def count_e(s):
    count = 0
    for i in range(len(s)):
        if s[i] == 'e':
            # count = count + 1
            count += 1
    return count


result = count_e('abcdefg')
print(result)
result = count_e('abcdefgeabce')
print(result)
    1
    3

has_alpha(s) function, early return pattern, doctests

def has_alpha(s):
    """
    Returns true if there are any alphabetic characters in the string, false otherwise.

    :param s: a string
    :return: True if there are alphabetic characters in s, otherwise False
    >>> has_alpha('45#a8e')
    True
    >>> has_alpha('45^#)-')
    False
    """
    for i in range(len(s)):
        if s[i].isalpha():
            return True
    return False
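
The doctests in the docstring can also be run outside PyCharm. A minimal sketch, assuming the function lives in its own file and using Python's standard doctest module; has_alpha is repeated here only so the example is self-contained:

```python
import doctest


def has_alpha(s):
    """
    Return True if s contains any alphabetic character, otherwise False.

    >>> has_alpha('45#a8e')
    True
    >>> has_alpha('45^#)-')
    False
    """
    for i in range(len(s)):
        if s[i].isalpha():
            return True   # early return: stop at the first letter found
    return False          # reached only when no character was a letter


if __name__ == '__main__':
    # runs every >>> example in this file; prints nothing if all pass
    doctest.testmod()
```

The early return matters for the empty string too: the loop body never runs, so the function falls through to return False.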

if statements

if something_is_true:
    do_things()
else:
    do_other_things()

if not something_is_true:
    do_things()

if first_check:
    do_first_things()
elif second_check:
    do_second_things()
elif third_check:
    do_third_things()
else:
    do_fourth_things()

in test

  • check for presence of a string in another string
  • also a not in variant
'Dog' in 'CatDogBird'

'dog' in 'CatDogBird'   # upper vs. lower case

'd' in 'CatDogBird'     # finds d at the end

'i' in 'CatDogBird'     # finds lower case characters

'x' in 'CatDogBird'     # returns false if not found

'x' not in 'CatDogBird' # also have a "not in" variant

s = 'my birthday party'
if 'birth' in s:
    print('birth is in this string')
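
The in expressions above each evaluate to True or False. A short sketch that prints each result, with the expected value in a comment:

```python
s = 'CatDogBird'

print('Dog' in s)      # True
print('dog' in s)      # False: upper and lower case must match
print('d' in s)        # True: the d at the end of Bird
print('i' in s)        # True
print('x' in s)        # False
print('x' not in s)    # True
```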

has_pi(s) function

def has_pi(s):
    """ return true if "3" and "14" are in the string (not necessarily next to each other)
    """
    if '3' in s and '14' in s:
        return True
    return False

print(has_pi('3.1415'))
print(has_pi('Today is the 3rd time in 14 days that I have slept until 10am.'))
print(has_pi('Which 3 players are your favorite?'))
    True
    True
    False

find_cat(s) function

  • look for instances of ‘c’, ‘a’, ‘t’, ‘C’, ‘A’, ‘T’
  • return a new string that has just these characters, in the order they appear
s              return value
'xCtxxxAax'    'CtAa'
'xaCxxxTx'     'aCT'

find_cat(s) attempt #1

  • misses the capital letters
def find_cat(s):
    """
    >>> find_cat('xCtxxxAax')
    'CtAa'
    >>> find_cat('xaCxxxTx')
    'aCT'
    """
    result = ''
    for i in range(len(s)):
        if s[i] == 'c' or s[i] == 'a' or s[i] == 't':
            result += s[i]
    return result


print(find_cat('xCtxxxAax'))
print(find_cat('xaCxxxTx'))
    ta
    a

find_cat(s) attempt #2

  • do 6 comparisons!
  • works but it is ugly
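
No code is shown for this attempt. A sketch of what six comparisons might look like, assuming the same loop as attempt #1; the name find_cat_v2 is made up here to keep it apart from the other versions:

```python
def find_cat_v2(s):
    result = ''
    for i in range(len(s)):
        # one comparison for each of the six letters we accept
        if (s[i] == 'c' or s[i] == 'a' or s[i] == 't'
                or s[i] == 'C' or s[i] == 'A' or s[i] == 'T'):
            result += s[i]
    return result


print(find_cat_v2('xCtxxxAax'))
print(find_cat_v2('xaCxxxTx'))
```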

find_cat(s) attempt #3

  • convert character to lowercase first (doesn’t modify the original string)
  • makes it easier to test without having to write 6 comparisons
  • still a long comparison
def find_cat(s):
    result = ''
    for i in range(len(s)):
        if s[i].lower() == 'c' or s[i].lower() == 'a' or s[i].lower() == 't':
            result += s[i]
    return result


print(find_cat('xCtxxxAax'))
print(find_cat('xaCxxxTx'))

find_cat(s) attempt #3

  • decompose with a variable
  • compute s[i].lower() once, store it in a variable
  • avoids repetition
  • code is easier to read
  • decomposition within a function — do small pieces at a time
def find_cat(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()   # decompose with a variable
        if low == 'c' or low == 'a' or low == 't':
            result += s[i]
    return result


print(find_cat('xCtxxxAax'))
print(find_cat('xaCxxxTx'))
    CtAa
    aCT

a note on variable names

  • don’t use ‘lower’ — duplicates a function name
  • could use ‘lower_case_char’ — something descriptive
  • use underscores between words

find_cat(s) attempt #4

  • can make this a little bit simpler
  • instead of three comparisons, use low in 'cat'
def find_cat(text):
    result = ''
    for index in range(len(text)):
        low = text[index].lower()
        # if low == 'c' or low == 'a' or low == 't':
        if low in 'cat':
            result += text[index]
    return result


print(find_cat('xCtxxxAax'))
print(find_cat('xaCxxxTx'))
    CtAa
    aCT

s.find(target)

  • find a target string inside of s
  • returns the index where it is found
  • returns -1 if not found
s = 'Python'
print('th: ',s.find('th'))
print('o: ',s.find('o'))
print('y: ',s.find('y'))
print('x: ', s.find('x'))
print('N: ',s.find('N'))
print('P: ',s.find('P'))
    th:  2
    o:  4
    y:  1
    x:  -1
    N:  -1
    P:  0

String slicing

  • get a substring from a string
  • used very heavily
  • use brackets and then start_index:end_index
  • returned string is from start_index to end_index-1
s = 'cats and dogs'

s[0:4]
'cats'

s[2:6]
'ts a'

s[9:13]
'dogs'

can leave off either the start or the end index

  • if you leave off the start, it starts at the start (0)
  • if you leave off the end, it ends at the end (len(s))
s = 'cats and dogs'

s[0:4]
s[:4]
'cats'

s[9:13]
s[9:]
'dogs'

edge cases


s = 'cats and dogs'

s[3:3]   # starting and ending the same -> empty string
''

s[9:999] # ending too big -> end
'dogs'

s[:]     # the whole string
'cats and dogs'

s[5:-2]  # a negative end index counts backward from the end
'and do'
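
A small sketch of how the negative and too-large end indexes relate to len(s), using the same string as above:

```python
s = 'cats and dogs'

# a negative end index counts back from len(s)
print(s[5:-2])            # 'and do'
print(s[5:len(s) - 2])    # the same slice written with a positive index

# an end index past the end of the string is treated as len(s)
print(s[9:999] == s[9:])  # True
```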

brackets(s) problem

  • look for a pair of brackets ’[…]’ within s
  • return the text between the brackets
  • if there are no brackets, return the empty string
  • assume if there is a left bracket, there will always be a right bracket after it
  • assume if brackets are present, there will be only one of each
s                 return value
'cat[dog]bird'    'dog'
'catdogbird'      ''

draw it out!

  • find the index of the left bracket and the right bracket
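
One way to draw it out is to print the two indexes for a sample string. A minimal sketch, assuming both brackets are present; the variable name inside is illustrative only:

```python
s = 'cat[dog]bird'
# index: c=0 a=1 t=2 [=3 d=4 o=5 g=6 ]=7

left = s.find('[')            # 3
right = s.find(']')           # 7
inside = s[left + 1:right]    # characters 4, 5, 6

print(left, right, inside)
```

The slice starts at left + 1 to skip the bracket itself, and stops before right because the end index is not included.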

brackets(s) solution

def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    if right == -1:
        right = len(s)
    return s[left+1:right]


result = brackets('cat[dog]bird')
print(result)
result = brackets('catdogbird')
# this prints the empty string
print(f"[{result}]")
result = brackets('cat[dogbird')
print(result)
    dog
    []
    dogbird

s.find(target, start_index)

  • find target inside the string, but starting at start_index
s = '[xyz['

s.find('[')      # find first [
0

s.find('[', 1)   # start search at 1
4
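
A common use of the start index is finding the second occurrence of a target: search once, then search again starting just past the first match. A sketch with a hypothetical helper named find_second:

```python
def find_second(s, target):
    # hypothetical helper, for illustration only
    first = s.find(target)
    if first == -1:
        return -1                      # no first match, so no second one
    return s.find(target, first + 1)   # resume just after the first match


print(find_second('[xyz[', '['))
print(find_second('Python', 'y'))
```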

parens(s) problem

  • like brackets, but use parentheses
  • may have other parentheses mixed in
  • write this in PyCharm, with doctests
  • can you help write the doctests?
s                return value
'x)x(abc)xxx'    'abc'

parens(s) solution

def parens(s):
    """
    Return the substring that is inside the first set of matching parentheses

    :param s: a string
    :return: a substring that is inside the first set of matching parentheses

    >>> parens('x)x(abc)xxx')
    'abc'
    """
    left = s.find('(')
    right = s.find(')', left + 1)
    return s[left+1:right]
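
The solution above relies on a matching pair of parentheses being present. If there is no '(' at all, left is -1 and the slice starts at 0, so parens('abc') returns 'ab'. A more defensive sketch follows; the name parens_safe and the choice to return the empty string when either parenthesis is missing are assumptions, modeled on the rule in the brackets problem:

```python
def parens_safe(s):
    """
    Return the text inside the first '(' and the next ')' after it.
    Returns '' if either one is missing (an assumption, see above).

    >>> parens_safe('x)x(abc)xxx')
    'abc'
    >>> parens_safe('no parens here')
    ''
    >>> parens_safe('x(abc')
    ''
    """
    left = s.find('(')
    if left == -1:
        return ''                      # no opening parenthesis
    right = s.find(')', left + 1)      # only look after the '('
    if right == -1:
        return ''                      # no closing parenthesis after it
    return s[left + 1:right]
```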