Dictionaries

Another useful data type built into Python is the dictionary. A dictionary is like a list, but it is more general. In a list, the indices have to be integers; in a dictionary they can be (almost) any type.

Key:value pair

You can think of a dictionary as a mapping between two things:
  • keys: a set of indices,
  • values: a set of values corresponding to each key.

Each key maps to a value. The association of a key and a value is called a key:value pair.

You can define an empty dictionary in two ways. One way is to use a built-in function dict:

>>> eng2kor = dict()

or alternatively, use an empty pair of curly brackets, {}:

>>> eng2kor = {}

In both cases you see the following:

>>> type(eng2kor)
<type 'dict'>
>>> print eng2kor
{}

Let’s add a new pair to the dictionary, eng2kor. To add one, you can use square brackets:

>>> eng2kor['one'] = 'hana'

This creates a new key:value pair that maps from the key one to the value hana. If we print the dictionary again:

>>> print eng2kor    # or simply >>> eng2kor
{'one': 'hana'}

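One can also define a dictionary with multiple pairs at once using this output format (note that this assignment replaces whatever eng2kor held before):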

>>> eng2kor={'one':'hana','two':'dool','three':'set','four':'net'}
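
The same dictionary can also be built by passing a sequence of (key, value) pairs to dict, for example the pairs produced by zip. The sketch below assumes two illustrative lists, eng and kor, which are not part of the original example; it is written so that it should run under both Python 2 and Python 3.

```python
# Illustrative lists (names chosen for this sketch only).
eng = ['one', 'two', 'three', 'four']
kor = ['hana', 'dool', 'set', 'net']

# zip pairs up the i-th English word with the i-th Korean word,
# and dict turns those (key, value) pairs into a dictionary.
eng2kor = dict(zip(eng, kor))

# The result equals the dictionary written with curly brackets.
same = (eng2kor == {'one': 'hana', 'two': 'dool', 'three': 'set', 'four': 'net'})
print(same)
```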

Unfortunately, the append method cannot be invoked on a dictionary (i.e., eng2kor.append('five') won’t work). Instead, one can keep adding new pairs using:

>>> eng2kor['five'] = 'dasut'

or using update (try help(dict) or dir(dict) to see more options):

>>> eng2kor.update({'five':'dasut'})

Let’s now print to see what we have defined so far:

>>> print eng2kor
{'four': 'net', 'three': 'set', 'five': 'dasut', 'two': 'dool', 'one': 'hana'}

The order of the key:value pairs may not be what you expected, and it may even differ from one computer to another. In the Python 2 interpreter used here, the order of pairs in a dictionary is unpredictable, because a dictionary is implemented as a hash table and its elements are not indexed by integer positions (they are still iterable, though). Since Python 3.7, dictionaries preserve insertion order, but items should still be looked up by key rather than by position. The ordering is not a problem, because each key always maps to the same value no matter how the pairs are ordered:

>>> eng2kor['five']
'dasut'

>>> eng2kor['two']
'dool'

or traversing through the dictionary will show:

>>> for i in eng2kor:
...     print i
...
four
three
five
two
one

Here we see that iterating over a dictionary traverses its keys, not its values.
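
Both observations (lookup by key does not depend on the storage order, and a for loop visits the keys) can be checked with a short sketch. The dictionaries d1 and d2 are illustrative names introduced here, and sorted is used only to make the printed order reproducible.

```python
# Two dictionaries with the same pairs, inserted in different orders.
d1 = {'one': 'hana', 'two': 'dool', 'three': 'set'}
d2 = {'three': 'set', 'one': 'hana', 'two': 'dool'}

# Equality compares key:value pairs, not the order they are stored in.
print(d1 == d2)

# A for loop over a dictionary visits its keys; sorting them gives a
# reproducible order regardless of how the dictionary stores them.
for key in sorted(d1):
    print(key + ' -> ' + d1[key])
```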

We can use the iterators that are defined as methods in dictionary (try help(dict) and find these), iteritems(), iterkeys(), and itervalues():

>>> for eng, kor in eng2kor.iteritems():
...     print eng, kor
...
four net
one hana
five dasut
three set
two dool

or, to just get keys:

>>> for eng in eng2kor.iterkeys():
...     print eng
...
four
one
five
three
two

Similarly, to just get values:

>>> for kor in eng2kor.itervalues():
...     print kor
...
net
hana
dasut
set
dool
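
The methods iteritems(), iterkeys(), and itervalues() exist only in Python 2. They were removed in Python 3, where items(), keys(), and values() play the same role in a for loop. A minimal sketch for readers on Python 3, using a shortened copy of eng2kor (sorting is added here only to make the printed order reproducible):

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set'}

# items() yields (key, value) pairs, which the loop unpacks.
pairs = []
for eng, kor in eng2kor.items():
    pairs.append(eng + ' ' + kor)

# keys() and values() yield the keys and the values, respectively.
keys = [eng for eng in eng2kor.keys()]
vals = [kor for kor in eng2kor.values()]

print(sorted(pairs))
print(sorted(keys))
print(sorted(vals))
```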

The method items() returns the contents of the dictionary as a list of (key, value) tuples; the dictionary itself is unchanged:

>>> eng2kor.items()
[('four', 'net'), ('one', 'hana'), ('five', 'dasut'), ('three', 'set'), ('two', 'dool')]

We can apply some of the functions and operators we have learned so far to a dictionary:

>>> len(eng2kor)
5

>>> 'one' in eng2kor
True

>>> 'net' in eng2kor
False

The second example of the in operator tells us that Python checks if the search word appears as a key, but not as a value in the dictionary.

To see whether something appears as a value instead of a key, use the method values, which returns the values as a list:

>>> print eng2kor.values()
['net', 'set', 'dasut', 'dool', 'hana']

With this we can now search:

>>> vals = eng2kor.values()
>>> 'net' in vals
True
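
Searching the values only tells us whether a value is present, not which key it belongs to. A reverse lookup can be sketched as below; reverse_lookup is a hypothetical helper written for this note, and it has to scan the dictionary one key at a time, unlike the direct lookup eng2kor[key].

```python
def reverse_lookup(d, value):
    """Return the first key found in d that maps to value.

    reverse_lookup is a hypothetical helper written for this sketch.
    It raises ValueError when no key maps to value.
    """
    for key in d:
        if d[key] == value:
            return key
    raise ValueError('value does not appear in the dictionary')


eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set', 'four': 'net'}
print(reverse_lookup(eng2kor, 'net'))
```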

We can also compare between keys or between values:

>>> eng2kor.keys()
['four', 'three', 'five', 'two', 'one']

>>> eng2kor.keys()[0].__gt__(eng2kor.keys()[2])   # this is equivalent to eng2kor.keys()[0] > eng2kor.keys()[2]
True

Comparing the corresponding values (i.e., the two corresponding value elements to the 0th and 2nd key elements):

>>> eng2kor.values()[0] > eng2kor.values()[2]
True
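
The indexing in eng2kor.keys()[0] relies on keys() returning a list, which is the case in Python 2. In Python 3, keys() and values() return view objects that cannot be indexed, so they have to be converted to a list first. The sketch below repeats the comparisons in a form that should run under either version; using sorted is an assumption added here so that positions 0 and 2 are reproducible, which means they refer to different keys than in the session above.

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

# sorted() returns a plain list, so integer indexing works everywhere.
keys = sorted(eng2kor)            # ['five', 'four', 'one', 'three', 'two']

# Strings compare lexicographically, character by character.
keys_cmp = keys[0] > keys[2]      # 'five' > 'one'

# Compare the values that belong to those two keys.
vals_cmp = eng2kor[keys[0]] > eng2kor[keys[2]]   # 'dasut' > 'hana'

print(keys_cmp)
print(vals_cmp)
```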

Dictionary as a set of counters

Suppose you are given a string and you wish to count how many times each character appears. Recall that we did this before using a list in Searching and counting. This time let’s use a dictionary and see how we can implement a more general algorithm:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/histogram.py

"""

def histogram(s):
    # initialize with an empty dictionary
    d = dict()
    
    for c in s:
        if c not in d:
            # c is seen for the first time:
            # add it as a key with the count one.
            d[c] = 1
        else:
            # c is already a key in d:
            # increase its count by one.
            d[c] += 1

    # return dictionary        
    return d



def histogram_ternary(s):
    
    # This is exactly the same as histogram
    # but using a so-called 'ternary operator':
    # a if test else b
    #
    # Ex: x='apple' if a > 2 else 'orange'
    # Translating this into English will be
    # x is 'apple' if a > 2; otherwise x is 'orange'

    d = dict()
    for c in s:
        # the ternary expression is much shorter
        # than the conventional if-else statement,
        # at the cost of some readability.
        d[c] = 1 if c not in d else d[c]+1
    return d


def histogram_ternary_get(s):
    
    # This is exactly the same as histogram
    # but using the 'get' method defined in dictionary.
    # See help(dict) and check out:
    #
    # get(...)
    # D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.
    # i.e., if D is a dictionary,
    #                    / D[k] if k in D
    #      D.get(k,d) = |
    #                    \ d    if k not in D
    #
    # Ex: d.get(c,0) is d[c] if c is already a key in d; otherwise it is 0.

    d = dict()
    for c in s:
        # get replaces the explicit test "c not in d":
        # the line below is equivalent to
        # d[c] = 1 if c not in d else d[c]+1
        d[c] = d.get(c,0) + 1
    return d


def print_hist(h):
    for c in h:
        # print key and value
        print c,h[c]




if __name__ == "__main__":

    # first function 
    h1=histogram('apple')
    print '(a):', h1

    # second function which uses the ternary operator
    h2=histogram_ternary('apple')
    #h2=histogram_ternary_get('apple')
    print '(b):', h2

    # are they the same?
    print '(c):', h1 is h2
    print '(d):', id(h1)
    print '(e):', id(h2)

    # print keys
    h1_keys = h1.keys()
    h2_keys = h2.keys()
    print '(f):', h1_keys
    print '(g):', h1_keys is h2_keys

    # does 'a' appear as a key?
    print '(h):', h1_keys.__contains__('a')

    # print values
    h1_values = h1.values()
    h2_values = h2.values()
    print '(i):', h1_values
    print '(j):', h1_values is h2_values

    # does '0' appear as a value? 
    print '(k):', h1_values.__contains__('0')

    # 'get' takes a key and a default value
    # If the key appears in the dictionary
    # 'get' returns the corresponding value;
    # otherwise it returns the user defined
    # default value, e.g., 159 in the following example:
    print '(l):', h1.get('a',159)
    print '(m):', h1.get('w',159)


    # print histogram
    print '(n):--------'
    print_hist(h1)

Running this in the script mode will give:

$ python histogram.py
(a): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(b): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(c): False
(d): 4303246232
(e): 4303246512
(f): ['a', 'p', 'e', 'l']
(g): False
(h): True
(i): [1, 2, 1, 1]
(j): False
(k): False
(l): 1
(m): 159
(n):--------
a 1
p 2
e 1
l 1
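
For comparison, the standard library offers collections.Counter (available since Python 2.7), a dictionary subclass designed for this kind of counting. The sketch below is only meant to show that it gives the same counts as histogram; it is not a replacement for the exercise above.

```python
from collections import Counter

# Counter counts the items of any iterable; a string yields characters.
h = Counter('apple')

# A Counter behaves like a dictionary, so the usual operations apply.
print(h['p'])
print(h.get('w', 159))
print(dict(h) == {'a': 1, 'p': 2, 'e': 1, 'l': 1})
```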

Dictionaries and lists

Lists can appear as values in a dictionary, but not as keys, because keys must be hashable and lists, being mutable, are not. For example, if you try:

>>> t=['a','e','l']
>>> type(t)
<type 'list'>
>>> d = dict()
>>> d[t]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> d['1'] = t
>>> d
{'1': ['a', 'e', 'l']}
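
When a sequence is needed as a key, an immutable tuple can be used in place of a list. A minimal sketch, with variable names chosen here for illustration, which again shows the list being rejected:

```python
d = dict()

# A list is mutable and therefore unhashable: using it as a key fails.
try:
    d[['a', 'e', 'l']] = 1
    list_key_ok = True
except TypeError:
    list_key_ok = False

# A tuple with the same elements is immutable and hashable.
d[('a', 'e', 'l')] = 1

print(list_key_ok)
print(d)
```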

The above example confirms that lists can only be used as values. Now, let’s consider an application of using lists as values. Take a look at the last outcome, {'1': ['a', 'e', 'l']}. This looks like an inverse map of the output (a) or (b)! It suggests that we can implement an inverse map routine which swaps the keys and values of a dictionary. Here is a function that inverts a dictionary:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/invert_dictionary.py

"""

def invert_dictionary(d):
    # create an empty dictionary
    inverse = dict()

    # traverse through keys in dictionary "d"
    for key in d:

        # "val" is a "value" of "d" associated with a "key"
        val = d[key]

        # if val is first found as a key in inverse
        # create a new val:key pair in inverse 
        if val not in inverse:
            inverse[val] = [key]
            # Note above that the values of inverse is assigned as a list, [key]
            
        # if val is already found, append the corresponding
        # key to the list
        else:
            #print val, inverse[val]
            #print type(inverse[val])
            inverse[val].append(key)

    # output inverse dictionary
    return inverse


if __name__ == "__main__":

    # import histogram method from histogram.py
    from histogram import histogram as histo

    # compute histogram
    hist = histo('apple')
    print hist

    # compute inverse map of dictionary
    inv = invert_dictionary(hist)
    print inv

    # what is hist(hist^{-1}('apple'))?
    h1 = histo(inv)
    print h1

    # what is hist^{-1}(hist(hist^{-1}('apple')))?
    h2 = invert_dictionary(h1)
    print h2

The result looks like:

$ python invert_dictionary.py
{'a': 1, 'p': 2, 'e': 1, 'l': 1}
{1: ['a', 'e', 'l'], 2: ['p']}
{1: 1, 2: 1}
{1: [1, 2]}

Note

Why are we getting the last two outcomes?
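
The if/else branch in invert_dictionary can also be written with the dictionary method setdefault. The sketch below is one possible variant; the function name invert_dictionary_setdefault is hypothetical, and the behaviour is intended to match invert_dictionary.

```python
def invert_dictionary_setdefault(d):
    """Invert d, collecting all keys that share a value into a list.

    The function name is hypothetical; the behaviour is meant to match
    invert_dictionary above.
    """
    inverse = dict()
    for key in d:
        # setdefault returns inverse[d[key]] if it exists; otherwise it
        # first stores the empty list under that key and returns it.
        inverse.setdefault(d[key], []).append(key)
    return inverse


hist = {'a': 1, 'p': 2, 'e': 1, 'l': 1}
inv = invert_dictionary_setdefault(hist)
print(sorted(inv[1]))
print(inv[2])
```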

Dictionaries as memos

A dictionary can be used to store quantities that have already been computed, so that the same computation need not be repeated. Reusing the stored results (a technique known as memoization) can make a program considerably faster.

For example, a straightforward implementation of the Fibonacci sequence can look like:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci.py

Fibonacci sequence using recursion

"""

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        res = fibonacci(n-1) + fibonacci(n-2)
        return res
    
if __name__ == "__main__":
    fib_numb = fibonacci(12)
    print fib_numb
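
To get a feel for the cost of this plain recursion, one can record how often each term is evaluated. The sketch below uses a dictionary as a set of counters, as in the histogram example; fibonacci_counted and calls are illustrative names, not part of fibonacci.py.

```python
calls = dict()   # maps n to the number of times fibonacci_counted(n) ran


def fibonacci_counted(n):
    """Same recursion as fibonacci, but records every call in calls."""
    calls[n] = calls.get(n, 0) + 1
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci_counted(n-1) + fibonacci_counted(n-2)


result = fibonacci_counted(12)
print(result)                 # 144
print(calls[1])               # 144 evaluations of the term n = 1 alone
print(sum(calls.values()))    # 465 calls in total
```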

Notice how many times the terms that appear early in the sequence are recomputed by the recursive calls: very many! A dictionary can be used here to keep track of the terms that have already been evaluated:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci_dict.py

Fibonacci sequence using a dictionary "known" which keeps track of values that
have already been computed and stores them for reuse.

"""

# initialize a dictionary, known, with the first two terms: F0=0, F1=1

known = {0:0,1:1}

def fibonacci_dict(n):
    # global known
    # check if n already appears as key in known dictionary
    if n in known:
        # if true return the corresponding value
        return known[n]
    else:
        # otherwise, calculate a new Fibonacci number
        # and add the new Fibonacci number as a new value in the dictionary, known
        known[n] = fibonacci_dict(n-1) + fibonacci_dict(n-2)
        return known[n]
    
if __name__ == "__main__":
    print fibonacci_dict(12)
    print known

In the above example, known is a dictionary that stores the Fibonacci numbers we already know. It starts with the first two terms in the sequence: F0=0 and F1=1, or in other words, 0 maps to 0 and 1 maps to 1.
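
The same idea can be separated from the Fibonacci function itself. The sketch below wraps a one-argument function so that its results are stored in a dictionary; memoize and fibonacci_memo are hypothetical names introduced here, and the cache lives inside the wrapper rather than in a global variable.

```python
def memoize(func):
    """Wrap a one-argument function so that its results are cached.

    memoize is a hypothetical helper written for this sketch.
    """
    cache = dict()

    def wrapper(n):
        # Compute and store the result only the first time n is seen.
        if n not in cache:
            cache[n] = func(n)
        return cache[n]

    return wrapper


@memoize
def fibonacci_memo(n):
    if n < 2:
        return n
    # The name fibonacci_memo now refers to the wrapper, so the
    # recursive calls below also go through the cache.
    return fibonacci_memo(n-1) + fibonacci_memo(n-2)


print(fibonacci_memo(30))   # 832040
```

In Python 3, the standard library provides functools.lru_cache, which serves a similar purpose.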

To compare the run times in seconds (wall-clock times measured with time.time), we can do as follows:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/run_fibonacci.py

Runtime comparison of the recursive and dictionary-based Fibonacci implementations.

"""

import time
from fibonacci import fibonacci
from fibonacci_dict import fibonacci_dict

n=30
start_time1 = time.time()
fibonacci(n)
elapsed_time1 = time.time() - start_time1

start_time2 = time.time()
fibonacci_dict(n)
elapsed_time2 = time.time() - start_time2

print 'Run time in seconds: Fibonacci & Fibonacci_dict = ', elapsed_time1, elapsed_time2
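
For readers who want to try the comparison without the separate files, here is a self-contained sketch of the same measurement. It uses timeit.default_timer from the standard library instead of time.time, a smaller n so that it finishes quickly, and the illustrative names fib_plain and fib_dict. The measured times depend on the machine, so no expected values are shown.

```python
from timeit import default_timer


def fib_plain(n):
    # Plain recursion: early terms are recomputed many times.
    if n < 2:
        return n
    return fib_plain(n-1) + fib_plain(n-2)


known = {0: 0, 1: 1}


def fib_dict(n):
    # Dictionary version: each term is computed once and then reused.
    if n not in known:
        known[n] = fib_dict(n-1) + fib_dict(n-2)
    return known[n]


n = 20

start = default_timer()
r_plain = fib_plain(n)
t_plain = default_timer() - start

start = default_timer()
r_dict = fib_dict(n)
t_dict = default_timer() - start

print(r_plain == r_dict)
print('plain: %g s, dict: %g s' % (t_plain, t_dict))
```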

Global variables

In the previous example, known is created outside the function, so it belongs to the special frame called __main__. Variables in __main__ are global: they can be read from any function.

To reassign a global variable inside a function, you must declare it there with the global statement before using it; otherwise the assignment creates a new local variable with the same name. (Changing a mutable global object in place, as fibonacci_dict does with known[n] = ..., does not need the declaration.) The following example illustrates how global variables behave and how they should be reassigned inside a function:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/global.py

"""

been_called = False

def local_var():
    been_called = True
    print '(a):', been_called

local_var()
print '(b):', been_called


def global_var():
    global been_called
    been_called = True
    print '(c):', been_called


global_var()
print '(d):', been_called


been_called = False
def return_var():
    been_called = True
    return been_called

return_var()
print '(e):', been_called
print '(f):', return_var()

The result looks like:

(a): True
(b): False
(c): True
(d): True
(e): False
(f): True
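
The difference between changing a global object in place and rebinding a global name can be seen in a short sketch. The names counts, total, mutate_in_place, and rebind_with_global are introduced here for illustration only.

```python
counts = {'calls': 0}   # a mutable global object
total = 0               # a global name bound to an immutable integer


def mutate_in_place():
    # No global statement needed: the name counts is only read, and the
    # dictionary it refers to is changed in place.
    counts['calls'] += 1


def rebind_with_global():
    # Without this declaration, the assignment below would make total
    # a local name instead of updating the global one.
    global total
    total = total + 1


mutate_in_place()
rebind_with_global()
print(counts)   # {'calls': 1}
print(total)    # 1
```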

An example study

Consider the following example, which was adapted from Dive Into Python 3 and modified for the class.

This routine takes a file size in bytes as input and converts it approximately to a human-readable form, e.g., 1 TB or 931 GiB.

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/humansize.py

NOTE: This routine has been extracted from

   http://www.diveintopython3.net/your-first-python-program.html
   
and modified by Prof. Dongwook Lee for AMS 209.

"""

SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: file size in a string format
    
    '''
    if size < 0:
        raise ValueError('number must be non-negative')

    if a_kilobyte_is_1024_bytes:
        multiple = 1024
    else:
        multiple = 1000

    # Initialize an empty dictionary, size_dict, to keep track of
    # the file sizes and suffixes.
    # The result is the last key:value pair, stored when the
    # computed size becomes smaller than the file size unit (i.e., multiple).
    size_dict=dict()
    for suffix in SUFFIXES[multiple]:
        #print suffix
        size //= multiple  # <==> size = size//multiple (integer division)
        #print size
        size_dict[size]=suffix
        
        # Keep dividing until a size is less than the chosen file size unit
        if size < multiple:
            return str(size) + ' ' + size_dict[size]
            
    raise ValueError('number too large')

if __name__ == '__main__':
    print '(a) with the multiple of 1000 bytes: ', approximate_size(1000000000000, False)
    print '(b) with the multiple of 1024 bytes: ', approximate_size(1000000000000)

The output from running the routine looks like:

$ python humansize.py
(a) with the multiple of 1000 bytes:  1 TB
(b) with the multiple of 1024 bytes:  931 GiB
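
The listing above is written for Python 2. A compact variant that should also run under Python 3 is sketched below, under two assumptions: floor division (//) is used so that the size stays an integer (in Python 3 the / operator would produce a floating-point result), and invalid input raises ValueError. The name approximate_size_sketch is hypothetical, and the intermediate size_dict is dropped because only the current suffix is needed.

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}


def approximate_size_sketch(size, a_kilobyte_is_1024_bytes=True):
    """Convert a file size in bytes to a human-readable string."""
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000

    for suffix in SUFFIXES[multiple]:
        # Floor division keeps size an integer in Python 2 and Python 3.
        size //= multiple
        if size < multiple:
            return str(size) + ' ' + suffix

    raise ValueError('number too large')


print(approximate_size_sketch(1000000000000, False))   # 1 TB
print(approximate_size_sketch(1000000000000))          # 931 GiB
```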