Dictionaries

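Another useful data type built into Python is the dictionary. A dictionary is like a list, but more general. In a list, the indices have to be integers; in a dictionary they can be of (almost) any type. More precisely, a key must be hashable, which in practice means immutable: numbers, strings, and tuples of immutable objects work as keys, while lists do not (see Dictionaries and lists below).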
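As a quick illustration, keys of several immutable types can live side by side in one dictionary. The dictionary name mixed and its contents are made up for this sketch.

```python
# A hypothetical dictionary whose keys have different immutable types
mixed = {0: 'zero', 'one': 1, 4.5: 'a float key', (2, 3): 'a tuple key'}

print(mixed[0])        # prints: zero
print(mixed['one'])    # prints: 1
print(mixed[(2, 3)])   # prints: a tuple key
```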

Key:value pair

You can think of a dictionary as a mapping between two collections:
  • keys: a set of indices (each key appears at most once),
  • values: the values associated with the keys (two keys may share the same value).

Each key maps to exactly one value. The association of a key with its value is called a key:value pair (or an item).

You can create an empty dictionary in two ways. One way is to use the built-in function dict:

>>> eng2kor = dict()

or, alternatively, use empty curly braces, {}:

>>> eng2kor = {}

In both cases you see the following:

>>> type(eng2kor)
<type 'dict'>
>>> print eng2kor
{}

Let’s add a new pair to the dictionary, eng2kor. To add one, you can use square brackets:

>>> eng2kor['one'] = 'hana'

This creates a new key:value pair that maps the key 'one' to the value 'hana'. If we print the dictionary again:

>>> print eng2kor    # or simply >>> eng2kor
{'one': 'hana'}

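You can also create a dictionary with several pairs at once, using the same format as the output above. Note that this builds a new dictionary and binds it to eng2kor, replacing the old one: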

>>> eng2kor={'one':'hana','two':'dool','three':'set','four':'net'}
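The built-in dict can build the same dictionary from other forms of input as well. A short sketch comparing three equivalent constructions (the variable names are chosen for illustration):

```python
# Three equivalent ways to build the same dictionary
literal = {'one': 'hana', 'two': 'dool', 'three': 'set', 'four': 'net'}

# dict() accepts a list of (key, value) tuples ...
from_pairs = dict([('one', 'hana'), ('two', 'dool'),
                   ('three', 'set'), ('four', 'net')])

# ... or keyword arguments, when every key is a valid identifier string
from_keywords = dict(one='hana', two='dool', three='set', four='net')

print(literal == from_pairs == from_keywords)   # prints: True
```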

Dictionaries have no append method (e.g., eng2kor.append('five') raises an AttributeError). Instead, you can keep adding new pairs with square brackets:

>>> eng2kor['five'] = 'dasut'

or using update (try help(dict) or dir(dict) to see more options):

>>> eng2kor.update({'five':'dasut'})
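A small sketch of two related behaviors: update can add several pairs in one call, and assigning to a key that already exists replaces its value rather than adding a second copy. The dictionary demo and the value 'HANA' are placeholders for this illustration.

```python
demo = {'one': 'hana', 'two': 'dool'}

# update() can add several pairs in a single call
demo.update({'three': 'set', 'four': 'net'})
print(len(demo))        # prints: 4

# Assigning to a key that already exists replaces its value;
# a dictionary never holds the same key twice
demo['one'] = 'HANA'    # placeholder value, just to show the overwrite
print(len(demo))        # prints: 4
print(demo['one'])      # prints: HANA
```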

Let’s now print to see what we have defined so far:

>>> print eng2kor
{'four': 'net', 'three': 'set', 'five': 'dasut', 'two': 'dool', 'one': 'hana'}

The order of the key:value pairs may not be what you expected, and it may even differ from one computer to another: in Python 2 (and in Python 3 before version 3.7) the order of the pairs in a dictionary is unpredictable. This is because a dictionary does not store its pairs at integer positions, as a list does; it arranges them in a hash table according to the hash values of the keys. (A dictionary is still iterable, as shown below. Since Python 3.7, dictionaries preserve insertion order.) This is not a problem, because the correspondence between each key and its value never changes:

>>> eng2kor['five']
'dasut'

>>> eng2kor['two']
'dool'

Traversing the dictionary with a for loop gives:

>>> for i in eng2kor:
...     print i
...
four
three
five
two
one

This shows that a for loop over a dictionary traverses its keys, not its values.
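Two related idioms, sketched below. Because the order of the pairs is unpredictable in Python 2, sorting the keys with the built-in sorted gives a fixed order, for example when printing. And the items method yields (key, value) pairs, so a loop can get both at once.

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

# sorted() returns a new list of the keys in ascending (alphabetical) order,
# so this loop prints the pairs in the same order on every computer
for key in sorted(eng2kor):
    print(key + ' ' + eng2kor[key])

# items() yields (key, value) pairs; the for statement unpacks each pair
pairs = []
for key, value in eng2kor.items():
    pairs.append(key + ':' + value)

print(sorted(pairs))
# prints: ['five:dasut', 'four:net', 'one:hana', 'three:set', 'two:dool']
```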

We can apply some of the functions and operators we have learned so far to a dictionary:

>>> len(eng2kor)
5

>>> 'one' in eng2kor
True

>>> 'net' in eng2kor
False

The second in example shows that the in operator checks whether the item appears as a key, not as a value, in the dictionary.
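Since both in and indexing work on keys, it helps to know what happens when a key is absent: indexing with it raises a KeyError. A short sketch of guarding a lookup with in (the key 'six' is just an example of a missing key):

```python
eng2kor = {'one': 'hana', 'two': 'dool'}
word = 'six'   # a key that is not in the dictionary

# Guard the lookup with in ...
if word in eng2kor:
    print(eng2kor[word])
else:
    print(word + ' is not a key')   # this branch runs

# ... because indexing with a missing key raises KeyError
missing_raised = False
try:
    eng2kor[word]
except KeyError:
    missing_raised = True
print(missing_raised)   # prints: True
```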

To check whether something appears as a value instead, use the values method, which returns the values as a list (in Python 3 it returns a view object, which supports in just as well):

>>> print eng2kor.values()
['net', 'set', 'dasut', 'dool', 'hana']

With this we can now search:

>>> vals = eng2kor.values()
>>> 'net' in vals
True
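Searching the values only tells us whether a value is present. To find which key maps to a given value, one has to loop over the keys. A sketch; the function name reverse_lookup is our own, not a built-in:

```python
def reverse_lookup(d, target):
    """Return the first key of d whose value equals target.

    Raises LookupError if no key maps to target. Since several keys can
    share a value, "first" means first in the dictionary's iteration order.
    """
    for key in d:
        if d[key] == target:
            return key
    raise LookupError('value does not appear in the dictionary')


eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set', 'four': 'net'}
print(reverse_lookup(eng2kor, 'net'))   # prints: four
```

Unlike a key lookup, this search may have to examine every pair, so its cost grows with the size of the dictionary.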

We can also compare keys with each other, or values with each other (strings are compared lexicographically, character by character):

>>> eng2kor.keys()
['four', 'three', 'five', 'two', 'one']

>>> eng2kor.keys()[0].__gt__(eng2kor.keys()[2])   # this is equivalent to eng2kor.keys()[0] > eng2kor.keys()[2]
True

Comparing the corresponding values (i.e., the values paired with the 0th and 2nd keys; keys() and values() list the pairs in the same order as long as the dictionary is not modified in between):

>>> eng2kor.values()[0] > eng2kor.values()[2]
True
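In Python 3, keys() and values() return view objects that cannot be indexed, so expressions like eng2kor.keys()[0] work only in Python 2. Comparisons that do not depend on the order of the pairs are often clearer with the built-ins max and min, as in this sketch:

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

# max/min over a dictionary compare its keys
print(max(eng2kor))                    # prints: two
print(min(eng2kor))                    # prints: five

# max/min over values() compare the values instead
print(max(eng2kor.values()))           # prints: set

# key=eng2kor.get ranks each key by its value and returns the key
print(max(eng2kor, key=eng2kor.get))   # prints: three

# To index the keys in Python 3, convert the view to a list first;
# which key comes first depends on the dictionary's order
first_key = list(eng2kor.keys())[0]
```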

Dictionary as a set of counters

Suppose you are given a string and wish to count how many times each character appears. Recall that we did this before with a list (see Searching and counting). This time let's use a dictionary and see how we can implement a more general algorithm:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/histogram.py

"""

def histogram(s):
    # initialize with an empty dictionary
    d = dict()
    
    for c in s:
        if c not in d:
            # c is not yet a key in d (its first occurrence),
            # so initialize its count to one.
            d[c] = 1
        else:
            # c is already a key in d,
            # so increase its count by one.
            d[c] += 1

    # return dictionary        
    return d



def histogram_ternary(s):
    
    # This is exactly the same as histogram
    # but using a so-called 'ternary operator':
    # a if test else b
    #
    # Ex: x='apple' if a > 2 else 'orange'
    # Translating this into English will be
    # x is 'apple' if a > 2; otherwise x is 'orange'

    d = dict()
    for c in s:
        # the ternary expression is shorter than the
        # conventional if-else statement, at the cost
        # of some readability.
        d[c] = 1 if c not in d else d[c]+1
    return d



def print_hist(h):
    for c in h:
        # print key and value
        print c,h[c]




if __name__ == "__main__":

    # first function 
    h1=histogram('apple')
    print '(a):', h1

    # second function which uses the ternary operator
    h2=histogram_ternary('apple')
    print '(b):', h2

    # are they the same?
    print '(c):', h1 is h2
    print '(d):', id(h1)
    print '(e):', id(h2)

    # print keys
    h1_keys = h1.keys()
    h2_keys = h2.keys()
    print '(f):', h1_keys
    print '(g):', h1_keys is h2_keys

    # does 'a' appear as a key?
    print '(h):', h1_keys.__contains__('a')

    # print values
    h1_values = h1.values()
    h2_values = h2.values()
    print '(i):', h1_values
    print '(j):', h1_values is h2_values

    # does '0' appear as a value? 
    print '(k):', h1_values.__contains__('0')

    # 'get' takes a key and a default value
    # If the key appears in the dictionary
    # 'get' returns the corresponding value;
    # otherwise it returns the user defined
    # default value, e.g., 159 in the following example:
    print '(l):', h1.get('a',159)
    print '(m):', h1.get('w',159)


    # print histogram
    print '(n):--------'
    print_hist(h1)

Running this as a script gives:

$ python histogram.py
(a): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(b): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(c): False
(d): 4303246232
(e): 4303246512
(f): ['a', 'p', 'e', 'l']
(g): False
(h): True
(i): [1, 2, 1, 1]
(j): False
(k): False
(l): 1
(m): 159
(n):--------
a 1
p 2
e 1
l 1
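For comparison, here is a sketch of two more compact variants. The first (histogram_get, a name chosen for this sketch) relies on the get method shown in outputs (l) and (m) above: get(c, 0) returns 0 for a character not seen yet. The second swaps in collections.Counter from the standard library, a dict subclass designed for counting.

```python
from collections import Counter


def histogram_get(s):
    """Count characters in s using dict.get with a default of 0."""
    d = dict()
    for c in s:
        # get returns the current count, or 0 if c is not yet a key
        d[c] = d.get(c, 0) + 1
    return d


h_get = histogram_get('apple')
h_counter = Counter('apple')

print(h_get == {'a': 1, 'p': 2, 'l': 1, 'e': 1})   # prints: True
# Counter is a subclass of dict, so it compares equal to a plain dict
print(h_counter == h_get)                           # prints: True
print(h_counter.most_common(1))                     # prints: [('p', 2)]
```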

Dictionaries and lists

Lists can appear as values in a dictionary, but not as keys, because a list is mutable and therefore not hashable. For example, if you try:

>>> t=['a','e','l']
>>> type(t)
<type 'list'>
>>> d = dict()
>>> d[t]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

>>> d['1'] = t
>>> d
{'1': ['a', 'e', 'l']}
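As an aside, if a sequence has to serve as a key, one common workaround is to convert it to a tuple: a tuple is immutable, so as long as its elements are immutable too, it is hashable. A minimal sketch (the stored string is just an example value):

```python
t = ['a', 'e', 'l']
d = dict()

# A list cannot be a key, but the equivalent tuple can
d[tuple(t)] = 'letters that appear once in apple'
print(d[('a', 'e', 'l')])   # prints: letters that appear once in apple

# A tuple that contains a list is still unhashable
try:
    d[('a', ['e', 'l'])] = 'never stored'
except TypeError:
    print('TypeError: a tuple holding a list cannot be a key')
```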

The above example confirms that lists can be used as values but not as keys. Now let's consider an application of lists as values. Look at the last outcome, {'1': ['a', 'e', 'l']}: it resembles part of an inverse map of output (a) or (b), listing the characters that appear exactly once in 'apple'. This suggests implementing a routine that inverts a dictionary, swapping its keys and values; because several keys can share one value, each value of the inverse is a list of keys. Here is a function that inverts a dictionary:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/invert_dictionary.py

"""

def invert_dictionary(d):
    # create an empty dictionary
    inverse = dict()

    # traverse through keys in dictionary "d"
    for key in d:

        # "val" is a "value" of "d" associated with a "key"
        val = d[key]

        # if val is not yet a key in inverse,
        # create a new val:key pair in inverse
        if val not in inverse:
            inverse[val] = [key]
            # Note above that each value of inverse is a list, [key]

        # if val is already a key in inverse, append the
        # corresponding key to its list
        else:
            inverse[val].append(key)

    # output inverse dictionary
    return inverse


if __name__ == "__main__":

    # import the histogram function from histogram.py
    from histogram import histogram as histo

    # compute histogram
    hist = histo('apple')
    print hist

    # compute inverse map of dictionary
    inv = invert_dictionary(hist)
    print inv

    # what is hist(hist^{-1}(hist('apple')))?
    h1 = histo(inv)
    print h1

    # what is hist^{-1}(hist(hist^{-1}(hist('apple'))))?
    h2 = invert_dictionary(h1)
    print h2

The result looks like:

$ python invert_dictionary.py
{'a': 1, 'p': 2, 'e': 1, 'l': 1}
{1: ['a', 'e', 'l'], 2: ['p']}
{1: 1, 2: 1}
{1: [1, 2]}

Note

Why are we getting the last two outcomes?
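The if/else in invert_dictionary can be folded into one line with the dictionary method setdefault, which returns the value for a key after first inserting a given default if the key is missing. A sketch of that variant; the name invert_setdefault is ours:

```python
def invert_setdefault(d):
    """Invert d so that each value maps to a list of the keys having it."""
    inverse = dict()
    for key in d:
        # setdefault inserts [] the first time this value is seen, then
        # returns the (possibly new) list so we can append to it
        inverse.setdefault(d[key], []).append(key)
    return inverse


hist = {'a': 1, 'p': 2, 'e': 1, 'l': 1}
inv = invert_setdefault(hist)
print(sorted(inv[1]))   # prints: ['a', 'e', 'l']
print(inv[2])           # prints: ['p']
```

The order of the keys inside each list follows the dictionary's iteration order, which is why the sketch sorts inv[1] before printing it.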

Dictionaries as memos

A dictionary can store quantities that have already been computed, so that they do not need to be computed again. This technique is called memoization, and reusing stored results in this way can make a computation much faster.

For example, a straightforward recursive implementation of the Fibonacci sequence can look like:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci.py

Fibonacci sequence using recursion

"""

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        res = fibonacci(n-1) + fibonacci(n-2)
        return res
    
if __name__ == "__main__":
    fib_numb = fibonacci(12)
    print fib_numb
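To see how much work the plain recursion above repeats, one can count how often each argument is visited. The sketch below is an instrumented copy of fibonacci; the counting dictionary calls and the name fibonacci_counted are additions for illustration.

```python
calls = dict()   # maps n -> number of times fibonacci_counted(n) was called


def fibonacci_counted(n):
    """Same recursion as fibonacci, but records every call in calls."""
    calls[n] = calls.get(n, 0) + 1
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci_counted(n - 1) + fibonacci_counted(n - 2)


result = fibonacci_counted(12)
print(result)                # prints: 144
print(calls[1])              # prints: 144
print(calls[0])              # prints: 89
print(sum(calls.values()))   # prints: 465
```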

Notice how many times the terms that appear early in the sequence are recomputed by the recursive calls: computing fibonacci(12), for example, calls fibonacci(1) 144 times. A dictionary can be used to keep track of the terms that have already been evaluated, so that each one is computed only once:

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci_dict.py

Fibonacci sequence using a dictionary "known" which keeps track of values that
have already been computed and stores them for reuse.

"""

# initialize a dictionary, known, with the first two sequences: F0=0, F1=1

known = {0:0,1:1}

def fibonacci_dict(n):
    # no "global known" is needed: known is only mutated here, never rebound
    # check if n already appears as key in known dictionary
    if n in known:
        # if true return the corresponding value
        return known[n]
    else:
        # otherwise, calculate a new Fibonacci number
        # and add the new Fibonacci number as a new value in the dictionary, known
        known[n] = fibonacci_dict(n-1) + fibonacci_dict(n-2)
        return known[n]
    
if __name__ == "__main__":
    print fibonacci_dict(12)
    print known

In the above example, known is a dictionary that stores the Fibonacci numbers we already know. It starts with the first two terms in the sequence: F0=0 and F1=1, or in other words, 0 maps to 0 and 1 maps to 1.
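The same idea can be packaged once and reused for any function of one hashable argument. A minimal sketch, assuming a hypothetical decorator named memoize (in Python 3.2 and later, the standard library's functools.lru_cache provides this ready-made):

```python
def memoize(f):
    """Wrap a one-argument function so each result is computed only once."""
    memo = dict()   # maps argument -> previously computed result

    def wrapper(n):
        if n not in memo:
            memo[n] = f(n)
        return memo[n]

    return wrapper


@memoize
def fib(n):
    # The recursive calls go through the decorated name fib,
    # so they are served from the memo as well
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(12))   # prints: 144
print(fib(50))   # prints: 12586269025
```

Here the memo lives inside the decorator instead of in a global variable such as known, so each decorated function gets its own dictionary.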

To compare the run times in seconds, we can do as follows (note that time.time() measures elapsed wall-clock time, not CPU time):

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/run_fibonacci.py

Runtime comparison of the two Fibonacci implementations: plain recursion versus a dictionary memo.

"""

import time
from fibonacci import fibonacci
from fibonacci_dict import fibonacci_dict

start_time1 = time.time()
fibonacci(12)
elapsed_time1 = time.time() - start_time1

start_time2 = time.time()
fibonacci_dict(12)
elapsed_time2 = time.time() - start_time2

print 'Run time in seconds: Fibonacci & Fibonacci_dict = ', elapsed_time1, elapsed_time2
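Timing a single short call with time.time() is coarse. One alternative is the standard timeit module, which repeats a call many times. A sketch, with the two functions copied so the block runs on its own; the argument 18 and the repeat count 50 are arbitrary choices. Because the dictionary known persists between calls, every call after the first would only be a lookup, so the hypothetical helper fresh_fibonacci_dict resets the memo before each timed computation.

```python
import timeit


def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


known = {0: 0, 1: 1}


def fibonacci_dict(n):
    if n in known:
        return known[n]
    known[n] = fibonacci_dict(n - 1) + fibonacci_dict(n - 2)
    return known[n]


def fresh_fibonacci_dict(n):
    # Reset the memo so each timed call starts from scratch
    known.clear()
    known.update({0: 0, 1: 1})
    return fibonacci_dict(n)


t_rec = timeit.timeit(lambda: fibonacci(18), number=50)
t_dict = timeit.timeit(lambda: fresh_fibonacci_dict(18), number=50)
print('recursive: %.6f s, dictionary: %.6f s' % (t_rec, t_dict))
```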

Global variables

In the previous example, known is created outside any function, at the top level of the file. It therefore belongs to the module's global frame (the special frame called __main__ when the file is run as a script). Such variables are called global variables, because they can be accessed from any function in the module.

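To reassign a global variable inside a function, you need to declare it with the global statement before using it; otherwise the assignment creates a new local variable with the same name. Mutating a mutable global object, as fibonacci_dict does with known[n] = fibonacci_dict(n-1) + fibonacci_dict(n-2), does not require the declaration. The following example illustrates how global variables behave and how they can be reassigned inside a function: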

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/global.py

"""

been_called = False

def local_var():
    been_called = True
    print '(a):', been_called

local_var()
print '(b):', been_called


def global_var():
    global been_called
    been_called = True
    print '(c):', been_called


global_var()
print '(d):', been_called


been_called = False
def return_var():
    been_called = True
    return been_called

return_var()
print '(e):', been_called
print '(f):', return_var()

The result looks like:

(a): True
(b): False
(c): True
(d): True
(e): False
(f): True

An example study

Consider the following example, originally adapted from Dive Into Python 3 and modified for the class.

This routine takes a computer file size in bytes as input and converts it to an approximate, human-readable form, e.g., 1 TB or 931 GiB.

"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/humansize.py

NOTE: This routine has been extracted from

   http://www.diveintopython3.net/your-first-python-program.html
   
and modified by Prof. Dongwook Lee for AMS 209.

"""

SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: file size in a string format
    
    '''
    if size < 0:
        raise ValueError('number must be non-negative')

    if a_kilobyte_is_1024_bytes:
        multiple = 1024
    else:
        multiple = 1000

    # Initialize an empty dictionary, size_dict, to keep track of
    # the computed sizes and their suffixes.
    # The result is the last key:value pair, stored once a computed
    # size becomes smaller than the file size unit (i.e., multiple).
    size_dict=dict()
    for suffix in SUFFIXES[multiple]:

        # floor division (//) keeps size an integer in both
        # Python 2 and Python 3, matching the output shown below
        size //= multiple
        size_dict[size]=suffix

        # Keep dividing until a size is less than the chosen file size unit
        if size < multiple:
            return str(size) + ' ' + size_dict[size]

    raise ValueError('number too large')

if __name__ == '__main__':
    print '(a) with the multiple of 1000 bytes: ', approximate_size(1000000000000, False)
    print '(b) with the multiple of 1024 bytes: ', approximate_size(1000000000000)

The output from running the routine looks like:

$ python humansize.py
(a) with the multiple of 1000 bytes:  1 TB
(b) with the multiple of 1024 bytes:  931 GiB
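Because this version divides with integers, the results are truncated: 931.32... GiB is reported as 931 GiB. A sketch of a variant that keeps one decimal place by dividing floats and formatting with str.format; the name approximate_size_float is ours, and the suffix table is copied so the block runs on its own.

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}


def approximate_size_float(size, a_kilobyte_is_1024_bytes=True):
    """Like approximate_size, but keeps one decimal place."""
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    size = float(size)   # force true division in Python 2 as well
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')


print(approximate_size_float(1000000000000, False))   # prints: 1.0 TB
print(approximate_size_float(1000000000000))          # prints: 931.3 GiB
```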