Dictionaries
Another useful data type built into Python is the dictionary. A dictionary is like a list, but it is more general. In a list, the indices have to be integers; in a dictionary they can be of (almost) any type. More precisely, they can be of any hashable type, such as a string, a number, or a tuple.
Key:value pair
You can think of a dictionary as a mapping between two things:
- keys: a set of indices,
- values: a set of values corresponding to each key.
Each key maps to a value. The association of a key and a value is called a key:value pair.
You can define an empty dictionary in two ways.
One way is to use the built-in function dict:
>>> eng2kor = dict()
or, alternatively, use a pair of empty curly braces, {}:
>>> eng2kor = {}
In both cases you see the following:
>>> type(eng2kor)
<type 'dict'>
>>> print eng2kor
{}
Let’s add a new pair to the dictionary eng2kor. To add one, you can use square brackets:
>>> eng2kor['one'] = 'hana'
This creates a new key:value pair that maps from the key 'one' to the value 'hana'. If we print the dictionary again:
>>> print eng2kor # or simply >>> eng2kor
{'one': 'hana'}
You can also define several pairs at once by writing them in the same format as this output:
>>> eng2kor={'one':'hana','two':'dool','three':'set','four':'net'}
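As a sketch of two other ways to build the same dictionary, dict also accepts a list of (key, value) tuples, which combines naturally with zip. The variable names below are illustrative only, and print is written in its parenthesized form so that the snippet runs under both Python 2 and Python 3.

```python
# Build the dictionary from a list of (key, value) tuples.
pairs = [('one', 'hana'), ('two', 'dool'), ('three', 'set'), ('four', 'net')]
eng2kor_from_pairs = dict(pairs)

# Build the same dictionary from two parallel lists with zip.
eng = ['one', 'two', 'three', 'four']
kor = ['hana', 'dool', 'set', 'net']
eng2kor_from_zip = dict(zip(eng, kor))

# Both constructions give the same mapping as the literal form.
literal = {'one': 'hana', 'two': 'dool', 'three': 'set', 'four': 'net'}
print(eng2kor_from_pairs == literal)   # True
print(eng2kor_from_zip == literal)     # True
```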
Unfortunately, the append method cannot be invoked on a dictionary (i.e., eng2kor.append('five') won’t work).
Instead, one can keep adding new pairs using:
>>> eng2kor['five'] = 'dasut'
or using update (try help(dict) or dir(dict) to see more options):
>>> eng2kor.update({'five':'dasut'})
Let’s now print to see what we have defined so far:
>>> print eng2kor
{'four': 'net', 'three': 'set', 'five': 'dasut', 'two': 'dool', 'one': 'hana'}
The order of the key:value pairs may not be what you expected, and it may even differ from one computer to another. In the Python 2 used in these notes, the order of pairs in a dictionary is arbitrary (dictionaries preserve insertion order only from Python 3.7 onward). This is because the elements of a dictionary are looked up by key, never by integer index (a dictionary is still iterable, though). This is not a problem, because each key always maps to the same value regardless of the order:
>>> eng2kor['five']
'dasut'
>>> eng2kor['two']
'dool'
Traversing the dictionary shows:
>>> for i in eng2kor:
...     print i
...
four
three
five
two
one
Here we see that traversing a dictionary iterates over its keys, not its values.
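When a predictable traversal order is needed, one option is to iterate over the sorted keys. The following is a minimal sketch of that idea; it redefines eng2kor so that it is self-contained, and it uses the parenthesized print form, which runs under both Python 2 and Python 3.

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

# sorted() returns the keys as a new list in alphabetical order,
# independent of the order in which the dictionary stores them.
ordered_keys = sorted(eng2kor)

for eng in ordered_keys:
    print(eng + ' -> ' + eng2kor[eng])
```

The loop visits the keys in the order five, four, one, three, two on any machine.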
We can use the iterators that are defined as dictionary methods (try help(dict) and find these): iteritems(), iterkeys(), and itervalues():
>>> for eng, kor in eng2kor.iteritems():
...     print eng, kor
...
four net
one hana
five dasut
three set
two dool
or, to just get keys:
>>> for eng in eng2kor.iterkeys():
...     print eng
...
four
one
five
three
two
Similarly, to just get values:
>>> for kor in eng2kor.itervalues():
...     print kor
...
net
hana
dasut
set
dool
The method items() leaves the dictionary unchanged and returns its contents as a list of (key, value) tuples:
>>> eng2kor.items()
[('four', 'net'), ('one', 'hana'), ('five', 'dasut'), ('three', 'set'), ('two', 'dool')]
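For readers on Python 3, note that iteritems(), iterkeys(), and itervalues() no longer exist there; items(), keys(), and values() return iterable view objects instead of lists. The sketch below sticks to calls that behave the same way in both versions, and the variable names are illustrative.

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set'}

# items() can be iterated directly in both Python 2 and Python 3.
# Passing it to sorted() gives a list of (key, value) tuples in key order.
pairs = sorted(eng2kor.items())
print(pairs)

# list() forces a plain list; sorting makes the result reproducible.
keys = sorted(list(eng2kor.keys()))
values = sorted(list(eng2kor.values()))
print(keys)
print(values)
```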
We can apply some of the functions and operators we learned so far to a dictionary:
>>> len(eng2kor)
5
>>> 'one' in eng2kor
True
>>> 'net' in eng2kor
False
The second example of the in operator tells us that Python checks whether the search word appears as a key, not as a value, in the dictionary.
To see whether something appears as a value instead of a key, use the method values, which returns the values as a list:
>>> print eng2kor.values()
['net', 'set', 'dasut', 'dool', 'hana']
With this we can now search:
>>> vals = eng2kor.values()
>>> 'net' in vals
True
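Searching the values only tells us that a value is present, not which key it belongs to. As a sketch of how one might recover the keys, the hypothetical helper reverse_lookup below (not part of the notes) scans the dictionary and collects every key that maps to the target value.

```python
def reverse_lookup(d, target):
    """Return a sorted list of the keys in d whose value equals target."""
    return sorted(key for key in d if d[key] == target)

eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

print(reverse_lookup(eng2kor, 'net'))    # ['four']
print(reverse_lookup(eng2kor, 'yeol'))   # [] since no key maps to 'yeol'
```

Unlike a lookup by key, this search has to visit every pair, so it becomes slower as the dictionary grows.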
We can also compare between keys or between values:
>>> eng2kor.keys()
['four', 'three', 'five', 'two', 'one']
>>> eng2kor.keys()[0].__gt__(eng2kor.keys()[2]) # this is equivalent to eng2kor.keys()[0] > eng2kor.keys()[2]
True
Comparing the corresponding values (i.e., the values at positions 0 and 2 of the value list):
>>> eng2kor.values()[0] > eng2kor.values()[2]
True
Dictionary as a set of counters
Suppose you are given a string and wish to count how many times each character appears. Recall that we did this before using a list (see Searching and counting). This time let’s use a dictionary and see how we can implement a more general algorithm:
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/histogram.py

"""

def histogram(s):
    # initialize with an empty dictionary
    d = dict()

    for c in s:
        if c not in d:
            # if c does not yet appear as a key in d,
            # initialize its value to one.
            d[c] = 1
        else:
            # if c already appears as a key,
            # increase its value by one.
            d[c] += 1

    # return dictionary
    return d


def histogram_ternary(s):
    # This is exactly the same as histogram
    # but using a so-called 'ternary operator':
    #   a if test else b
    #
    # Ex: x = 'apple' if a > 2 else 'orange'
    # Translating this into English:
    # x is 'apple' if a > 2; otherwise x is 'orange'
    d = dict()
    for c in s:
        # the ternary expression is much shorter
        # than the conventional if-else statement,
        # at the cost of some readability.
        d[c] = 1 if c not in d else d[c]+1
    return d


def histogram_ternary_get(s):
    # This is exactly the same as histogram
    # but using the 'get' method defined in dictionary.
    # See help(dict) and check out:
    #
    # get(...)
    #   D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
    # i.e., if D is a dictionary,
    #              / D[k] if k in D
    # D.get(k,d) = |
    #              \ d    if k not in D
    d = dict()
    for c in s:
        # get returns the current count of c, or 0 if c is not yet a key,
        # so neither an if-else statement nor a ternary expression is needed:
        #d[c] = 1 if c not in d else d[c]+1
        d[c] = d.get(c,0) + 1
    return d


def print_hist(h):
    for c in h:
        # print key and value
        print c,h[c]


if __name__ == "__main__":

    # first function
    h1=histogram('apple')
    print '(a):', h1

    # second function which uses the ternary operator
    h2=histogram_ternary('apple')
    #h2=histogram_ternary_get('apple')
    print '(b):', h2

    # are they the same?
    print '(c):', h1 is h2
    print '(d):', id(h1)
    print '(e):', id(h2)

    # print keys
    h1_keys = h1.keys()
    h2_keys = h2.keys()
    print '(f):', h1_keys, id(h1_keys)
    print '(ff)', h2_keys, id(h2_keys)
    print '(g):', h1_keys is h2_keys

    # does 'a' appear as a key?
    print '(h):', h1_keys.__contains__('a')

    # print values
    h1_values = h1.values()
    h2_values = h2.values()
    print '(i):', h1_values
    print '(j):', h1_values is h2_values

    # does '0' appear as a value?
    print '(k):', h1_values.__contains__('0')

    # 'get' takes a key and a default value.
    # If the key appears in the dictionary,
    # 'get' returns the corresponding value;
    # otherwise it returns the user-defined
    # default value, e.g., 159 in the following example:
    print '(l):', h1.get('a',159)
    print '(m):', h1.get('w',159)

    # print histogram
    print '(n):--------'
    print_hist(h1)
Running this in the script mode will give:
$ python histogram.py
(a): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(b): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(c): False
(d): 4303246232
(e): 4303246512
(f): ['a', 'p', 'e', 'l']
(g): False
(h): True
(i): [1, 2, 1, 1]
(j): False
(k): False
(l): 1
(m): 159
(n):--------
a 1
p 2
e 1
l 1
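For comparison, the standard library already provides a dictionary subclass designed for counting, collections.Counter (available since Python 2.7). The sketch below is an alternative to the hand-written histogram functions rather than a replacement for them; print is written in the parenthesized form so that the snippet runs under both Python 2 and Python 3.

```python
from collections import Counter

# Counter builds the character histogram in a single call.
counts = Counter('apple')

print(counts['p'])   # 2
print(counts['z'])   # 0, a missing key counts as zero instead of raising KeyError

# A Counter converts to a plain dictionary with the same key:value pairs.
as_dict = dict(counts)
print(as_dict == {'a': 1, 'p': 2, 'l': 1, 'e': 1})   # True
```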
Dictionaries and lists
Lists can appear as values in a dictionary, but not as keys. For example, if you try:
>>> t=['a','e','l']
>>> type(t)
<type 'list'>
>>> d = dict()
>>> d[t]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> d['1'] = t
>>> d
{'1': ['a', 'e', 'l']}
The above example confirms that lists can be used as values but not as keys.
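If a sequence is needed as a key, a tuple is the usual substitute, because tuples are immutable and therefore hashable. The following sketch illustrates the contrast; the flag list_key_ok is a name introduced here for illustration only.

```python
d = dict()

# A tuple is hashable, so it is accepted as a key.
d[('a', 'e', 'l')] = 1
print(d[('a', 'e', 'l')])   # 1

# A list is not hashable, so using it as a key raises TypeError.
try:
    d[['a', 'e', 'l']] = 1
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(list_key_ok)   # False
```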
Now, let’s consider an application of using lists as values. Take a look at what we just obtained in the last outcome, {'1': ['a', 'e', 'l']}. This looks like an inverse map of the output (a) or (b)! This example tells us that we may implement an inverse map routine which inverts keys and values in a dictionary. Here is a function that inverts a dictionary:
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/invert_dictionary.py

"""

def invert_dictionary(d):
    # create an empty dictionary
    inverse = dict()

    # traverse through keys in dictionary "d"
    for key in d:
        # "val" is a "value" of "d" associated with a "key"
        val = d[key]

        # if val is first found as a key in inverse,
        # create a new val:key pair in inverse
        if val not in inverse:
            inverse[val] = [key]
            # Note above that the values of inverse are assigned as a list, [key]

        # if val is already found, append the corresponding
        # key to the list
        else:
            #print val, inverse[val]
            #print type(inverse[val])
            inverse[val].append(key)

    # output inverse dictionary
    return inverse


if __name__ == "__main__":

    # import histogram method from histogram.py
    from histogram import histogram as histo

    # compute histogram
    hist = histo('apple')
    print hist

    # compute inverse map of dictionary
    inv = invert_dictionary(hist)
    print inv

    # what is hist(hist^{-1}('apple'))?
    h1 = histo(inv)
    print h1

    # what is hist^{-1}(hist(hist^{-1}('apple')))?
    h2 = invert_dictionary(h1)
    print h2
The result looks like:
$ python invert_dictionary.py
{'a': 1, 'p': 2, 'e': 1, 'l': 1}
{1: ['a', 'e', 'l'], 2: ['p']}
{1: 1, 2: 1}
{1: [1, 2]}
Note
Why are we getting the last two outcomes?
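The if/else branch inside the inversion routine can also be expressed with the dictionary method setdefault, which returns the value stored under a key and first inserts a default if the key is missing. The sketch below is one possible shorter variant; the name invert_with_setdefault is introduced here for illustration and is not part of the notes.

```python
def invert_with_setdefault(d):
    """Invert d, mapping each value to the list of keys that had it."""
    inverse = dict()
    for key in d:
        # setdefault returns inverse[d[key]], creating an empty list
        # first if that value has not been seen before.
        inverse.setdefault(d[key], []).append(key)
    return inverse

hist = {'a': 1, 'p': 2, 'e': 1, 'l': 1}
inv = invert_with_setdefault(hist)

print(sorted(inv[1]))   # ['a', 'e', 'l']
print(inv[2])           # ['p']
```

The lists are sorted before printing only because the order in which keys are visited is not guaranteed in Python 2.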
Dictionaries as memos
A dictionary can be used to store quantities that have already been computed, so that they need not be computed again. Reusing the stored results in this way (often called memoization) can make a program considerably faster.
For example, a straightforward implementation of the Fibonacci sequence can look like:
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci.py

Fibonacci sequence using recursion

"""

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        res = fibonacci(n-1) + fibonacci(n-2)
        return res


if __name__ == "__main__":
    fib_numb = fibonacci(12)
    print fib_numb
Notice how many times the terms that appear early in the sequence are recomputed by the recursive calls: very many! A dictionary can be used to keep track of the terms that have already been evaluated and store them for reuse:
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci_dict.py

Fibonacci sequence using a dictionary "known" which
keeps track of values that have already been computed
and stores them for reuse.

"""

# initialize a dictionary, known, with the first two terms: F0=0, F1=1
known = {0:0,1:1}

def fibonacci_dict(n):
    # global known
    # check if n already appears as a key in the known dictionary
    if n in known:
        # if true, return the corresponding value
        return known[n]
    else:
        # otherwise, calculate a new Fibonacci number
        # and add it as a new value in the dictionary, known
        known[n] = fibonacci_dict(n-1) + fibonacci_dict(n-2)
        return known[n]


if __name__ == "__main__":
    print fibonacci_dict(12)
    print known
In the above example, known is a dictionary that stores the Fibonacci numbers we already know. It starts with the first two terms in the sequence, F0=0 and F1=1; in other words, 0 maps to 0 and 1 maps to 1.
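To make the saving concrete, one can count how often each version of the function is called. The sketch below re-implements both versions under the hypothetical names fib_naive and fib_memo and records the call counts in a dictionary named calls; none of these names come from the notes.

```python
# Call counters, updated in place by the two functions below.
calls = {'naive': 0, 'memo': 0}

# Memo dictionary seeded with F0=0 and F1=1.
known = {0: 0, 1: 1}

def fib_naive(n):
    calls['naive'] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_memo(n):
    calls['memo'] += 1
    if n not in known:
        known[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return known[n]

print('naive: F12 = %d after %d calls' % (fib_naive(12), calls['naive']))
print('memo:  F12 = %d after %d calls' % (fib_memo(12), calls['memo']))
```

For n = 12 the plain recursion makes 465 calls, whereas the memoized version makes 23, and the gap widens rapidly as n grows.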
To compare the run times in seconds (wall-clock times, as measured by time.time()), we can do the following:
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/run_fibonacci.py

Runtime comparison of the two fibonacci implementations
of using recursive and dictionary.

"""

import time
from fibonacci import fibonacci
from fibonacci_dict import fibonacci_dict

n=30

start_time1 = time.time()
fibonacci(n)
elapsed_time1 = time.time() - start_time1

start_time2 = time.time()
fibonacci_dict(n)
elapsed_time2 = time.time() - start_time2

print 'Run time in seconds: Fibonacci & Fibonacci_dict = ', elapsed_time1, elapsed_time2
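The same comparison can be tried without the two separate module files. The following self-contained sketch repeats both functions inline and wraps the timing in a hypothetical helper called timed; it uses timeit.default_timer, which selects a suitable clock on each platform, and a smaller n so that it finishes quickly. The measured times depend on the machine, so no particular values should be expected.

```python
from timeit import default_timer

def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

known = {0: 0, 1: 1}

def fibonacci_dict(n):
    if n not in known:
        known[n] = fibonacci_dict(n - 1) + fibonacci_dict(n - 2)
    return known[n]

def timed(func, n):
    """Call func(n) and return (result, elapsed seconds)."""
    start = default_timer()
    result = func(n)
    return result, default_timer() - start

n = 20
res1, t1 = timed(fibonacci, n)
res2, t2 = timed(fibonacci_dict, n)

print('results agree: %s' % (res1 == res2))
print('elapsed seconds (recursive, dictionary): %g, %g' % (t1, t2))
```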
Global variables
In the previous example, known is initialized outside the function. Therefore, it belongs to the special frame called __main__. Variables in __main__ are global: they can be accessed from any function.
To reassign a global variable inside a function, you must first declare it there with the global statement; otherwise the assignment creates a new local variable with the same name. (Modifying a mutable global object in place, as fibonacci_dict does with known, needs no declaration, which is why the global known line in that example can stay commented out.) The following example illustrates how global variables behave and how they should be reassigned inside a function:
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/global.py

"""

been_called = False

def local_var():
    been_called = True
    print '(a):', been_called

local_var()
print '(b):', been_called


def global_var():
    global been_called
    been_called = True
    print '(c):', been_called

global_var()
print '(d):', been_called


been_called = False

def return_var():
    been_called = True
    return been_called

return_var()
print '(e):', been_called
print '(f):', return_var()
The result looks like:
(a): True
(b): False
(c): True
(d): True
(e): False
(f): True
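The difference between changing a global object in place and rebinding a global name can be seen in a small sketch. The function and variable names below are illustrative only, and print is written in the parenthesized form so that the snippet runs under both Python 2 and Python 3.

```python
known = {0: 0, 1: 1}
counter = 0

def add_entry():
    # In-place modification of a global dictionary: no 'global' needed.
    known[2] = 1

def bump_without_global():
    # This assignment creates a new local variable named counter;
    # the global counter is left untouched.
    counter = 100
    return counter

def bump_with_global():
    # The declaration makes the assignment rebind the global name.
    global counter
    counter = counter + 1

add_entry()
bump_without_global()
print(counter)   # 0
bump_with_global()
print(counter)   # 1
print(known == {0: 0, 1: 1, 2: 1})   # True
```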
An example study
Consider the following example, which was adapted from Dive Into Python 3 and modified for the class.
This routine takes a file size in bytes as input and converts it approximately to a human-readable form, e.g., 1 TB or 931 GiB.
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/humansize.py

NOTE: This routine has been extracted from
      http://www.diveintopython3.net/your-first-python-program.html
      and modified by Prof. Dongwook Lee for AMS 209.

"""

SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: file size in a string format

    '''
    if size < 0:
        print 'number must be non-negative'
        # stop here: a negative size has no meaningful result
        return None

    if a_kilobyte_is_1024_bytes:
        multiple = 1024
    else:
        multiple = 1000

    # Initialize an empty size_dict dictionary to keep track of
    # the file sizes and suffixes.
    # The result is going to be the last key:value pair when
    # a computed size becomes smaller than the file size unit (i.e., multiple).
    size_dict=dict()
    for suffix in SUFFIXES[multiple]:
        #print suffix
        size /= multiple   # <==> size = size/multiple
        #print size
        size_dict[size]=suffix

        # Keep dividing until a size is less than the chosen file size unit
        if size < multiple:
            return str(size) + ' ' + size_dict[size]

    print 'number too large'


if __name__ == '__main__':
    print '(a) with the multiple of 1000 bytes: ', approximate_size(1000000000000, False)
    print '(b) with the multiple of 1024 bytes: ', approximate_size(1000000000000)
The output from running the routine looks like:
$ python humansize.py
(a) with the multiple of 1000 bytes: 1 TB
(b) with the multiple of 1024 bytes: 931 GiB
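The routine above relies on Python 2 integer division, which is why 931.32... is reported as 931. As a sketch of one possible variation (not the routine used in the notes), the hypothetical function approximate_size_float below converts the size to a float, keeps one decimal place, and raises ValueError for invalid input instead of printing a message. It runs under both Python 2 and Python 3.

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size_float(size, a_kilobyte_is_1024_bytes=True):
    """Convert a file size in bytes to a string with one decimal place."""
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000

    # Work with a float so that division keeps the fractional part.
    size = float(size)
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        # Stop as soon as the size fits within the current unit.
        if size < multiple:
            return '%.1f %s' % (size, suffix)

    raise ValueError('number too large')

print(approximate_size_float(1000000000000, False))   # 1.0 TB
print(approximate_size_float(1000000000000))          # 931.3 GiB
```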