Dictionaries¶
Another useful data type built into Python is the dictionary. A dictionary is like a list, but it is more general. In a list, the indices have to be integers; in a dictionary they can be any type.
Key:value pair¶
- You can think of a dictionary as a mapping between two things:
- keys: a set of indices,
- values: a set of values corresponding to each key.
Each key maps to a value. The association of a key and a value pair is called a key:value pair.
You can define an empty dictionary in two ways.
One way is to use a built-in function dict
>>> eng2kor = dict()
or altenatively, use an empty squiggly-brackets, {}
>>> eng2kor = {}
In both cases you see the following
>>> type(eng2kor)
<type 'dict'>
>>> print eng2kor
{}
Let’s add a new pair to the dictionary, eng2kor
. To add one, you can
use square brackets
>>> eng2kor['one'] = 'hana'
This creates a new key:value pair that maps from the key one
to the value
hana
. If we print the dictionary again
>>> print(eng2kor) # or simply >>> eng2kor
{'one': 'hana'}
One can add multiple pairs using this output format
>>> eng2kor={'one':'hana','two':'dool','three':'set','four':'net'}
Unfortunately, append
method cannot be invoked on a dictionary directly
(i.e., eng2kor.append('five')
won’t work).
Instead, one can keep adding new pairs using
>>> eng2kor['five'] = 'dasut'
or using update
(try help(dict)
or dir(dict)
to see more options)
>>> eng2kor.update({'five':'dasut'})
Let’s now print to see what we have defined so far
>>> print eng2kor
{'four': 'net', 'three': 'set', 'five': 'dasut', 'two': 'dool', 'one': 'hana'}
The order of the key:value pairs does not look like what you might have expected. In fact, they might look different on different computers. Surprisingly, the order of pairs in a dictionary is unpredictable. This is because the elements of a dictionary are never indexed with integer indices (they are still iterable though). Even though it might look confusing, this is not a problem as long as the one-to-one correspondance between the key:value relationships remain unchanged, which is the case all the time
>>> eng2kor['five']
'dasut'
>>> eng2kor['two']
'dool'
or traversing through the dictionary will show
>>> for i in eng2kor:
... print i
...
four
three
five
two
one
Here we see that traversing a dictionary is executed among the key lists, not the value lists.
We can use the iterators that are defined as methods in dictionary
(try help(dict)
and find these), items()
, keys()
, and values()
>>> for eng, kor in eng2kor.items():
... print eng, kor
...
one hana
two dool
three set
four net
five dasut
or, to just get keys
>>> for eng in eng2kor.keys():
... print eng
...
one
two
three
four
five
Similarly, to just get values
>>> for kor in eng2kor.values():
... print kor
...
hana
dool
set
net
dasut
The method items()
defined in dictionary changes the dictionary to
a list with (key,value)
pairs as tuples
>>> eng2kor.items()
dict_items([('one', 'hana'), ('two', 'dool'), ('three', 'set'), ('four', 'net')])
We can apply some of the methods we learned so far to a dictionary
>>> len(eng2kor)
5
>>> 'one' in eng2kor
True
>>> 'net' in eng2kor
False
The second example of the in
operator tells us that Python checks if
the search word appears as a key
, but not as a value
in the dictionary.
To see whether something appears as a value instead of a key, is to use the method
values
which returns the values as a list
>>> print(eng2kor.values())
dict_values(['hana', 'dool', 'set', 'net'])
With this we can now search
>>> vals = eng2kor.values()
>>> 'net' in vals
True
We can also compare between keys or between values
>>> eng2kor.keys()
dict_keys(['one', 'two', 'three', 'four'])
Dictionary as a set of counters¶
Consider that you are given a string and you wish to count how many times each character appears. Recall that we did this before using a list String methods. This time let’s use a dictionary and see how we can implement a more general algorithm:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 | """
/lectureNote/chapters/chapt03/codes/examples/dictionaries/histogram.py
"""
def histogram(s):
# initialize with an empty dictionary
d = dict()
for c in s:
if c not in d:
# if c first appears as a key in d
# then initialize its value to one.
d[c] = 1
else:
# if c appears as a key more than once
# add its value by one.
d[c] += 1
# return dictionary
return d
def histogram_ternary(s):
# This is exactly the same as histogram
# but using a so-called 'ternary operator':
# a if test else b
#
# Ex: x='apple' if a > 2 else 'orange'
# Translating this into English will be
# x is 'apple' if a > 2; otherwise x is 'orange'
d = dict()
for c in s:
# the ternary expression is much shorter
# than the conventional if-else statement
# with a reduced readability.
d[c] = 1 if c not in d else d[c]+1
return d
def histogram_ternary_get(s):
# This is exactly the same as histogram
# but using 'get' method defined in dictionary:
# See help(dict) and check out:
#
# get(...)
# D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
# i.e., if D is a dictionary,
# /D[k] if k in D
# D.get(k,d) = |
# \ d if k not in D
#
# Ex: x='apple' if a > 2 else 'orange'
# Translating this into English will be
# x is 'apple' if a > 2; otherwise x is 'orange'
d = dict()
for c in s:
# the ternary expression is much shorter
# than the conventional if-else statement
# with a reduced readability.
#d[c] = 1 if c not in d else d[c]+1
d[c] = d.get(c,0) + 1
return d
def print_hist(h):
for c in h:
# print key and value
print( c,h[c] )
if __name__ == "__main__":
# first function
h1=histogram('apple')
print( '(a):', h1 )
# second function which uses the ternary operator
h2=histogram_ternary('apple')
#h2=histogram_ternary_get('apple')
print( '(b):', h2 )
# are they the same?
print( '(c):', h1 is h2 )
print( '(d):', id(h1) )
print ( '(e):', id(h2) )
# print keys
h1_keys = h1.keys()
h2_keys = h2.keys()
print( '(f):', h1_keys, id(h1_keys) )
print( '(ff)', h2_keys, id(h2_keys) )
print( '(g):', h1_keys is h2_keys )
# does 'a' appear as a key?
print( '(h):', list(h1_keys).__contains__('a') )
# print values
h1_values = h1.values()
h2_values = h2.values()
print( '(i):', h1_values )
print( '(j):', h1_values is h2_values )
# does '0' appear as a value?
print( '(k):', list(h1_values).__contains__('0') )
# 'get' takes a key and a default value
# If the key appears in the dictionary
# 'get' returns the corresponding value;
# otherwise it returns the user defined
# default value, e.g., 159 in the following example:
print( '(l):', h1.get('a',159) )
print( '(m):', h1.get('w',159) )
# print histogram
print( '(n):--------' )
print_hist(h1)
|
Running this in the script mode will give
$ python3 histogram.py
(a): {'a': 1, 'p': 2, 'l': 1, 'e': 1}
(b): {'a': 1, 'p': 2, 'l': 1, 'e': 1}
(c): False
(d): 4402933120
(e): 4402933336
(f): dict_keys(['a', 'p', 'l', 'e']) 4398279176
(ff) dict_keys(['a', 'p', 'l', 'e']) 4398278408
(g): False
(h): True
(i): dict_values([1, 2, 1, 1])
(j): False
(k): False
(l): 1
(m): 159
(n):--------
a 1
p 2
l 1
e 1
Dictionaries and lists¶
Lists can only appear as values in a dictionary, but not keys. For example, if you try
>>> t=['a','e','l']
>>> type(t)
<class 'list'>
>>> d = dict()
>>> d[t]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> d['1'] = t
>>> d
{'1': ['a', 'e', 'l']}
The above example confirms that lists can only be used as values.
Now, let’s consider an application of using lists as values.
Take a look at what we just obtained in the last outcome,
{'1': ['a','e','l']}
. This looks like an inverse map of the output
(a)
or (b)
!
This example tells us that we may implement an inverse map routine
which inverts keys and values in a dictionary. Here is a function
that inversts a dictionary:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | """
/lectureNote/chapters/chapt03/codes/examples/dictionaries/invert_dictionary.py
"""
def invert_dictionary(d):
# create an empty dictionary
inverse = dict()
# traverse through keys in dictionary "d"
for key in d:
# "val" is a "value" of "d" associated with a "key"
val = d[key]
# if val is first found as a key in inverse
# create a new val:key pair in inverse
if val not in inverse:
inverse[val] = [key]
# Note above that the values of inverse is assigned as a list, [key]
# if val is already found, append the corresponding
# key to the list
else:
#print val, inverse[val]
#print type(inverse[val])
inverse[val].append(key)
# output inverse dictionary
return inverse
if __name__ == "__main__":
# import histogram method from histogram.py
from histogram import histogram as histo
# compute histogram
hist = histo('apple')
print( hist )
# compute inverse map of dictionary
inv = invert_dictionary(hist)
print( inv )
# what is hist(hist^{-1}('apple'))?
h1 = histo(inv)
print( h1 )
# what is hist^{-1}(hist(hist^{-1}('apple')))?
h2 = invert_dictionary(h1)
print( h2 )
|
The result look like
$ python3 invert_dictionary.py
{'a': 1, 'p': 2, 'e': 1, 'l': 1}
{1: ['a', 'e', 'l'], 2: ['p']}
{1: 1, 2: 1}
{1: [1, 2]}
Note
Why are we getting the last two outcomes?
Dictionaries as memos¶
A dictionary can be used for storing quantities that have been already computed, and thereby one doesn’t need to repeat such previously computed operations. In computing, this clearly allows a faster performance by efficiently reusing the stored data.
For example, a straightforward implementation of Fibonacci sequence can look like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | """
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci.py
Fibonaci sequence using recursion
"""
def fibonacci(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
res = fibonacci(n-1) + fibonacci(n-2)
return res
if __name__ == "__main__":
fib_numb = fibonacci(12)
print( fib_numb )
|
Notice now how many times those terms that appear early in the sequence are recursively called repeatedly – very many! A dictionary can be used in this case to keep track of the terms that have been already evaluated and store them in a dictionary:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | """
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci_dict.py
Fibonaci sequence using a dictionary "known" which keeps track of values that
have already been computed and stores them for reuse.
"""
# initialize a dictionary, known, with the first two sequences: F0=0, F1=1
known = {0:0,1:1}
def fibonacci_dict(n):
# global known
# check if n already appears as key in known dictionary
if n in known:
# if true return the corresponding value
return known[n]
else:
# otherwise, calculate a new Fibonacci number
# and add the new Fibonacci number as a new value in the dictionary, known
known[n] = fibonacci_dict(n-1) + fibonacci_dict(n-2)
return known[n]
if __name__ == "__main__":
print( fibonacci_dict(12) )
print( known )
|
In the above example, known
is a dictionary that stores the Fibonacci numbers
we already know. It starts with the first two terms in the sequence: F0=0
and F1=1
,
or in other words, 0 maps to 0 and 1 maps to 1.
To compare CPU runtimes in seconds, we can do as follow:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | """
/lectureNote/chapters/chapt03/codes/examples/dictionaries/run_fibonacci.py
Runtime comparison of the two fibonacci implementations of using recursive and dictionary.
"""
import time
from fibonacci import fibonacci
from fibonacci_dict import fibonacci_dict
n=30
start_time1 = time.time()
fibonacci(n)
elapsed_time1 = time.time() - start_time1
start_time2 = time.time()
fibonacci_dict(n)
elapsed_time2 = time.time() - start_time2
print( 'Run time in seconds: Fibonacci & Fibonacci_dict = ', elapsed_time1, elapsed_time2 )
|
Global variables¶
In the previous example, known
is initialized outside the function. Therefore,
it belongs to the special frame called __main__
. Variables in __main__
have their scopes globally because they can be accessed from any function.
In order to modify any mutable global variable, especially within a local function, you need to declare it before using it. The following example illustrates how the global variables behaves and how they should be modified in a local function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | """
/lectureNote/chapters/chapt03/codes/examples/dictionaries/global.py
"""
been_called = False
def local_var():
been_called = True
print( '(a):', been_called )
local_var()
print( '(b):', been_called )
def global_var():
global been_called
been_called = True
print( '(c):', been_called )
global_var()
print( '(d):', been_called )
been_called = False
def return_var():
been_called = True
return been_called
return_var()
print( '(e):', been_called )
print( '(f):', return_var() )
|
The result looks like
(a): True
(b): False
(c): True
(d): True
(e): False
(f): True
An example study¶
Consider the following example which has been originally adopted from Dive Into Python 3 and modified for the class.
This routine takes a computer file size in kilobytes as an input and converts it approximately to a human-readable form, e.g., 1TB, or 931 GiB, etc.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | """
/lectureNote/chapters/chapt03/codes/examples/dictionaries/humansize.py
NOTE: This routine has been extracted from
http://www.diveintopython3.net/your-first-python-program.html
and modified by Prof. Dongwook Lee for AMS 209
and modified by Youngjun Lee for AMS 129
"""
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
'''Convert a file size to human-readable form.
Keyword arguments:
size -- file size in bytes
a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
if False, use multiples of 1000
Returns: file size in a string format
'''
if size < 0:
print( 'number must be non-negative' )
if a_kilobyte_is_1024_bytes:
multiple = 1024
else:
multiple = 1000
# Initialize an empty size_dict array to keep track of
# the file sizes and suffixes.
# The result is going to be the last key:value pair when
# a computed size becomes smaller than the file size unit (i.e., multiple).
size_dict=dict()
for suffix in SUFFIXES[multiple]:
#print suffix
size /= multiple # <==> size = size/multiple
#print size
size_dict[size]=suffix
# Keep dividing until a size is less than the chosen file size unit
if size < multiple:
return str(size) + ' ' + size_dict[size]
print( 'number too large' )
if __name__ == '__main__':
print( '(a) with the multiple of 1000 bytes: ', approximate_size(1000000000000, False) )
print( '(b) with the multiple of 1024 bytes: ', approximate_size(1000000000000) )
|
The output from running the routine looks like
$ python3 humansize.py
(a) with the multiple of 1000 bytes: 1 TB
(b) with the multiple of 1024 bytes: 931 GiB