# Dictionaries

Another useful data type built into Python is the dictionary. A dictionary is like a list, but more general: in a list the indices must be integers, whereas in a dictionary they can be (almost) any type.

## Key:value pair

You can think of a dictionary as a mapping between two things:

- **keys**: a set of indices,
- **values**: a set of values, one corresponding to each key.

Each key maps to a value. The association of a key with its value
is called a **key:value pair**.

You can define an empty dictionary in two ways.
One way is to use the built-in function `dict`:

```
>>> eng2kor = dict()
```

or, alternatively, use an empty pair of curly braces, `{}`:

```
>>> eng2kor = {}
```

In both cases you see the following:

```
>>> type(eng2kor)
<type 'dict'>
>>> print eng2kor
{}
```

Let’s add a new pair to the dictionary `eng2kor`. To add one, you can
use square brackets:

```
>>> eng2kor['one'] = 'hana'
```

This creates a new key:value pair that maps from the key `one` to the value
`hana`. If we print the dictionary again:

```
>>> print eng2kor # or simply >>> eng2kor
{'one': 'hana'}
```

One can define multiple pairs at once using the same format as this output:

```
>>> eng2kor={'one':'hana','two':'dool','three':'set','four':'net'}
```

Unlike lists, dictionaries have no `append` method
(i.e., `eng2kor.append('five')` won’t work).
Instead, one can keep adding new pairs using:

```
>>> eng2kor['five'] = 'dasut'
```

or using `update` (try `help(dict)` or `dir(dict)` to see more options):

```
>>> eng2kor.update({'five':'dasut'})
```

Let’s now print to see what we have defined so far:

```
>>> print eng2kor
{'four': 'net', 'three': 'set', 'five': 'dasut', 'two': 'dool', 'one': 'hana'}
```

The order of the key:value pairs may **not** look like what you expected.
In fact, it may look different on different computers. In Python 2, which is used here,
the order of pairs in a dictionary is arbitrary: the elements of a dictionary are
indexed by their keys, never by integer positions (they are still iterable, though).
(Since Python 3.7, dictionaries preserve insertion order.)
Even though this might look confusing, it is not a problem, because the
correspondence between each key and its value remains unchanged,
which is always the case:

```
>>> eng2kor['five']
'dasut'
>>> eng2kor['two']
'dool'
```
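As a small sketch of why the order does not matter, the following compares two dictionaries that hold the same pairs but were written in a different order. The names `first`, `second`, and `same_pairs` are introduced here only for illustration, and the snippet is written so that it runs under both Python 2 and Python 3:

```python
# Two dictionaries holding the same key:value pairs, written in different orders.
first = {'one': 'hana', 'two': 'dool', 'three': 'set'}
second = {'three': 'set', 'one': 'hana', 'two': 'dool'}

# Equality compares the key:value pairs, not the order in which they were added.
same_pairs = (first == second)
print(same_pairs)

# A lookup by key gives the same answer for both dictionaries.
print(first['two'] == second['two'])
```

Both lines print `True`: what a dictionary stores is the association between keys and values, not a position.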

or traversing through the dictionary will show:

```
>>> for i in eng2kor:
...     print i
...
four
three
five
two
one
```

Here we see that iterating over a dictionary traverses its keys, not its values.

We can also use the iterators that are defined as dictionary methods
(try `help(dict)` and find these): `iteritems()`, `iterkeys()`, and `itervalues()`:

```
>>> for eng, kor in eng2kor.iteritems():
...     print eng, kor
...
four net
one hana
five dasut
three set
two dool
```

or, to just get keys:

```
>>> for eng in eng2kor.iterkeys():
...     print eng
...
four
one
five
three
two
```

Similarly, to just get values:

```
>>> for kor in eng2kor.itervalues():
...     print kor
...
net
hana
dasut
set
dool
```

The dictionary method `items()` returns the contents of the dictionary as
a list of `(key, value)` tuples (the dictionary itself is left unchanged):

```
>>> eng2kor.items()
[('four', 'net'), ('one', 'hana'), ('five', 'dasut'), ('three', 'set'), ('two', 'dool')]
```
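For readers using Python 3, note that `iteritems()`, `iterkeys()`, and `itervalues()` no longer exist there; `items()`, `keys()`, and `values()` return iterable view objects instead of lists. A minimal sketch that runs under both versions is given below. The variable `pairs` is an illustrative name, and the pairs are sorted so that the printed order is reproducible:

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

# items() is available in both Python 2 and Python 3.
# sorted() returns a list of (key, value) tuples ordered by key,
# so the result does not depend on the dictionary's internal order.
pairs = sorted(eng2kor.items())

for eng, kor in pairs:
    print(eng + ' ' + kor)
```

The keys come out in alphabetical order: `five`, `four`, `one`, `three`, `two`.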

We can apply some of the functions and operators we have learned so far to a dictionary:

```
>>> len(eng2kor)
5
>>> 'one' in eng2kor
True
>>> 'net' in eng2kor
False
```

The second example of the `in` operator tells us that Python checks whether
the search word appears as a key, not as a value, in the dictionary.

To see whether something appears as a value instead of a key, use the method
`values`, which returns the values as a list:

```
>>> print eng2kor.values()
['net', 'set', 'dasut', 'dool', 'hana']
```

With this we can now search:

```
>>> vals = eng2kor.values()
>>> 'net' in vals
True
```
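Searching the values only tells us whether a value is present. To find which key or keys map to a given value, one has to scan the pairs. The helper below, `keys_for_value`, is a hypothetical name introduced here for illustration; it is a sketch, not a built-in dictionary method:

```python
def keys_for_value(d, target):
    """Return a sorted list of all keys in d whose value equals target."""
    return sorted(key for key in d if d[key] == target)

eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

print(keys_for_value(eng2kor, 'net'))    # the only key mapping to 'net'
print(keys_for_value(eng2kor, 'yeol'))   # no key maps to this value
```

Note that this scan visits every pair, so it is slower than a lookup by key.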

We can also compare keys with each other, or values with each other:

```
>>> eng2kor.keys()
['four', 'three', 'five', 'two', 'one']
>>> eng2kor.keys()[0].__gt__(eng2kor.keys()[2]) # this is equivalent to eng2kor.keys()[0] > eng2kor.keys()[2]
True
```

Comparing the corresponding values (i.e., the two corresponding value elements to the 0th and 2nd key elements):

```
>>> eng2kor.values()[0] > eng2kor.values()[2]
True
```
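Strings are compared alphabetically, character by character, which is why `'four' > 'five'` holds in the comparison above. Because the order returned by `keys()` is arbitrary in Python 2, indexing it with `[0]` or `[2]` can give different elements on different machines. A sketch of a more reproducible approach is to sort first; the names `ordered_keys` and `keys_by_value` are illustrative:

```python
eng2kor = {'one': 'hana', 'two': 'dool', 'three': 'set',
           'four': 'net', 'five': 'dasut'}

# Iterating over a dictionary yields its keys, so sorted() sorts the keys.
ordered_keys = sorted(eng2kor)
print(ordered_keys)

# Sort the keys by the value each one maps to, using get as the sort key.
keys_by_value = sorted(eng2kor, key=eng2kor.get)
print(keys_by_value)

# String comparison is alphabetical: 'o' comes after 'i'.
print('four' > 'five')
```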

## Dictionary as a set of counters

Suppose you are given a string and you wish to count how many times each character appears. Recall that we did this before using a list (see Searching and counting). This time let’s use a dictionary and see how we can implement a more general algorithm:

```
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/histogram.py

"""

def histogram(s):
    # initialize with an empty dictionary
    d = dict()

    for c in s:
        if c not in d:
            # if c first appears as a key in d
            # then initialize its value to one.
            d[c] = 1
        else:
            # if c appears as a key more than once
            # add its value by one.
            d[c] += 1

    # return dictionary
    return d


def histogram_ternary(s):
    # This is exactly the same as histogram
    # but using a so-called 'ternary operator':
    # a if test else b
    #
    # Ex: x='apple' if a > 2 else 'orange'
    # Translating this into English will be
    # x is 'apple' if a > 2; otherwise x is 'orange'

    d = dict()
    for c in s:
        # the ternary expression is much shorter
        # than the conventional if-else statement
        # with a reduced readability.
        d[c] = 1 if c not in d else d[c]+1

    return d


def histogram_ternary_get(s):
    # This is exactly the same as histogram
    # but using 'get' method defined in dictionary:
    # See help(dict) and check out:
    #
    # get(...)
    #     D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
    # i.e., if D is a dictionary,
    #               / D[k] if k in D
    # D.get(k,d) = |
    #               \ d    if k not in D
    #
    # Ex: x='apple' if a > 2 else 'orange'
    # Translating this into English will be
    # x is 'apple' if a > 2; otherwise x is 'orange'

    d = dict()
    for c in s:
        # the ternary expression is much shorter
        # than the conventional if-else statement
        # with a reduced readability.
        #d[c] = 1 if c not in d else d[c]+1
        d[c] = d.get(c,0) + 1

    return d


def print_hist(h):
    for c in h:
        # print key and value
        print c,h[c]


if __name__ == "__main__":

    # first function
    h1=histogram('apple')
    print '(a):', h1

    # second function which uses the ternary operator
    h2=histogram_ternary('apple')
    #h2=histogram_ternary_get('apple')
    print '(b):', h2

    # are they the same?
    print '(c):', h1 is h2
    print '(d):', id(h1)
    print '(e):', id(h2)

    # print keys
    h1_keys = h1.keys()
    h2_keys = h2.keys()
    print '(f):', h1_keys, id(h1_keys)
    print '(ff)', h2_keys, id(h2_keys)
    print '(g):', h1_keys is h2_keys

    # does 'a' appear as a key?
    print '(h):', h1_keys.__contains__('a')

    # print values
    h1_values = h1.values()
    h2_values = h2.values()
    print '(i):', h1_values
    print '(j):', h1_values is h2_values

    # does '0' appear as a value?
    print '(k):', h1_values.__contains__('0')

    # 'get' takes a key and a default value
    # If the key appears in the dictionary
    # 'get' returns the corresponding value;
    # otherwise it returns the user defined
    # default value, e.g., 159 in the following example:
    print '(l):', h1.get('a',159)
    print '(m):', h1.get('w',159)

    # print histogram
    print '(n):--------'
    print_hist(h1)
```

Running this in the script mode will give:

```
$ python histogram.py
(a): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(b): {'a': 1, 'p': 2, 'e': 1, 'l': 1}
(c): False
(d): 4303246232
(e): 4303246512
(f): ['a', 'p', 'e', 'l']
(g): False
(h): True
(i): [1, 2, 1, 1]
(j): False
(k): False
(l): 1
(m): 159
(n):--------
a 1
p 2
e 1
l 1
```
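As an aside, the standard library offers `collections.Counter`, a dictionary subclass designed for this kind of counting. It is a different tool from the hand-written `histogram` functions above and is shown here only for comparison; the variable name `counts` is illustrative:

```python
from collections import Counter

# Counter builds the character histogram in a single call.
counts = Counter('apple')

print(counts['p'])   # number of times 'p' appears
print(counts['z'])   # a missing key counts as 0 instead of raising KeyError

# A Counter holds the same key:value pairs as the histogram dictionary.
print(dict(counts) == {'a': 1, 'p': 2, 'l': 1, 'e': 1})
```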

## Dictionaries and lists

Lists can appear as values in a dictionary, but **not** as keys, because keys must be hashable.
For example, if you try:

```
>>> t=['a','e','l']
>>> type(t)
<type 'list'>
>>> d = dict()
>>> d[t]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> d['1'] = t
>>> d
{'1': ['a', 'e', 'l']}
```

The above example confirms that lists can be used as values but not as keys.
Now, let’s consider an application of using lists as values.
Take a look at what we just obtained in the last outcome,
`{'1': ['a', 'e', 'l']}`. This looks like an inverse map of the output
`(a)` or `(b)`: the letters that appear once are collected under a single key!
This example tells us that we may implement an inverse map routine
which swaps the roles of keys and values in a dictionary. Here is a function
that inverts a dictionary:

```
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/invert_dictionary.py

"""

def invert_dictionary(d):
    # create an empty dictionary
    inverse = dict()

    # traverse through keys in dictionary "d"
    for key in d:
        # "val" is a "value" of "d" associated with a "key"
        val = d[key]

        # if val is first found as a key in inverse
        # create a new val:key pair in inverse
        if val not in inverse:
            inverse[val] = [key]
            # Note above that the values of inverse is assigned as a list, [key]

        # if val is already found, append the corresponding
        # key to the list
        else:
            #print val, inverse[val]
            #print type(inverse[val])
            inverse[val].append(key)

    # output inverse dictionary
    return inverse


if __name__ == "__main__":

    # import histogram method from histogram.py
    from histogram import histogram as histo

    # compute histogram
    hist = histo('apple')
    print hist

    # compute inverse map of dictionary
    inv = invert_dictionary(hist)
    print inv

    # what is hist(hist^{-1}('apple'))?
    h1 = histo(inv)
    print h1

    # what is hist^{-1}(hist(hist^{-1}('apple')))?
    h2 = invert_dictionary(h1)
    print h2
```

The result looks like:

```
$ python invert_dictionary.py
{'a': 1, 'p': 2, 'e': 1, 'l': 1}
{1: ['a', 'e', 'l'], 2: ['p']}
{1: 1, 2: 1}
{1: [1, 2]}
```

Note

Why are we getting the last two outcomes?
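The `if`/`else` bookkeeping in `invert_dictionary` can also be written with the dictionary method `setdefault`, which returns the value stored for a key and first inserts a default if the key is missing. The function name `invert_with_setdefault` below is hypothetical; this is only a sketch of an alternative, not the version used in the lecture files:

```python
def invert_with_setdefault(d):
    """Map each value of d to the list of keys that carry that value."""
    inverse = {}
    for key in d:
        # setdefault returns inverse[d[key]], creating an empty list
        # under that key first if it is not present yet.
        inverse.setdefault(d[key], []).append(key)
    return inverse

hist = {'a': 1, 'p': 2, 'e': 1, 'l': 1}
inv = invert_with_setdefault(hist)

# Sort the list because the order of keys in hist may vary.
print(sorted(inv[1]))
print(inv[2])
```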

## Dictionaries as memos

A dictionary can be used to store quantities that have already been computed, so that they need not be computed again. Reusing stored results in this way (a technique often called memoization) can make a program much faster.

For example, a straightforward implementation of the Fibonacci sequence can look like:

```
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci.py

Fibonacci sequence using recursion

"""

def fibonacci(n):
    if n == 0:
        return 0

    elif n == 1:
        return 1

    else:
        res = fibonacci(n-1) + fibonacci(n-2)
        return res

if __name__ == "__main__":
    fib_numb = fibonacci(12)
    print fib_numb
```

Notice how many times the terms that appear early in the sequence are recomputed by the recursive calls: very many! A dictionary can be used in this case to keep track of the terms that have already been evaluated:

```
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/fibonacci_dict.py

Fibonacci sequence using a dictionary "known" which keeps track of
values that have already been computed and stores them for reuse.

"""

# initialize a dictionary, known, with the first two terms: F0=0, F1=1
known = {0:0,1:1}

def fibonacci_dict(n):
    # global known
    # check if n already appears as key in known dictionary
    if n in known:
        # if true return the corresponding value
        return known[n]
    else:
        # otherwise, calculate a new Fibonacci number
        # and add the new Fibonacci number as a new value in the dictionary, known
        known[n] = fibonacci_dict(n-1) + fibonacci_dict(n-2)
        return known[n]

if __name__ == "__main__":
    print fibonacci_dict(12)
    print known
```

In the above example, `known` is a dictionary that stores the Fibonacci numbers
we already know. It starts with the first two terms in the sequence, `F0=0` and `F1=1`,
or in other words, 0 maps to 0 and 1 maps to 1.
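To make the saving concrete, one can count how many times each version is called. The sketch below re-implements both functions with a call counter; the names `calls`, `fib_plain`, and `fib_memo` are introduced here for illustration and are not part of the lecture files:

```python
# Call counters for the two versions.
calls = {'plain': 0, 'memo': 0}

# Memo dictionary seeded with F0 = 0 and F1 = 1.
known = {0: 0, 1: 1}

def fib_plain(n):
    """Plain recursion: recomputes the same terms over and over."""
    calls['plain'] += 1
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)

def fib_memo(n):
    """Recursion with a memo: each term is computed only once."""
    calls['memo'] += 1
    if n not in known:
        known[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return known[n]

print(fib_plain(12))
print(fib_memo(12))
print(calls['plain'])
print(calls['memo'])
```

Both functions return 144 for `n = 12`, but the plain recursion needs 465 calls while the memoized version needs only 23.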

To compare the run times (elapsed wall-clock time, in seconds) of the two implementations, we can do as follows:

```
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/run_fibonacci.py

Runtime comparison of the two fibonacci implementations of
using recursive and dictionary.

"""

import time
from fibonacci import fibonacci
from fibonacci_dict import fibonacci_dict

n=30

start_time1 = time.time()
fibonacci(n)
elapsed_time1 = time.time() - start_time1

start_time2 = time.time()
fibonacci_dict(n)
elapsed_time2 = time.time() - start_time2

print 'Run time in seconds: Fibonacci & Fibonacci_dict = ', elapsed_time1, elapsed_time2
```
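If the lecture files are not at hand, a self-contained variant of this comparison can be sketched as follows. It uses `timeit.default_timer` from the standard library as the clock, and the helper name `timed` is hypothetical. A smaller `n` is chosen so the plain recursion finishes quickly. Actual timings depend on the machine, so no output is shown:

```python
from timeit import default_timer

def fibonacci(n):
    """Plain recursive Fibonacci."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

known = {0: 0, 1: 1}

def fibonacci_dict(n):
    """Fibonacci with a dictionary of already computed terms."""
    if n not in known:
        known[n] = fibonacci_dict(n - 1) + fibonacci_dict(n - 2)
    return known[n]

def timed(func, n):
    """Return (result, elapsed seconds) for a single call of func(n)."""
    start = default_timer()
    result = func(n)
    return result, default_timer() - start

value1, elapsed1 = timed(fibonacci, 20)
value2, elapsed2 = timed(fibonacci_dict, 20)

# Both implementations must agree on the result.
print(value1 == value2)
print('Run time in seconds: %g %g' % (elapsed1, elapsed2))
```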

## Global variables

In the previous example, `known` is initialized *outside* the function. Therefore,
it belongs to the special frame called `__main__`. Variables in `__main__`
are *global* in scope because they can be accessed from any function.

In order to reassign a global variable within a function,
you need to **declare** it with the `global` statement before using it;
otherwise the assignment creates a new local variable with the same name.
(Modifying a mutable global object in place, as `fibonacci_dict` does with `known`,
does not require the declaration.)
The following example illustrates how global variables behave and how they
should be modified in a function:

```
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/global.py

"""

been_called = False

def local_var():
    been_called = True
    print '(a):', been_called

local_var()
print '(b):', been_called


def global_var():
    global been_called
    been_called = True
    print '(c):', been_called

global_var()
print '(d):', been_called


been_called = False
def return_var():
    been_called = True
    return been_called

return_var()
print '(e):', been_called
print '(f):', return_var()
```

The result looks like:

```
(a): True
(b): False
(c): True
(d): True
(e): False
(f): True
```
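The `global` declaration matters only when a function rebinds (assigns to) a global name. Changing a mutable global object in place needs no declaration, which is why `fibonacci_dict` could update `known` earlier. A minimal sketch, with the names `counter`, `registry`, `bump`, and `register` chosen here for illustration:

```python
counter = 0      # an integer: can only be changed by rebinding the name
registry = {}    # a dictionary: can be changed in place

def bump():
    # Required: without this declaration the assignment below would
    # refer to a local variable and fail with UnboundLocalError.
    global counter
    counter = counter + 1

def register(name):
    # No declaration needed: the global dictionary is modified in place,
    # the name 'registry' itself is never reassigned.
    registry[name] = True

bump()
bump()
register('a')

print(counter)
print(registry)
```

After the calls, `counter` is 2 and `registry` contains the single pair `'a': True`.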

## An example study

Consider the following example, which has been adapted from Dive Into Python 3 and modified for the class.

This routine takes a file size in bytes as input and converts it approximately to a human-readable form, e.g., 1 TB or 931 GiB.

```
"""
/lectureNote/chapters/chapt03/codes/examples/dictionaries/humansize.py

NOTE: This routine has been extracted from
      http://www.diveintopython3.net/your-first-python-program.html
      and modified by Prof. Dongwook Lee for AMS 209.

"""

SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: file size in a string format

    '''
    if size < 0:
        print 'number must be non-negative'

    if a_kilobyte_is_1024_bytes:
        multiple = 1024
    else:
        multiple = 1000

    # Initialize an empty size_dict array to keep track of
    # the file sizes and suffixes.
    # The result is going to be the last key:value pair when
    # a computed size becomes smaller than the file size unit (i.e., multiple).
    size_dict=dict()
    for suffix in SUFFIXES[multiple]:
        #print suffix
        size /= multiple   # <==> size = size/multiple
        #print size
        size_dict[size]=suffix

        # Keep dividing until a size is less than the chosen file size unit
        if size < multiple:
            return str(size) + ' ' + size_dict[size]

    print 'number too large'

if __name__ == '__main__':
    print '(a) with the multiple of 1000 bytes: ', approximate_size(1000000000000, False)
    print '(b) with the multiple of 1024 bytes: ', approximate_size(1000000000000)
```

The output from running the routine looks like:

```
$ python humansize.py
(a) with the multiple of 1000 bytes: 1 TB
(b) with the multiple of 1024 bytes: 931 GiB
```
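The routine above relies on Python 2 integer division: `size /= multiple` truncates when `size` is an integer. For readers running Python 3, where `/` produces a float, a sketch of the same loop using floor division `//` is given below. The function name `approximate_size_floor` is hypothetical. The dictionary `size_dict` is dropped because only the current suffix is needed, and raising `ValueError` instead of printing a message is a design choice of this sketch, not of the lecture code:

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size_floor(size, a_kilobyte_is_1024_bytes=True):
    """Convert a file size in bytes to an approximate human-readable string."""
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000

    for suffix in SUFFIXES[multiple]:
        # Floor division truncates in both Python 2 and Python 3.
        size //= multiple
        # Stop as soon as the size drops below one unit of the next suffix.
        if size < multiple:
            return str(size) + ' ' + suffix

    raise ValueError('number too large')

print(approximate_size_floor(1000000000000, False))
print(approximate_size_floor(1000000000000))
```

For the same input as above, this prints `1 TB` and `931 GiB`.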