Hash Tables
This chapter covers hash tables. You will learn the following:
What is a hash table and why should you use it?
What is hashing and a hash function?
What are the C++ standard library implementations for hash tables?
Introduction
A hash table is a data structure that stores elements in key-value pairs. Key is a unique value used to compute a table index and value is the data associated with the key. The benefit of using a hash table is its fast access time. Typically, the time complexity is a constant (O(1) access time).
Hashing Function
As stated previously, in a hash table, a new index is processed using the key and the corresponding value for this key is stored in the index. This process is called hashing and the function performing this operation is called a hashing function. For example, consider a hashing function as operation mod 10 and a set of key-value pairs to insert into hash table {{15, 25}, {23, 63}, {12, 22}, {48, 78}, {30, 0}}.
key |
index based on hashing function |
value |
---|---|---|
15 |
15 % 10 = 5 |
25 |
23 |
23 % 10 = 3 |
63 |
12 |
12 % 10 = 2 |
22 |
48 |
48 % 10 = 8 |
78 |
30 |
30 % 10 = 0 |
0 |
This means that the positions in the hash table will be the following:
position |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
value |
0 |
22 |
63 |
25 |
78 |
As you can see, there's a good chance that more than one key value will compute the same index. To solve this problem, in a standard library, values are stored in buckets — so multiple values can be under the same index.
C++ Standard Library Hash Tables
In the C++ standard library, four different hash tables are available. They are the equivalents of standard containers but with a hashing function:
std::unordered_set (equivalent of std::set)
std::unordered_multiset (equivalent of std::multiset)
std::unordered_map (equivalent of std::map)
std::unordered_multimap (equivalent of std::multimap)
Let's look at them one by one.
Set
Let's start with std::unordered_set. According to the C++ Reference:
std::unordered_set is an associative container holding a set of unique objects of type Key.
Search, insertion, and removal have average constant time complexity.
The most important aspect of std::unordered_set is that it contains unique objects, which are of type key, and has average access time constant. An example of a std::unordered_set declaration is shown in the code below:
std::unordered_set<int> n;
To initialize std::unordered_set, we can simply assign values to it at declaration. Remember that the given values have to be unique.
std::unordered_set<int> n {0, 1, 2, 3, 4};
As in other standard containers, we can conduct operations like the following:
size(),
insert(...),
erase(...),
count(...),
find(...),
iteration over elements.
Similar to std::unordered_set is std::unordered_multiset. Let's start with a definition:
std::unordered_multiset is an associative container holding a set of possibly non-unique
objects of type Key. Search, insertion, and removal have average constant time complexity.
This means that the only difference between std::unordered_set and std::unordered_multiset is that the latter allows multiple keys to be stored. An example of the std::unordered_multiset declaration is shown in code below:
std::unordered_multiset<int> n;
To initialize std::unordered_multiset, we can simply assign values to it at declaration. This time the values may be repeated.
std::unordered_multiset<int> n {0, 1, 2, 1, 2};
Map
Now, we will move to the map containers, starting with std::unordered_map. According to the C++ Reference:
std:unordered_map is an associative container holding key-value pairs with unique keys.
Search, insertion, and removal of elements have average constant time complexity.
This means that the most important information about std::unordered_map is that it stores key-value pairs, where key is unique and the average access time is constant. The code below shows an example of a std::unordered_map declaration where key is of type int and value is of type std::string:
std::unordered_map<int, std::string> m;
To initialize std::unordered_map, we can simply assign values to it at declaration. Remember that the key values have to be unique.
std::unordered_map<int, std::string> m {{0, "zero"},
{1, "one"},
{2, "two"}};
Similarly, as with a set container, std::unordered_multimap and std::unordered_map have a lot in common. Let's look at the C++ Reference definition:
std::unordered_multimap is an unordered associative container that supports equivalent keys
(an unordered_multimap may contain multiple copies of each key value) and that associates values
of another type with the keys. (...) Search, insertion, and removal
have average constant time complexity.
The only difference is that std::unordered_multimap allows for keys to be repeated. The code below shows an example of a std::unordered_multimap declaration where key is of type int and value is of type std::string:
std::unordered_multimap<int, std::string> m;
To initialize std::unordered_multimap, as before, we can assign values to it at declaration. This time the keys don't need to be unique.
std::unordered_multimap<int, std::string> m {{0, "zero"},
{1, "one"},
{2, "two"},
{0, "zero"}};
And of course, it supports several operations like other standard library containers.
Summary
To summarize this module, we would like to compare all of the standard library associative containers.
Container |
Sorted |
Value |
Identical keys possible |
Average access time |
---|---|---|---|---|
std::set |
yes |
no |
no |
logarithmic |
std::unordered_set |
no |
no |
no |
constant |
std::map |
yes |
yes |
no |
logarithmic |
std::unordered_map |
no |
yes |
no |
constant |
std::multiset |
yes |
no |
yes |
logarithmic |
std::unordered_multiset |
no |
no |
yes |
constant |
std::multimap |
yes |
yes |
yes |
logarithmic |
std::unordered_multimap |
no |
yes |
yes |
constant |