HashTable

A hash table is a data structure that implements an associative array, which maps keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Hash Function

A hash function is a function that takes an input (or “key”) and returns a fixed-size string of bytes. The output is typically a hash code that represents the input data. A good hash function should have the following properties:

  • Deterministic: The same input should always produce the same output.
  • Fast: The hash function should be able to compute the hash code quickly.
  • Uniform: The hash function should distribute the keys uniformly across the hash table to minimize collisions.
  • Minimize Collisions: A collision occurs when two different keys produce the same hash code. A good hash function should minimize the number of collisions.

Collision Resolution

Since hash tables can have collisions, they need a way to resolve them. There are several methods for collision resolution, including:

  • Separate Chaining: In this method, each bucket in the hash table contains a linked list of key-value pairs. When a collision occurs, the new key-value pair is added to the linked list at the corresponding bucket.
  • Open Addressing: In this method, when a collision occurs, the hash table searches for the next available slot in the array to store the key-value pair. This can be done using linear probing, quadratic probing, or double hashing.
  • Cuckoo Hashing: This method uses two or more hash functions to resolve collisions. When a collision occurs, the existing key-value pair is “kicked out” and rehashed using a different hash function.
  • Robin Hood Hashing: This method is a variation of open addressing where the key-value pair that is “kicked out” is the one that has been in the hash table the longest, rather than the one that caused the collision.

Load Factor

The load factor of a hash table is defined as the number of entries divided by the number of buckets. It is a measure of how full the hash table is. A high load factor can lead to more collisions and decreased performance, while a low load factor can lead to wasted space. To maintain efficient performance, it is common to resize the hash table when the load factor exceeds a certain threshold (e.g., 0.75).

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
class Node:
def __init__(self, key, value):
self.key = key
self.value = value
self.next = None

class HashTable:
def __init__(self, capacity=8):
self.__capacity = capacity
self.__size = 0
self.__load_factor_threshold = 0.75
self.__table = [None] * self.__capacity

def display(self):
for i, node in enumerate(self.__table):
print(f"Bucket {i}: ", end="")
current = node
while current:
print(f"({current.key}: {current.value}) -> ", end="")
current = current.next
print("None")
print(f"Size: {self.__size}, Capacity: {self.__capacity}, Load Factor: {self.__size / self.__capacity:.2f}")

def __hash(self, key):
return hash(key) % self.__capacity

def __grow(self):
old_table = self.__table
self.__capacity *= 2
self.__table = [None] * self.__capacity
self.__size = 0

for node in old_table:
current = node
while current:
self.put(current.key, current.value)
current = current.next

@property
def size(self):
return self.__size

def is_empty(self):
return self.__size == 0

def put(self, key, value):
if self.__size / self.__capacity >= self.__load_factor_threshold:
self.__grow()

index = self.__hash(key)
new_node = Node(key, value)
if self.__table[index] is None:
self.__table[index] = new_node
else:
current = self.__table[index]
while current and current.next:
if current.key == key:
current.value = value
return
current = current.next
current.next = new_node
self.__size += 1

def remove(self, key):
index = self.__hash(key)
current = self.__table[index]
prev = None
while current:
if current.key == key:
if prev:
prev.next = current.next
else:
self.__table[index] = current.next
self.__size -= 1
return True
prev = current
current = current.next
return False

def get(self, key):
index = self.__hash(key)
current = self.__table[index]
while current:
if current.key == key:
return current.value
current = current.next
return None

def for_each(self, func):
for node in self.__table:
current = node
while current:
func(current.key, current.value)
current = current.next