HashTable

A hash table is a data structure that implements an associative array, which maps keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Hash Function

A hash function is a function that takes an input (or “key”) and returns a fixed-size string of bytes. The output is typically a hash code that represents the input data. A good hash function should have the following properties:

Deterministic: The same input should always produce the same output.
Fast: The hash function should be able to compute the hash code quickly.
Uniform: The hash function should distribute the keys uniformly across the hash table to minimize collisions.
Minimize Collisions: A collision occurs when two different keys produce the same hash code. A good hash function should minimize the number of collisions.

Collision Resolution

Since hash tables can have collisions, they need a way to resolve them. There are several methods for collision resolution, including:

Separate Chaining: In this method, each bucket in the hash table contains a linked list of key-value pairs. When a collision occurs, the new key-value pair is added to the linked list at the corresponding bucket.
Open Addressing: In this method, when a collision occurs, the hash table searches for the next available slot in the array to store the key-value pair. This can be done using linear probing, quadratic probing, or double hashing.
Cuckoo Hashing: This method uses two or more hash functions to resolve collisions. When a collision occurs, the existing key-value pair is “kicked out” and rehashed using a different hash function.
Robin Hood Hashing: This method is a variation of open addressing where the key-value pair that is “kicked out” is the one that has been in the hash table the longest, rather than the one that caused the collision.

Load Factor

The load factor of a hash table is defined as the number of entries divided by the number of buckets. It is a measure of how full the hash table is. A high load factor can lead to more collisions and decreased performance, while a low load factor can lead to wasted space. To maintain efficient performance, it is common to resize the hash table when the load factor exceeds a certain threshold (e.g., 0.75).

Implementation

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None

class HashTable:
    def __init__(self, capacity=8):
        self.__capacity = capacity
        self.__size = 0
        self.__load_factor_threshold = 0.75
        self.__table = [None] * self.__capacity
    
    def display(self):
        for i, node in enumerate(self.__table):
            print(f"Bucket {i}: ", end="")
            current = node
            while current:
                print(f"({current.key}: {current.value}) -> ", end="")
                current = current.next
            print("None")
        print(f"Size: {self.__size}, Capacity: {self.__capacity}, Load Factor: {self.__size / self.__capacity:.2f}")

    def __hash(self, key):
        return hash(key) % self.__capacity

    def __grow(self):
        old_table = self.__table
        self.__capacity *= 2
        self.__table = [None] * self.__capacity
        self.__size = 0

        for node in old_table:
            current = node
            while current:
                self.put(current.key, current.value)
                current = current.next
    
    @property
    def size(self):
        return self.__size
    
    def is_empty(self):
        return self.__size == 0
    
    def put(self, key, value):
        if self.__size / self.__capacity >= self.__load_factor_threshold:
            self.__grow()
        
        index = self.__hash(key)
        new_node = Node(key, value)
        if self.__table[index] is None:
            self.__table[index] = new_node
        else:
            current = self.__table[index]
            while current and current.next:
                if current.key == key:
                    current.value = value
                    return
                current = current.next
            current.next = new_node
        self.__size += 1

    def remove(self, key):
        index = self.__hash(key)
        current = self.__table[index]
        prev = None
        while current:
            if current.key == key:
                if prev:
                    prev.next = current.next
                else:
                    self.__table[index] = current.next
                self.__size -= 1
                return True
            prev = current
            current = current.next
        return False
    
    def get(self, key):
        index = self.__hash(key)
        current = self.__table[index]
        while current:
            if current.key == key:
                return current.value
            current = current.next
        return None

    def for_each(self, func):
        for node in self.__table:
            current = node
            while current:
                func(current.key, current.value)
                current = current.next