Consistent Hashing

To achieve horizontal scaling, it is important to distribute requests/data efficiently and evenly aross server. Consistent hashing is a commonly used technique to achieve this goal.

The rehashing problem

If you have n cache servers, a common way to balance the load is use the following hash method:

serverIndex = hash(key) % N, where N is the size of the server pool

This approach works well when the size of the server pool is fixed, and the data distribution is even. However, problems arise when new servers are added, or existing servers are removed.

Consistent hashing

Consistent hashing is a special kind of hashing such that when a hash table is re-sized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.

Hash space and hash ring

Hash servers

Using the same hash function f, we map servers based on server IP or name onto the ring

Hash keys

One thing worth mentioning is that hash function used here is different from the one in "the rehashing problem" and there is no modular operation.

Server lookup

To determine which server a key is stored on, we go clockwise from the key position on the ring until a server is found.

Add a server

Using the logic described above, adding a new server will only request redistribution of a fraction of keys

Remove a server

When a server is removed, only a small fraction of keys request redistribution with consistent hashing.

Two issues in the basic approach

Consistent hashing algorithm

Map servers and keys on to the ring using a uniformly distributed hash function
To find out which server a key is mapped to, go clockwise from the key position until the first server on the ring is found.

Virtual nodes

A virtual node refers to the real node, and each server is represented by multiple virtual nodes on the ring.

Instead of using s0, we have s0_0, s0_1 and s0_2 to represent server 0 on the ring. With virtual nodes, each server is responsible for multiple partitions. Partitions with label s0 are managed by server 0.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md
go.mod		go.mod
go.sum		go.sum
impl.go		impl.go
main.go		main.go
nodes.go		nodes.go
ring.go		ring.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Consistent Hashing

The rehashing problem

Consistent hashing

Hash space and hash ring

Hash servers

Hash keys

Server lookup

Add a server

Remove a server

Two issues in the basic approach

Virtual nodes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Consistent Hashing

The rehashing problem

Consistent hashing

Hash space and hash ring

Hash servers

Hash keys

Server lookup

Add a server

Remove a server

Two issues in the basic approach

Virtual nodes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages