3. Experimental protocol
In order to test the effect of backward shift deletion on performance, I am going to use the same test cases that I used in my previous article about Robin Hood hashing. The only difference is that since I observed no difference between the tables of size 10k and 100k, this time I am only plotting the results for tables of size 10k. For more details regarding the test cases, take a look at the “Experiment protocol” section in my previous article.
The results are presented in the four figures below, each figure showing the same statistic across the different test cases:
Figure 2 shows the mean of the DIB
Figure 3 shows the median of the DIB
Figure 4 shows the 95th percentile of the DIB
Figure 5 shows the variance of the DIB
Each of these figures contains sub-figures, one for each test case:
(a) is the “batch” test case, with LFM=0.8 and LFR=0.1
(b) is the “batch” test case, with LFM=0.8 and LFR=0.8
(c) is the “ripple” test case, with LFM=0.8 and LFR=0.1
(d) is the “loading” test case
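The four statistics plotted in Figures 2–5 can all be derived from the list of DIBs of the entries stored in a table. Here is a minimal sketch of how they could be computed; the function name `dib_stats` and the input list `dibs` are hypothetical, not part of the original test code:

```python
import statistics

def dib_stats(dibs):
    """Compute the four DIB statistics plotted in Figures 2-5.

    `dibs` is assumed to be the list of DIBs (distances to initial
    bucket) of all entries currently stored in the hash table.
    """
    s = sorted(dibs)
    # Nearest-rank 95th percentile: the smallest value such that at
    # least 95% of the observations are at or below it.
    k = max(0, -(-95 * len(s) // 100) - 1)  # ceil(0.95 * n) - 1
    return {
        "mean": statistics.mean(s),
        "median": statistics.median(s),
        "p95": s[k],
        "variance": statistics.pvariance(s),
    }
```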
The mean DIBs in the graphs for Robin Hood hashing with tombstones are adjusted so that the minimum DIB is shifted down to zero, in order to make a fair comparison with the other algorithms. Indeed, because the implementation of Robin Hood hashing with tombstones only considers probes between the minimum and maximum DIBs, the probing of an item never starts at DIB 0 but at the minimum DIB. The graphs for Robin Hood hashing with backward shift deletion and the graphs for basic linear probing are not shifted down.
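The adjustment itself is straightforward; a sketch, assuming `dibs` holds the raw DIBs measured on the tombstone implementation (hypothetical names):

```python
def adjust_dibs(dibs):
    """Shift a list of DIBs so that the minimum becomes zero.

    The Robin Hood + tombstones implementation only probes between the
    minimum and maximum DIBs, so its raw DIBs never start at 0;
    subtracting the minimum makes its curves comparable to those of the
    other algorithms, which do start probing at DIB 0.
    """
    minimum = min(dibs)
    return [d - minimum for d in dibs]
```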
In the results presented above, it is clear that Robin Hood hashing with backward shift deletion outperforms both basic linear probing and Robin Hood hashing with tombstones. In addition, the mean and variance of the DIB remain constant even after a large number of insertion and deletion operations, which is consistent with the results presented in the thesis of Celis.
The most striking results are in Figure 4, where the 95th percentile of the DIB for Robin Hood hashing with backward shift deletion remains at a value of around 7, which shows that even in the worst cases, the number of probes needed to find an entry will be very small.
The algorithm I had implemented in my first version of Robin Hood hashing using tombstones was the one described by the pseudo-code of the original thesis. Yet, I was unable to reproduce the results presented in that same thesis. The reason is that those results described the theoretical performance based on the mathematical model, and while the math was right, the given pseudo-code was not implementing it correctly. Thanks to Paul Khuong and his suggestion of using a backward shift on deletion, the practical results now match the theoretical results.
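To make the idea of backward shift deletion concrete, here is a minimal sketch of a Robin Hood hash table that uses it. This is an illustration under simplifying assumptions, not the article's actual C++ implementation: fixed capacity, no resizing, `None` marking empty buckets, and Python's built-in `hash()` standing in for the real hash function.

```python
class RobinHoodTable:
    """Simplified Robin Hood hash table illustrating backward shift
    deletion. Buckets hold (key, value) tuples or None when empty."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.buckets = [None] * capacity

    def _dib(self, key, index):
        # Distance to initial bucket: how far `index` is from the
        # bucket that `key` hashes to, wrapping around the table.
        return (index - hash(key)) % self.capacity

    def put(self, key, value):
        index = hash(key) % self.capacity
        dib = 0
        entry = (key, value)
        while True:
            current = self.buckets[index]
            if current is None:
                self.buckets[index] = entry
                return
            if current[0] == entry[0]:
                self.buckets[index] = entry  # update existing key
                return
            # Robin Hood rule: steal the bucket from a "richer" entry
            # (one with a smaller DIB), then keep probing with it.
            current_dib = self._dib(current[0], index)
            if current_dib < dib:
                self.buckets[index] = entry
                entry, dib = current, current_dib
            index = (index + 1) % self.capacity
            dib += 1

    def get(self, key):
        index = hash(key) % self.capacity
        dib = 0
        while True:
            current = self.buckets[index]
            # A richer entry (or an empty bucket) means the key is absent.
            if current is None or self._dib(current[0], index) < dib:
                raise KeyError(key)
            if current[0] == key:
                return current[1]
            index = (index + 1) % self.capacity
            dib += 1

    def delete(self, key):
        index = hash(key) % self.capacity
        dib = 0
        while True:
            current = self.buckets[index]
            if current is None or self._dib(current[0], index) < dib:
                raise KeyError(key)
            if current[0] == key:
                break
            index = (index + 1) % self.capacity
            dib += 1
        # Backward shift: instead of leaving a tombstone, move each
        # following entry one bucket back, stopping at an empty bucket
        # or at an entry already in its initial bucket (DIB 0).
        nxt = (index + 1) % self.capacity
        while (self.buckets[nxt] is not None
               and self._dib(self.buckets[nxt][0], nxt) > 0):
            self.buckets[index] = self.buckets[nxt]
            index, nxt = nxt, (nxt + 1) % self.capacity
        self.buckets[index] = None
```

The key point is in `delete()`: every entry displaced by the deleted one slides one bucket back toward its initial bucket, so the DIB distribution stays as tight as if the deleted entry had never been inserted, with no tombstones accumulating over time.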
An interesting follow-up would be to measure how costly the backward shift is for the CPU. With these new results, my feeling is now that Robin Hood hashing with backward shift deletion is definitely an interesting algorithm, and given its mostly sequential memory access pattern, it would be worth investigating further for an on-disk key-value store.