lphash
Résumé
LPHash is a minimal perfect hash function (or MPHF) designed for k-mer sets where overlaps of k-1 bases between consecutive k-mers are exploited to:
Reduce the space of the data structure. For example, with the right combination of parameters, LPHash is able to achive < 0.9 bits/k-mer in practice whereas the known theoretical lower-bound of a classic MPHF is 1.44 bits/k-mer and practical constructions take 2-3 bits/k-mer.
Boost the evaluation speed of queries performed for consecutive k-mers.
Preserve the locality of k-mers: consecutive k-mers are likely to receive consecutive hash codes.
The data structure and its construction/query algorithms are described in the paper Locality-Preserving Minimal Perfect Hashing of k-mers [1].