Locality sensitive hashing | Ricerc@Sapienza

Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams

Recommender systems are an integral part of many web applica-
tions. With increasingly larger user bases, scalability has become an
important issue. Many of the most scalable algorithms with respect
to both space and running times are based on locality-sensitive
hashing (LSH). However, a significant drawback is that these meth-
ods are only able to handle insertions to user profiles and tend to
perform poorly when items may be removed. We initiate the study
of scalable locality-sensitive hashing for dynamic input. Specifi-

On the distortion of locality sensitive hashing

Given a notion of pairwise similarity between objects, locality sensitive hashing (LSH) aims to construct a hash function family over the universe of objects such that the probability two objects hash to the same value is their similarity. LSH is a powerful algorithmic tool for large scale applications and much work has been done to understand LSHable similarities, i.e., similarities that admit an LSH. In this paper we focus on similarities that are provably non-LSHable and propose a notion of distortion to capture the approximation of such a similarity by an LSHable similarity.

The distortion of locality sensitive hashing

Given a pairwise similarity notion between objects, locality sensitive hashing (LSH) aims to construct a hash function family over the universe of objects such that the probability two objects hash to the same value is their similarity. LSH is a powerful algorithmic tool for large-scale applications and much work has been done to understand LSHable similarities, i.e., similarities that admit an LSH. In this paper we focus on similarities that are provably non-LSHable and propose a notion of distortion to capture the approximation of such a similarity by a similarity that is LSHable.