Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams
Recommender systems are an integral part of many web applica-
tions. With increasingly larger user bases, scalability has become an
important issue. Many of the most scalable algorithms with respect
to both space and running times are based on locality-sensitive
hashing (LSH). However, a significant drawback is that these meth-
ods are only able to handle insertions to user profiles and tend to
perform poorly when items may be removed. We initiate the study
of scalable locality-sensitive hashing for dynamic input. Specifi-
cally, using the Jaccard index as similarity measure, we design (1)
a sketching algorithm for similarity estimation via a black box re-
duction to ℓ0 norm estimation and (2) a locality sensitive hashing
scheme maintainable in fully dynamic data streams that quickly
filters out low-similarity pairs. Our algorithms have little to no
overhead in terms of running time compared to previous LSH ap-
proaches for the insertion only case, and drastically outperform
previous algorithms in case of deletions