Similarity Search for Dynamic Data Streams
Nearest-neighbor search is an integral part of many online applications, including pattern recognition, plagiarism detection, and recommender systems. As data sets grow ever larger, scalability has become an important issue. Many of the most space- and time-efficient algorithms are based on locality-sensitive hashing. The de facto standard approach to quickly answering nearest-neighbor queries on such data sets is a form of min-hashing.
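
For readers unfamiliar with min-hashing, the following is a minimal illustrative sketch of the classical technique for static sets, not the method developed in this paper. It assumes items are encoded as non-negative integers below a fixed prime and uses a simple linear hash family; the names `make_hash_params`, `minhash_signature`, and `estimate_jaccard` are hypothetical. The fraction of positions in which two signatures agree serves as an estimate of the Jaccard similarity of the underlying sets.

```python
import random

# Assumed modulus for the illustrative hash family h(x) = (a*x + b) mod PRIME.
PRIME = (1 << 61) - 1  # a Mersenne prime


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for num_hashes linear hash functions, with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Return the min-hash signature of a non-empty set of integer item ids.

    Each entry is the minimum hash value over the set under one hash function.
    """
    if not items:
        raise ValueError("signature of an empty set is undefined in this sketch")
    return [min((a * x + b) % PRIME for x in items) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of agreeing signature entries."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


if __name__ == "__main__":
    params = make_hash_params(128, seed=42)
    doc_a = set(range(0, 100))
    doc_b = set(range(50, 150))  # true Jaccard similarity is 50/150
    sig_a = minhash_signature(doc_a, params)
    sig_b = minhash_signature(doc_b, params)
    print("estimated similarity:", estimate_jaccard(sig_a, sig_b))
```

Using more hash functions reduces the variance of the estimate at the cost of a longer signature. In a nearest-neighbor index, such signatures are typically split into bands and hashed into buckets so that only candidates sharing a bucket are compared.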