Greedy versus Optimal Analysis of Bounded Size Dictionary Compression and On-the-Fly Distributed Computing.

2020

Proceedings Prague Stringology Conference

Greedy versus Optimal Analysis of Bounded Size Dictionary Compression and On-the-Fly Distributed Computing.

04 Pubblicazione in atti di convegno

De Agostino Sergio

Scalability and robustness are not an issue when compression is applied for massive data storage, in the context of distributed computing. Speeding up on-the-fly compression for data transmission is more controversial. In such case, a compression
technique merging together an adaptive and a non-adaptive approach has to be considered. A practical implementation of LZW (Lempel, Ziv and Welch) compression,called LZC (C indicates the Unix command ’compress’), has this characteristic. The
non-adaptive phases work with bounded size prefix dictionaries built by LZW factorizations during the adaptive ones. In order to improve the compression effectiveness,we suggest to apply LZMW (Lempel, Ziv, Miller and Wegman) factorization to LZC
compression (LZCMW) during the adaptive phases since it builds better dictionaries than LZW. The LZMW heuristic was originally thought with a dictionary bounded by the least recently used strategy. We introduce the LZCMW heuristic in order to have non-adaptive phases. All the heuristics mentioned above employ the greedy approach. We show, finally, a worst case analysis of the greedy approach with respect to the optimal solution decodable by the LZC decompressor. Such analysis suggests parallelization of on the fly compression is not suitable for highly disseminated data since the non-adaptive phases are too far from optimal.

distributed system factorization on the fly compression scalability