Performance evaluation of containers and virtual machines when running Cassandra workload concurrently
NoSQL distributed databases are often used as Big Data platforms. To share resources efficiently and keep costs down, such databases typically run concurrently on a virtualized infrastructure. This infrastructure can use either hypervisor-based or container-based virtualization. Hypervisor-based virtualization is a mature technology, but it imposes overhead on CPU, networking, and disk. Container-based virtualization shares operating system resources and simplifies application deployment, and it has recently become more popular. This article compares the performance of multiple VMware virtual machines and multiple Docker containers running concurrently. Our workload models a real-world Big Data Apache Cassandra application from Ericsson. As a baseline, we evaluated Cassandra running on the nonvirtualized physical infrastructure. Our study shows that Docker has lower overhead than VMware, and performance on the container-based infrastructure was as good as on the nonvirtualized infrastructure. Our evaluations also show that running multiple Cassandra instances concurrently affected read and write operations differently. For both VMware and Docker, the maximum number of read operations decreased when we ran several instances concurrently, whereas the maximum number of write operations increased.