Towards Analyzing Computational Costs of Spark for SARS-CoV-2 Sequences Comparisons on a Commercial Cloud
In this paper, we developed a Spark application, named Diff Sequences Spark, that compares 540 SARS-CoV-2 sequences from South America on the Amazon EC2 cloud and outputs the positions where the sequences differ. We analyzed the performance of the application on selected memory-optimized and storage-optimized virtual machines (VMs) in the on-demand and spot markets. The memory-optimized VMs outperformed the storage-optimized ones in both execution time and financial cost. Regarding the markets, Diff Sequences Spark achieved lower average execution times and monetary costs on spot VMs than on their respective on-demand counterparts, even in scenarios with several spot revocations, benefiting from the low-overhead fault tolerance of the Spark framework.
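The core operation described above, reporting the positions where two sequences differ, can be illustrated with a minimal sequential sketch. This is not the authors' implementation. It assumes, hypothetically, that the sequences are already aligned (position i in one corresponds to position i in the other), that positions past the end of the shorter sequence count as differences, and that every unordered pair is compared. The function names `diff_positions` and `all_pair_diffs` are illustrative only; in a Spark job, the list of pairs is the kind of work that would be distributed across executors.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple


def diff_positions(seq_a: str, seq_b: str) -> List[int]:
    """Return the 0-based positions where two sequences differ.

    Hypothetical assumption: the sequences are already aligned. Positions
    past the end of the shorter sequence are reported as differences.
    """
    longest = max(len(seq_a), len(seq_b))
    return [
        i
        for i in range(longest)
        if i >= len(seq_a) or i >= len(seq_b) or seq_a[i] != seq_b[i]
    ]


def all_pair_diffs(
    sequences: Dict[str, str],
) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (id_a, id_b, positions) for every unordered pair of sequences.

    Hypothetical pairing strategy: all pairs, ordered by sequence id.
    This runs sequentially; a Spark version would parallelize over pairs.
    """
    ordered = sorted(sequences.items())
    for (id_a, seq_a), (id_b, seq_b) in combinations(ordered, 2):
        yield id_a, id_b, diff_positions(seq_a, seq_b)


if __name__ == "__main__":
    # Toy sequences, not real SARS-CoV-2 data.
    seqs = {"s1": "ACGTAC", "s2": "ACCTAC", "s3": "ACGTA"}
    for a, b, pos in all_pair_diffs(seqs):
        print(a, b, pos)
```

Keeping the per-pair comparison as a pure function makes it straightforward to apply inside a distributed map step, where a failed task can simply be recomputed after a spot revocation.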