A spark-based parallel distributed posterior decoding algorithm for big data hidden Markov models decoding problem

Author(s):  
Imad Sassi ◽  
Samir Anter ◽  
Abdelkrim Bekkhoucha

Hidden Markov models (HMMs) are among the machine learning algorithms that have been widely used and have demonstrated their efficiency in many conventional applications. This paper proposes a modified posterior decoding algorithm that solves the hidden Markov model decoding problem using the MapReduce paradigm and Spark's resilient distributed dataset (RDD) concept for large-scale data processing. The objective of this work is to improve the performance of HMMs in the face of big data challenges. The proposed algorithm greatly reduces time complexity and achieves good results in terms of running time, speedup, and parallelization efficiency for large amounts of data, i.e., large numbers of states and large numbers of sequences.
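The abstract names posterior decoding but does not reproduce the algorithm. As a hedged illustration of the serial baseline being parallelized, the sketch below runs forward-backward posterior decoding on a toy HMM in plain Python; all parameters, names, and numbers are invented for the example, and the paper's actual Spark/RDD implementation is not shown.

```python
# Posterior decoding for a toy HMM (serial baseline; illustrative only).
# The states, observations, and probabilities below are invented for this example.

states = [0, 1]                      # two hidden states
pi = [0.6, 0.4]                      # initial state distribution
A = [[0.7, 0.3],                     # transition probabilities A[i][j]
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],                # emission probabilities B[state][symbol]
     [0.1, 0.3, 0.6]]

def posterior_decode(obs):
    """Return the most probable hidden state at each position (argmax of posteriors)."""
    T = len(obs)
    # Forward pass: alpha[t][i] = P(obs[0..t], state_t = i)
    alpha = [[0.0] * len(states) for _ in range(T)]
    for i in states:
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in states:
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in states) * B[j][obs[t]]
    # Backward pass: beta[t][i] = P(obs[t+1..T-1] | state_t = i)
    beta = [[0.0] * len(states) for _ in range(T)]
    for i in states:
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in states:
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in states)
    # Posterior P(state_t = i | obs) is proportional to alpha[t][i] * beta[t][i];
    # posterior decoding picks the argmax independently at each position.
    evidence = sum(alpha[T - 1][i] for i in states)
    return [max(states, key=lambda i: alpha[t][i] * beta[t][i] / evidence)
            for t in range(T)]

path = posterior_decode([0, 1, 2, 2])  # → [0, 0, 1, 1]
```

In the paper's distributed setting, the observation sequences (or blocks of the state space) would instead be partitioned across Spark RDDs, with the forward and backward recurrences expressed as MapReduce-style transformations; that scaffolding is omitted here.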

Big Data ◽  
2016 ◽  
pp. 887-898
Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
T. P. Puneeth Kumar ◽  
Ravindra S. Hegadi

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated content, financial companies, and internet and supply chain systems) over the past decade. The term big data was coined to capture the significance of this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data; for instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, transmission, and large-scale data processing. In recent years, the analytics industry's interest has expanded toward big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and the computational environment, including the hardware and software required to perform analytics on big data.


2014 ◽  
Vol 1 (24) ◽  
pp. 165
Author(s):  
Alexander Lvovich Tulupyev ◽  
Andrey Alexandrovich Filchenkov ◽  
Anton Mikhailovich Alexeyev

Author(s):  
Omar Mendoza-González ◽  
Jesús Hernández-Cabrera

The massive production of data in different formats and from different sources (governmental, social, and legal) has created the possibility for government institutions in México to gain a clear view of what society thinks about specific issues. In public health, these data are the basis for generating alert indicators on disease outbreaks in various regions or communities, based on epidemiological intelligence concepts. The problem institutions face is the lack of a software system architecture suitable for collecting, cataloging, and analyzing these data so that the health authorities of our country can make better decisions and choose courses of action. The objective is to design a big data system covering the four main requirements of large-scale data processing: (1) support heavy write workloads from various sources; (2) provide an elastic architecture, capable of withstanding peak workloads by adding or releasing resources as needed; (3) support intensive analysis, admitting large and diverse read requests; and (4) offer high availability, tolerating errors in hardware and software.

