2. Fast Computation on Massive Data Sets

Abstract Dealing with the shear size and complexity of today’s massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes a.k.a. stragglers can significantly slow down computation as the slowest node may dictate the overall computational time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper, we develop novel mathematical understanding for this framework demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy) and straggler toleration in this framework.

Download Full-text

A methodology for supporting collaborative exploratory analysis of massive data sets in tele-immersive environments

Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469) ◽

10.1109/hpdc.1999.805283 ◽

2003 ◽

Cited By ~ 8

Author(s):

J. Leigh ◽

A.E. Johnson ◽

T.A. DeFanti ◽

S. Bailey ◽

R. Grossman

Keyword(s):

Exploratory Analysis ◽

Massive Data ◽

Data Sets ◽

Massive Data Sets ◽

Immersive Environments

Download Full-text

Massive Data Sets Issues in Earth Observing

Massive Computing - Handbook of Massive Data Sets ◽

10.1007/978-1-4615-0005-6_29 ◽

2002 ◽

pp. 1093-1140 ◽

Cited By ~ 3

Author(s):

Ruixin Yang ◽

Menas Kafatos

Keyword(s):

Massive Data ◽

Data Sets ◽

Massive Data Sets

Download Full-text

Neural Network for Big Data Sets

10.4018/978-1-6684-2408-7.ch003 ◽

2022 ◽

pp. 41-67

Author(s):

Vo Ngoc Phu ◽

Vo Thi Ngoc Tran

Keyword(s):

Neural Network ◽

Big Data ◽

Computer Science ◽

Large Scale ◽

Massive Data ◽

Data Sets ◽

Massive Data Sets ◽

Large Scale Data ◽

Commercial Applications ◽

Novel Model

Machine learning (ML), neural network (NN), evolutionary algorithm (EA), fuzzy systems (FSs), as well as computer science have been very famous and very significant for many years. They have been applied to many different areas. They have contributed much to developments of many large-scale corporations, massive organizations, etc. Lots of information and massive data sets (MDSs) have been generated from these big corporations, organizations, etc. These big data sets (BDSs) have been the challenges of many commercial applications, researches, etc. Therefore, there have been many algorithms of the ML, the NN, the EA, the FSs, as well as computer science which have been developed to handle these massive data sets successfully. To support for this process, the authors have displayed all the possible algorithms of the NN for the large-scale data sets (LSDSs) successfully in this chapter. Finally, they have presented a novel model of the NN for the BDS in a sequential environment (SE) and a distributed network environment (DNE).

Download Full-text

Diabetic Data Warehouses

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch069 ◽

2011 ◽

pp. 359-363

Author(s):

Joseph L. Breault

Keyword(s):

United States ◽

Health Care ◽

The United States ◽

Massive Data ◽

Data Sets ◽

National Academy Of Sciences ◽

Massive Data Sets ◽

Number Of Patients ◽

Indicator Variables ◽

Academy Of Sciences

The National Academy of Sciences convened in 1995 for a conference on massive data sets. The presentation on health care noted that “massive applies in several dimensions . . . the data themselves are massive, both in terms of the number of observations and also in terms of the variables . . . there are tens of thousands of indicator variables coded for each patient” (Goodall, 1995, paragraph 18). We multiply this by the number of patients in the United States, which is hundreds of millions.

Download Full-text

Gut microbiota and artificial intelligence approaches: A scoping review

Health and Technology ◽

10.1007/s12553-020-00486-7 ◽

2020 ◽

Vol 10 (6) ◽

pp. 1343-1358

Author(s):

Ernesto Iadanza ◽

Rachele Fabbri ◽

Džana Bašić-ČiČak ◽

Amedeo Amedei ◽

Jasminka Hasic Telalovic

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Deep Learning ◽

Gut Microbiota ◽

Scoping Review ◽

Massive Data ◽

Data Sets ◽

Research Groups ◽

Human Gut ◽

Massive Data Sets

Abstract This article aims to provide a thorough overview of the use of Artificial Intelligence (AI) techniques in studying the gut microbiota and its role in the diagnosis and treatment of some important diseases. The association between microbiota and diseases, together with its clinical relevance, is still difficult to interpret. The advances in AI techniques, such as Machine Learning (ML) and Deep Learning (DL), can help clinicians in processing and interpreting these massive data sets. Two research groups have been involved in this Scoping Review, working in two different areas of Europe: Florence and Sarajevo. The papers included in the review describe the use of ML or DL methods applied to the study of human gut microbiota. In total, 1109 papers were considered in this study. After elimination, a final set of 16 articles was considered in the scoping review. Different AI techniques were applied in the reviewed papers. Some papers applied ML, while others applied DL techniques. 11 papers evaluated just different ML algorithms (ranging from one to eight algorithms applied to one dataset). The remaining five papers examined both ML and DL algorithms. The most applied ML algorithm was Random Forest and it also exhibited the best performances.

Download Full-text