Investigating the Performance of Machine Learning Algorithms for Improving Fault Tolerance for Large Scale Workflow Applications in Cloud Computing

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, the internet, and supply chain systems) over the past decade. The term big data was coined to capture this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years, the analytics industry's interest has expanded towards big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software, required to perform analytics on big data.


Large-Scale Machine Learning Algorithms for Biomedical Data Science

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '19 ◽

10.1145/3307339.3342130 ◽

2019 ◽

Author(s):

Heng Huang

Keyword(s):

Machine Learning ◽

Large Scale ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Biomedical Data


Erratum to: Combining semi-automated image analysis techniques with machine learning algorithms to accelerate large-scale genetic studies

GigaScience ◽

10.1093/gigascience/giy043 ◽

2018 ◽

Vol 7 (7) ◽

Author(s):

Jonathan A Atkinson ◽

Guillaume Lobet ◽

Manuel Noll ◽

Patrick E Meyer ◽

Marcus Griffiths ◽

...

Keyword(s):

Machine Learning ◽

Image Analysis ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Automated Image Analysis ◽

Genetic Studies ◽

Analysis Techniques ◽

Image Analysis Techniques


An Enhanced Way of Distributed Denial of Service Attack Detection by Applying Machine Learning Algorithms in Cloud Computing

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9317 ◽

2020 ◽

Vol 17 (8) ◽

pp. 3765-3769

Author(s):

N. P. Ponnuviji ◽

M. Vigilson Prem

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Denial Of Service ◽

Learning Algorithms ◽

Attack Detection ◽

Machine Learning Algorithms ◽

Distributed Denial Of Service ◽

Detection Techniques ◽

Network Bandwidth ◽

DDoS Attack

Cloud computing has revolutionized information technology by allowing users to use a wide variety of resources in different applications at lower cost. Resources are allocated virtually, providing scalable, flexible, on-demand access with reduced maintenance and lower infrastructure cost. Organizations handle and manage the majority of these resources over the internet, using networking protocols with different standards and formats. Research and statistics have shown that existing technologies are prone to threats and vulnerabilities, in the form of bugs in legacy protocols, that pave the way for attackers to intrude in different ways. The most common of these attacks is the Distributed Denial of Service (DDoS) attack. This attack targets the cloud’s performance and causes serious damage to the entire cloud computing environment. In the DDoS attack scenario, the compromised computers are targeted. The attacks are carried out by transmitting to a server a large number of packets injected with known and unknown bugs. A large portion of the network bandwidth of the users’ cloud infrastructure is affected, and an enormous amount of server time is consumed. In this paper, we propose a DDoS attack detection scheme based on the Random Forest algorithm to mitigate the DDoS threat. The algorithm is used along with signature detection techniques and generates a decision tree. This helps in detecting signature attacks for DDoS flooding attacks. We have also applied other machine learning algorithms and analyzed them based on the results they yielded.


The large scale digital mapping of soil organic carbon using machine learning algorithms

Dokuchaev Soil Bulletin ◽

10.19047/0136-1694-2018-91-46-62 ◽

2018 ◽

Vol 91 ◽

pp. 46-62 ◽

Cited By ~ 1

Author(s):

A. V. Chinilin ◽

I. Yu. Savin

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Digital Mapping


Compendiums of Cancer Transcriptome for Machine Learning Applications

10.1101/353698 ◽

2018 ◽

Cited By ~ 1

Author(s):

Su Bin Lim ◽

Swee Jin Tan ◽

Wan-Teck Lim ◽

Chwee Teck Lim

Keyword(s):

Machine Learning ◽

Large Scale ◽

Meta Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Data Reuse ◽

Human Cancers ◽

Cancer Transcriptome ◽

Cancer Types ◽

Data Source

Background: Massive numbers of transcriptome profiles exist in the form of microarray data, enabling reuse. The challenge is that they are processed with diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset or cross-cancer analyses. A single, integrated data source consisting of thousands of samples, similar to TCGA, would facilitate data reuse for the discovery, analysis, and validation of biomarker-based clinical strategies. Findings: We present 11 merged microarray-acquired datasets (MMDs) of major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Highly concordant MMD-derived patterns of genome-wide differential gene expression were observed with matching TCGA cohorts. Using machine learning algorithms, we show that clinical models trained from all MMDs, except the breast MMD, can be directly applied to RNA-seq-acquired TCGA data with an average accuracy of 0.96 in classifying cancer. Machine learning-optimized MMDs further help to reveal the immune landscape of human cancers, which is critically needed in disease management and clinical interventions. Conclusions: To facilitate large-scale meta-analysis, we generated a newly curated, unified, large-scale MMD across 11 cancer types. Besides TCGA, this single data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define the genomic landscape of human cancers.


Super ensemble learning for daily streamflow forecasting: large-scale demonstration and comparison with multiple machine learning algorithms

Neural Computing and Applications ◽

10.1007/s00521-020-05172-3 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hristos Tyralis ◽

Georgia Papacharalampous ◽

Andreas Langousis

Keyword(s):

Machine Learning ◽

Ensemble Learning ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Streamflow Forecasting ◽

Daily Streamflow


MARTT: Automatic Markup of Taxonomic Descriptions with XML

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais277 ◽

2013 ◽

Author(s):

Hong Cui

Keyword(s):

Machine Learning ◽

Information Content ◽

Large Scale ◽

Learning Algorithms ◽

General Purpose ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods ◽

Taxonomic Descriptions ◽

Efficient Machine

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools to structure large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora and help to improve the markup performance, especially for elements that have sparse training examples.
