Investigating the Performance of Machine Learning Algorithms for Improving Fault Tolerance for Large Scale Workflow Applications in Cloud Computing

Author(s):  
Soma Prathibha
Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
T. P. Puneeth Kumar ◽  
Ravindra S. Hegadi

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, the internet, and supply chain systems) over the past decade. The term big data was coined to capture this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years, the analytics industry's interest has expanded towards big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software, required to perform analytics on big data.


2020 ◽  
Vol 17 (8) ◽  
pp. 3765-3769
Author(s):  
N. P. Ponnuviji ◽  
M. Vigilson Prem

Cloud computing has revolutionized information technology by allowing users to use a wide variety of resources in different applications at lower cost. Resources are allocated virtually, providing scalable, flexible, on-demand access with reduced maintenance and lower infrastructure cost. Organizations handle and manage most of these resources over the internet using networking protocols with different standards and formats. Research and statistics have shown that existing technologies are prone to threats and vulnerabilities in legacy protocols, in the form of bugs that pave the way for intrusion by attackers in different ways. The most common of these attacks is the Distributed Denial of Service (DDoS) attack. This attack targets the cloud's performance and causes serious damage to the entire cloud computing environment. In the DDoS attack scenario, the compromised computers are targeted. The attacks are carried out by transmitting to a server a large number of packets injected with known and unknown bugs. A large portion of the network bandwidth of the users' cloud infrastructure is affected, and an enormous amount of their servers' time is consumed. In this paper, we propose a DDoS attack detection scheme based on the Random Forest algorithm to mitigate the DDoS threat. The algorithm is used along with signature detection techniques and generates a decision tree, which helps in detecting signature attacks for DDoS flooding attacks. We also applied other machine learning algorithms and analyzed them based on the results they yielded.
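The combination described in this abstract (a signature check alongside a Random Forest classifier) might look roughly like the sketch below. It is not the authors' implementation: the three flow features (packets per second, mean packet size, SYN ratio), the signature byte strings, the synthetic training data, and every function name are assumptions made for illustration. The forest is a small pure-Python one, with bootstrap sampling, a random feature subset at each split, and majority voting.

```python
import random
from collections import Counter

# Hypothetical byte patterns standing in for a real signature database.
KNOWN_SIGNATURES = [b"FLOOD-SIG-01", b"FLOOD-SIG-02"]

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; a leaf is a label string, a node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # random feature subset per split
    split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=15, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def detect(payload, features, forest, signatures=KNOWN_SIGNATURES):
    """Flag a flow if its payload matches a signature, else use the forest's majority vote."""
    if any(sig in payload for sig in signatures):
        return "attack"
    votes = Counter(predict_tree(tree, features) for tree in forest)
    return votes.most_common(1)[0][0]

def make_flows(n_per_class, seed=1):
    """Synthetic, well-separated flows: (packets/s, mean packet size, SYN ratio)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_per_class):
        rows.append((rng.uniform(10, 200), rng.uniform(400, 1200), rng.uniform(0.0, 0.3)))
        labels.append("benign")
        rows.append((rng.uniform(2000, 10000), rng.uniform(40, 120), rng.uniform(0.7, 1.0)))
        labels.append("attack")
    return rows, labels

rows, labels = make_flows(40)
forest = train_forest(rows, labels)
print(detect(b"GET /index.html", (15.0, 1100.0, 0.02), forest))
print(detect(b"\x00" * 8, (9000.0, 45.0, 0.98), forest))
print(detect(b"..FLOOD-SIG-01..", (15.0, 1100.0, 0.02), forest))
```

The signature check runs first because it is cheap and catches known patterns outright; the forest then covers flows whose payloads match no known signature. Real traffic would need real features and labelled captures, which this sketch does not attempt to supply.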


2018 ◽  
Author(s):  
Su Bin Lim ◽  
Swee Jin Tan ◽  
Wan-Teck Lim ◽  
Chwee Teck Lim

Background: Massive transcriptome profiles exist in the form of microarray data, enabling reuse. The challenge is that they are processed with diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset or cross-cancer analyses. A single, integrated data source consisting of thousands of samples, similar to TCGA, would facilitate data reuse for the discovery, analysis, and validation of biomarker-based clinical strategies.
Findings: We present 11 merged microarray-acquired datasets (MMDs) of major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Highly concordant MMD-derived patterns of genome-wide differential gene expression were observed with matching TCGA cohorts. Using machine learning algorithms, we show that clinical models trained from all MMDs, except the breast MMD, can be directly applied to RNA-seq-acquired TCGA data with an average accuracy of 0.96 in classifying cancer. The machine-learning-optimized MMD further helps to reveal the immune landscape of human cancers, which is critically needed in disease management and clinical interventions.
Conclusions: To facilitate large-scale meta-analysis, we generated a newly curated, unified, large-scale MMD across 11 cancer types. Besides TCGA, this single data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define the genomic landscape of human cancers.


Author(s):  
Hong Cui

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools to structure large volumes of text-based descriptions. This paper presents a general approach to marking up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora and help to improve markup performance, especially for elements that have sparse training examples.
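To make the idea of learned markup concrete, the sketch below trains a very simple word-frequency model on labelled clauses and uses it to wrap the clauses of a new description in XML elements. It is only an illustration, not MARTT's algorithm: the element names (`leaf`, `flower`, `stem`), the training clauses, the semicolon clause splitting, and the scoring rule are all assumptions.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def tokens(text):
    """Lowercase alphabetic words; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Build per-element word counts from (clause, element_name) pairs."""
    model = defaultdict(Counter)
    for clause, element in examples:
        model[element].update(tokens(clause))
    return model

def classify(model, clause):
    """Pick the element whose training vocabulary best matches the clause."""
    words = tokens(clause)

    def score(element):
        counts = model[element]
        total = sum(counts.values()) or 1
        return sum(counts[w] / total for w in words)

    # sorted() makes ties deterministic: the alphabetically first element wins.
    return max(sorted(model), key=score)

def mark_up(model, description):
    """Split a description on semicolons and wrap each clause in its predicted element."""
    root = ET.Element("description")
    for clause in (c.strip() for c in description.split(";")):
        if clause:
            ET.SubElement(root, classify(model, clause)).text = clause
    return ET.tostring(root, encoding="unicode")

# Hypothetical labelled clauses, standing in for one marked-up collection.
training = [
    ("Leaves alternate, ovate, 3-5 cm long", "leaf"),
    ("leaf blades lanceolate", "leaf"),
    ("Flowers white, petals 5", "flower"),
    ("flowers solitary, sepals green", "flower"),
    ("Stems erect, branched", "stem"),
    ("stem glabrous, 1 m tall", "stem"),
]
model = train(training)

# Apply the learned model to a description from a second (hypothetical) collection.
print(mark_up(model, "Stems erect; leaves ovate; petals 5"))
```

A clause that shares no words with the training data falls back to an arbitrary element here, which hints at why sparse training examples are the hard case the abstract singles out.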

