Machine Learning Algorithms Help Scientists Explore Mars

Over the past decade, recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated data, financial companies, and internet and supply chain systems). The term big data was coined to capture this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data. For instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years, the analytics industry's interest has expanded toward big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software, required to perform analytics on big data.

Large-Scale Machine Learning Algorithms for Biomedical Data Science

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '19 ◽

10.1145/3307339.3342130 ◽

2019 ◽

Author(s):

Heng Huang

Keyword(s):

Machine Learning ◽

Large Scale ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Biomedical Data

Erratum to: Combining semi-automated image analysis techniques with machine learning algorithms to accelerate large-scale genetic studies

GigaScience ◽

10.1093/gigascience/giy043 ◽

2018 ◽

Vol 7 (7) ◽

Author(s):

Jonathan A Atkinson ◽

Guillaume Lobet ◽

Manuel Noll ◽

Patrick E Meyer ◽

Marcus Griffiths ◽

...

Keyword(s):

Machine Learning ◽

Image Analysis ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Automated Image Analysis ◽

Genetic Studies ◽

Analysis Techniques ◽

Image Analysis Techniques

The large scale digital mapping of soil organic carbon using machine learning algorithms

Dokuchaev Soil Bulletin ◽

10.19047/0136-1694-2018-91-46-62 ◽

2018 ◽

Vol 91 ◽

pp. 46-62 ◽

Cited By ~ 1

Author(s):

A. V. Chinilin ◽

I. Yu. Savin ◽

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Digital Mapping

Compendiums of Cancer Transcriptome for Machine Learning Applications

10.1101/353698 ◽

2018 ◽

Cited By ~ 1

Author(s):

Su Bin Lim ◽

Swee Jin Tan ◽

Wan-Teck Lim ◽

Chwee Teck Lim

Keyword(s):

Machine Learning ◽

Large Scale ◽

Meta Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Data Reuse ◽

Human Cancers ◽

Cancer Transcriptome ◽

Cancer Types ◽

Data Source

Abstract

Background: Massive numbers of transcriptome profiles exist in the form of microarray data, enabling reuse. The challenge is that they are processed with diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset or cross-cancer analyses. If a single, integrated data source consisting of thousands of samples existed, similar to TCGA, data reuse would be facilitated for the discovery, analysis, and validation of biomarker-based clinical strategies.

Findings: We present 11 merged microarray-acquired datasets (MMDs) of major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Highly concordant MMD-derived patterns of genome-wide differential gene expression were observed with matching TCGA cohorts. Using machine learning algorithms, we show that clinical models trained from all MMDs, except the breast MMD, can be directly applied to RNA-seq-acquired TCGA data with an average accuracy of 0.96 in classifying cancer. The machine-learning-optimized MMDs further help to reveal the immune landscape of human cancers, which is critically needed in disease management and clinical interventions.

Conclusions: To facilitate large-scale meta-analysis, we generated a newly curated, unified, large-scale MMD across 11 cancer types. Besides TCGA, this single data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define the genomic landscape of human cancers.

Investigating the Performance of Machine Learning Algorithms for Improving Fault Tolerance for Large Scale Workflow Applications in Cloud Computing

2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE) ◽

10.1109/iccike47802.2019.9004379 ◽

2019 ◽

Author(s):

Soma Prathibha

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Fault Tolerance ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms

Super ensemble learning for daily streamflow forecasting: large-scale demonstration and comparison with multiple machine learning algorithms

Neural Computing and Applications ◽

10.1007/s00521-020-05172-3 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hristos Tyralis ◽

Georgia Papacharalampous ◽

Andreas Langousis

Keyword(s):

Machine Learning ◽

Ensemble Learning ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Streamflow Forecasting ◽

Daily Streamflow

MARTT: Automatic Markup of Taxonomic Descriptions with XML

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais277 ◽

2013 ◽

Author(s):

Hong Cui

Keyword(s):

Machine Learning ◽

Information Content ◽

Large Scale ◽

Learning Algorithms ◽

General Purpose ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods ◽

Taxonomic Descriptions ◽

Efficient Machine

Despite the sub-language nature of taxonomic descriptions of animals and plants, researchers have warned about the existence of large variations among different description collections in terms of information content and its representation. These variations pose a serious threat to the development of automatic tools to structure large volumes of text-based descriptions. This paper presents a general approach to mark up different collections of taxonomic descriptions with XML, using two large-scale floras as examples. The markup system, MARTT, is based on machine learning methods and enhanced by machine-learned domain rules and conventions. Experiments show that our simple and efficient machine learning algorithms significantly outperform general-purpose algorithms, and that rules learned from one flora can be used when marking up a second flora and help to improve the markup performance, especially for elements that have sparse training examples.

Evaluation of Distributed Machine Learning Algorithms for Anomaly Detection from Large-Scale System Logs: A Case Study

2018 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2018.8621967 ◽

2018 ◽

Cited By ~ 1

Author(s):

Merve Astekin ◽

Harun Zengin ◽

Hasan Sozer

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Large Scale System ◽

System Logs ◽

Distributed Machine Learning
