A machine learning-based new MVA workflow to find correlations in complex data sets applied to fracture diagnostics

Author(s):  
Yanrui Ning ◽  
Harrison Schumann ◽  
Ge Jin ◽  
Ali Tura
Author(s):  
Paul Rippon ◽  
Kerrie Mengersen

Learning algorithms are central to pattern recognition, artificial intelligence, machine learning, data mining, and statistical learning. The term often implies analysis of large and complex data sets with minimal human intervention. Bayesian learning has been variously described as a method of updating opinion based on new experience, updating parameters of a process model based on data, modelling and analysis of complex phenomena using multiple sources of information, posterior probabilistic expectation, and so on. In all of these guises, it has exploded in popularity over recent years.
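To make "updating parameters of a process model based on data" concrete, here is a minimal sketch of the textbook Beta-Binomial conjugate update in Python. The class name `BetaPrior` and the counts used are hypothetical illustrations, not taken from the work described above.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    """Beta(alpha, beta) belief about an unknown success probability p (illustrative)."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugacy: a Beta prior times a Binomial likelihood gives a Beta
        # posterior, so updating amounts to adding the observed counts.
        return BetaPrior(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        # Posterior expectation of p.
        return self.alpha / (self.alpha + self.beta)


prior = BetaPrior(1.0, 1.0)                      # uniform prior: no opinion yet
after_batch1 = prior.update(successes=7, failures=3)
after_batch2 = after_batch1.update(successes=2, failures=8)
print(after_batch1.mean())                       # 8 / 12
print(after_batch2.mean())                       # 10 / 22
```

Updating in two batches gives the same posterior as updating once with the pooled counts, which is the "opinion revised by new experience" reading of Bayesian learning.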


2021 ◽  
Vol 921 (2) ◽  
pp. 177
Author(s):  
Regina Sarmiento ◽  
Marc Huertas-Company ◽  
Johan H. Knapen ◽  
Sebastián F. Sánchez ◽  
Helena Domínguez Sánchez ◽  
...  

Abstract As available data sets grow in size and complexity, advanced visualization tools enabling their exploration and analysis become more important. In modern astronomy, integral field spectroscopic galaxy surveys are a clear example of increasingly high-dimensional and complex data sets, which challenge the traditional methods used to extract the physical information they contain. We present the use of a novel self-supervised machine-learning method to visualize the multidimensional information on stellar population and kinematics in the MaNGA survey in a 2D plane. Our framework is insensitive to nonphysical properties such as the size of the integral field unit and is therefore able to order galaxies according to their resolved physical properties. Using the extracted representations, we study how galaxies distribute based on their resolved and global physical properties. We show that even when exclusively using information about the internal structure, galaxies naturally cluster into two well-known categories, rotating main-sequence disks and massive slow rotators, from a purely data-driven perspective, hence confirming distinct assembly channels. Low-mass rotation-dominated quenched galaxies appear as a third cluster only if information about the integrated physical properties is preserved, suggesting a mixture of assembly processes for these galaxies without any particular signature in their internal kinematics that distinguishes them from the two main groups. The framework for data exploration is publicly released with this publication, ready to be used with the MaNGA or other integral field data sets.


2016 ◽  
Vol 35 (10) ◽  
pp. 906-909 ◽  
Author(s):  
Brendon Hall

There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist's toolbox, much of which used to be available only in proprietary (and expensive) software platforms.


2018 ◽  
Vol 7 (1.7) ◽  
pp. 201
Author(s):  
K Jayanthi ◽  
C Mahesh

Machine learning enables computers to help humans extract knowledge from large, complex data sets. Genetic and genomic data are one such complex data type, in which a wide range of functions must be analysed automatically by computers. It is hoped that machine learning methods can make these data more useful for further tasks such as gene prediction, gene expression analysis, gene ontology, gene finding, and gene editing. The purpose of this study is to explore machine learning applications and algorithms for genetic and genomic data. The study concludes with the following topics: the classification of machine learning problems into supervised, unsupervised, and semi-supervised learning; which type of method is suitable for which problems in genomics; applications of machine learning; and future prospects for machine learning in genomics.


2020 ◽  
Vol 18 (3) ◽  
pp. 507-527
Author(s):  
M. Ghorbani ◽  
S. Swift ◽  
S. J. E. Taylor ◽  
A. M. Payne

Abstract The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object, which has to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different, distributed data sources. Thus, for non-computing experts, generating such matrices is a barrier to employing machine learning techniques. Furthermore, since datasets are becoming larger, this barrier is compounded by the limitations of the single personal computer that investigators most often use to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing available in most institutions. The system combines the Grid and Cloud User Support Environment (gUSE) with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical and text data.
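As a rough illustration of what a feature matrix is, the sketch below turns DNA sequences into a matrix of k-mer counts on a single machine. It assumes k-mer counts as the feature type and uses a hypothetical helper, `kmer_feature_matrix`; it does not reproduce the WS-PGRADE/gUSE BOINC workflow, which distributes this kind of work across many machines.

```python
from collections import Counter
from itertools import product


def kmer_feature_matrix(sequences, k=2, alphabet="ACGT"):
    """Hypothetical helper: return (column_names, rows), one row of k-mer counts per sequence."""
    # Fix the column order up front so every row has the same layout.
    columns = ["".join(p) for p in product(alphabet, repeat=k)]
    rows = []
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k across the sequence and count each k-mer.
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        # k-mers containing other symbols (e.g. 'N') have no column and are dropped.
        rows.append([counts.get(c, 0) for c in columns])
    return columns, rows


cols, X = kmer_feature_matrix(["ACGTAC", "AAAA"], k=2)
print(len(cols))                  # 16 possible 2-mers over ACGT
print(X[0][cols.index("AC")])     # "AC" occurs twice in ACGTAC
```

Each row is one object (sequence) and each column one feature, which is the shape most machine learning libraries expect; the cost grows with both the number of sequences and 4**k columns, which is why large repositories motivate distributed computation.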


2020 ◽  
Vol 25 (5) ◽  
pp. 379-390 ◽  
Author(s):  
Adam J. Russak ◽  
Farhan Chaudhry ◽  
Jessica K. De Freitas ◽  
Garrett Baron ◽  
Fayzan F. Chaudhry ◽  
...  

Despite substantial advances in the study, treatment, and prevention of cardiovascular disease, numerous challenges relating to optimally screening, diagnosing, and managing patients remain. Simultaneous improvements in computing power, data storage, and data analytics have led to the development of new techniques to address these challenges. One powerful tool to this end is machine learning (ML), which aims to algorithmically identify and represent structure within data. Machine learning's ability to efficiently analyze large and highly complex data sets makes it a desirable investigative approach in modern biomedical research. Despite this potential and enormous public and private sector investment, few prospective studies have demonstrated improved clinical outcomes from this technology. This is particularly true in cardiology, despite its emphasis on objective, data-driven results. This threatens to stifle ML's growth and use in mainstream medicine. We describe the current state of ML in cardiology and outline methods through which impactful and sustainable ML research can occur. Following these steps can ensure ML reaches its potential as a transformative technology in medicine.


2020 ◽  
Vol 4 (1) ◽  
Author(s):  
Omar Isaac Asensio ◽  
Ximin Mi ◽  
Sameer Dharur

For a growing class of prediction problems, big data and machine learning (ML) analyses can greatly enhance our understanding of the effectiveness of public investments and public policy. However, the outputs of many ML models are often abstract and inaccessible to policy communities or the general public. In this article, we describe a hands-on teaching case that is suitable for use in a graduate or advanced undergraduate public policy, public affairs, or environmental studies classroom. Students will engage with increasingly popular ML classification algorithms and cloud-based data visualization tools to support policy and planning on the theme of electric vehicle mobility and connected infrastructure. By using these tools, students will critically evaluate large and complex data sets and convert them into human-understandable visualizations for communication and decision making. The tools also give users the flexibility to work creatively with streaming data sources, even with little technical background.


Author(s):  
Abou_el_ela Abdou Hussein

Advances in web technologies have led to tremendous daily growth in the volume of data generated. This mountain of huge, distributed data sets has given rise to the phenomenon called big data: a collection of massive, heterogeneous, unstructured, and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating, and visualizing big data. Traditional techniques such as the Relational Database Management System (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to meet both the data storage requirements and the heavy processing needed to analyze large volumes and varieties of data economically. Among the many technologies for handling big data is Hadoop, an open-source distributed data processing framework and one of the most prominent and well-known solutions to the problem of handling big data. Apache Hadoop was based on the Google File System and the MapReduce programming paradigm. In this paper we examine big data characteristics, starting from the original three V's, which researchers have extended over time to more than fifty-six V's, and we compare researchers' definitions to reach the best representation and a precise clarification of all the big data V's. We highlight the challenges facing big data processing and how Hadoop can overcome them, using it to process big data sets as a solution to various problems in a distributed, cloud-based environment. This paper mainly focuses on the components of Hadoop, such as Hive, Pig, and HBase. We also provide a complete description of Hadoop's pros and cons and propose improvements to address Hadoop's problems by adopting a proposed Cost-efficient Scheduler Algorithm for heterogeneous Hadoop systems.
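The MapReduce programming paradigm mentioned above can be sketched as three stages: map, shuffle (group by key), and reduce. The following is a single-process Python simulation of the classic word-count example, assuming plain whitespace tokenization; it does not use Hadoop's API, and the function names are hypothetical. In Hadoop, mappers and reducers would run in parallel across cluster nodes on blocks of a distributed file system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by their key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values that share one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Sorting keys mirrors the sorted order in which a reducer receives its keys.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["big data big", "data hadoop"])
print(counts)
```

Because each mapper sees only its own records and each reducer only its own keys, the stages parallelize naturally; the shuffle is the only step that moves data between machines.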


Author(s):  
Avinash Navlani ◽  
V. B. Gupta

Over the last couple of decades, clustering has become a crucial research problem in the data mining research community. Clustering refers to the partitioning of data objects, such as records and documents, into groups or clusters with similar characteristics. Clustering is unsupervised learning, and because of this unsupervised nature there is no single solution for all problems. Complex data sets often require explanation by multiple clusterings, yet traditional clustering approaches all generate a single clustering. A dataset may contain more than one pattern, and each pattern can be interesting from a different perspective. Alternative clustering aims to find multiple dissimilar groupings of the data set such that each grouping is of high quality and distinct from the others. This chapter gives an overview of alternative clustering: its various approaches, related work, comparisons with easily confused related terms such as subspace, multi-view, and ensemble clustering, applications, issues, and challenges.
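One simple strategy for finding an alternative clustering, shown here only as an illustration and not necessarily one of the approaches the chapter covers, is orthogonal projection: cluster the data, remove from each point the direction of its cluster's mean, then cluster again so the second grouping cannot reuse the structure the first one found. The sketch below assumes NumPy, a small hand-rolled k-means with deterministic farthest-point seeding, and hypothetical function names (`kmeans`, `orthogonal_alternative`).

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point seeding (illustrative)."""
    # Seed: start at row 0, then repeatedly add the point farthest from all chosen centers.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def orthogonal_alternative(X, k):
    """Cluster once, project out each point's cluster-mean direction, cluster again."""
    Xc = X - X.mean(axis=0)  # center the data so cluster means act as directions
    labels1, centers = kmeans(Xc, k)
    Xp = Xc.copy()
    for j in range(k):
        mu = centers[j]
        norm2 = mu @ mu
        if norm2 > 0:
            idx = labels1 == j
            # Remove the component of each point in cluster j along mu.
            Xp[idx] = Xc[idx] - np.outer(Xc[idx] @ mu / norm2, mu)
    labels2, _ = kmeans(Xp, k)
    return labels1, labels2


# Four tight groups at the corners of a wide rectangle: two valid groupings exist.
X = np.array([[0, 0], [1, 0], [20, 0], [19, 0],
              [0, 6], [1, 6], [20, 6], [19, 6]], dtype=float)
first, second = orthogonal_alternative(X, k=2)
print(first, second)  # first splits left/right, second splits bottom/top
```

The first clustering picks the dominant left/right split; after the projection removes that direction, the only structure left is the bottom/top split, so the second clustering is both high quality and distinct from the first.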
