Capturing the Physics of MaNGA Galaxies with Self-supervised Machine Learning

2021 ◽  
Vol 921 (2) ◽  
pp. 177
Author(s):  
Regina Sarmiento ◽  
Marc Huertas-Company ◽  
Johan H. Knapen ◽  
Sebastián F. Sánchez ◽  
Helena Domínguez Sánchez ◽  
...  

Abstract As available data sets grow in size and complexity, advanced visualization tools enabling their exploration and analysis become more important. In modern astronomy, integral field spectroscopic galaxy surveys are a clear example of increasingly high-dimensional and complex data sets, which challenge the traditional methods used to extract the physical information they contain. We present the use of a novel self-supervised machine-learning method to visualize the multidimensional information on stellar populations and kinematics in the MaNGA survey in a 2D plane. Our framework is insensitive to nonphysical properties such as the size of the integral field unit and is therefore able to order galaxies according to their resolved physical properties. Using the extracted representations, we study how galaxies are distributed according to their resolved and global physical properties. We show that even when exclusively using information about the internal structure, galaxies naturally cluster into two well-known categories, rotating main-sequence disks and massive slow rotators, from a purely data-driven perspective, hence confirming distinct assembly channels. Low-mass rotation-dominated quenched galaxies appear as a third cluster only if information about the integrated physical properties is preserved, suggesting a mixture of assembly processes for these galaxies without any particular signature in their internal kinematics that distinguishes them from the two main groups. The framework for data exploration is publicly released with this publication, ready to be used with the MaNGA or other integral field data sets.
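The survey-specific pipeline is not reproduced here, but the core step the abstract describes — compressing each galaxy's high-dimensional resolved-property maps into a point on a 2D plane — can be illustrated with a minimal sketch. Plain PCA stands in for the paper's self-supervised network, and all "galaxies" below are synthetic:

```python
import numpy as np

# Toy stand-in for the paper's pipeline: each "galaxy" is a flattened map of
# resolved stellar-population/kinematic properties, and we project these
# high-dimensional vectors onto a 2D plane. The paper learns the
# representation with a self-supervised network; plain PCA is used here
# purely as a minimal, dependency-free illustration of the embedding step.

rng = np.random.default_rng(0)

# 200 synthetic galaxies, 64 resolved-property features each, drawn from two
# clusters (mimicking rotating disks vs. massive slow rotators).
disks = rng.normal(loc=0.0, scale=1.0, size=(100, 64))
slow_rotators = rng.normal(loc=3.0, scale=1.0, size=(100, 64))
X = np.vstack([disks, slow_rotators])

# PCA via SVD: centre the data, decompose, keep the two leading components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
embedding = Xc @ Vt[:2].T          # shape (200, 2): one point per galaxy

print(embedding.shape)             # (200, 2)
```

In this contrived setup the two populations separate cleanly along the first component, mirroring the data-driven clustering into two categories that the abstract reports.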

Author(s):  
Paul Rippon ◽  
Kerrie Mengersen

Learning algorithms are central to pattern recognition, artificial intelligence, machine learning, data mining, and statistical learning. The term often implies analysis of large and complex data sets with minimal human intervention. Bayesian learning has been variously described as a method of updating opinion based on new experience, updating parameters of a process model based on data, modelling and analysis of complex phenomena using multiple sources of information, posterior probabilistic expectation, and so on. In all of these guises, it has exploded in popularity over recent years.
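The "updating parameters of a process model based on data" reading of Bayesian learning has a textbook minimal case: a conjugate Beta-Binomial update, in which the prior opinion and the new experience combine in closed form. The numbers below are purely illustrative:

```python
from fractions import Fraction

# Bayesian learning as "updating opinion based on new experience", in its
# simplest conjugate form: a Beta(a, b) prior on a success probability is
# updated by binomial data to a Beta(a + successes, b + failures) posterior.

a, b = 1, 1                     # uniform prior: no initial opinion
successes, failures = 7, 3      # new experience: 10 observations

a_post, b_post = a + successes, b + failures
posterior_mean = Fraction(a_post, a_post + b_post)

print(posterior_mean)           # 2/3, the posterior probabilistic expectation
```

Each further batch of data repeats the same update on the current posterior, which is the "updating opinion" cycle the passage describes.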


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Patricio Lagos ◽  
Polychronis Papaderos

We review the results from our studies, and previously published work, on the spatially resolved physical properties of a sample of Hii/BCD galaxies, as obtained mainly from integral-field unit spectroscopy with Gemini/GMOS and VLT/VIMOS. We confirm that, within observational uncertainties, our sample galaxies show nearly spatially constant chemical abundances, similar to other low-mass starburst galaxies. They also show Heii λ4686 emission with properties suggestive of a mix of excitation sources, with Wolf-Rayet stars excluded as the primary ones. Finally, in this contribution, we include a list of all Hii/BCD galaxies studied thus far with integral-field unit spectroscopy.


2016 ◽  
Vol 35 (10) ◽  
pp. 906-909 ◽  
Author(s):  
Brendon Hall

There has been much excitement recently about big data and the dire need for data scientists who can extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist's toolbox, many of which used to be available only in proprietary (and expensive) software platforms.


2018 ◽  
Vol 7 (1.7) ◽  
pp. 201
Author(s):  
K Jayanthi ◽  
C Mahesh

Machine learning enables computers to help humans analyse knowledge from large, complex data sets. Genetic and genomic data are one such complex domain, requiring computers to analyse a wide range of functions automatically. Machine-learning methods promise to make these data more useful for tasks such as gene prediction, gene expression analysis, gene ontology annotation, gene finding, and gene editing. The purpose of this study is to explore machine-learning applications and algorithms for genetic and genomic data. We conclude with the following topics: the classification of machine-learning problems into supervised, unsupervised and semi-supervised settings; which type of method is suitable for various problems in genomics; applications of machine learning; and future views of machine learning in genomics.
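As a concrete, hedged illustration of the supervised setting the abstract lists, a toy classifier can turn DNA sequences into k-mer count features and assign a new sequence to the nearest class centroid; the sequences and labels below are invented:

```python
from collections import Counter

# Supervised learning on sequence data, reduced to its simplest form:
# featurize each DNA sequence as 2-mer counts, average the counts per class
# to form centroids, and classify new sequences by nearest centroid.

def kmer_counts(seq, k=2):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

train = [("ATATATAT", "AT-rich"), ("ATATATTA", "AT-rich"),
         ("GCGCGCGC", "GC-rich"), ("GCGGCGCC", "GC-rich")]

# Class centroids: average k-mer counts per label.
centroids = {}
for label in {"AT-rich", "GC-rich"}:
    seqs = [s for s, lbl in train if lbl == label]
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s))
    centroids[label] = {km: v / len(seqs) for km, v in total.items()}

def classify(seq):
    counts = kmer_counts(seq)
    def dist(centroid):
        keys = set(counts) | set(centroid)
        return sum((counts.get(km, 0) - centroid.get(km, 0)) ** 2
                   for km in keys)
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

print(classify("ATATTATA"))   # AT-rich
```

The unsupervised variant of the same sketch would drop the labels and cluster the k-mer count vectors instead; semi-supervised methods sit in between, using a few labels plus many unlabelled sequences.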


2020 ◽  
Vol 18 (3) ◽  
pp. 507-527
Author(s):  
M. Ghorbani ◽  
S. Swift ◽  
S. J. E. Taylor ◽  
A. M. Payne

Abstract The generation of a feature matrix is the first step in conducting machine-learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object, which has to be identified by running complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. Thus, for non-computing experts, the generation of such matrices proves a barrier to employing machine-learning techniques. Further, since datasets are becoming larger, this barrier is augmented by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing possible in most institutions. The system makes use of the Grid and Cloud User Support Environment (gUSE), combined with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE), to create workflow-based science gateways that allow users to submit work to the distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical and text data.
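Independently of the distributed BOINC machinery the abstract describes, the feature matrix itself has a simple shape: one row per object, one column per feature. A minimal single-machine sketch, using invented toy sequences and 3-mer counts as the features:

```python
import numpy as np

# One row per object (here, a DNA sequence), one column per feature (here,
# the count of each distinct 3-mer seen anywhere in the data set). The
# distributed system in the abstract parallelizes this step over volunteers;
# the resulting matrix layout is the same either way.

sequences = {                    # hypothetical toy repository
    "seq1": "ATGCGATG",
    "seq2": "ATGATGAT",
    "seq3": "GGGCGCGA",
}

k = 3
# Fix a shared column order so every row lines up on the same features.
vocab = sorted({s[i:i + k] for s in sequences.values()
                for i in range(len(s) - k + 1)})
col = {kmer: j for j, kmer in enumerate(vocab)}

matrix = np.zeros((len(sequences), len(vocab)), dtype=int)
for row, s in enumerate(sequences.values()):
    for i in range(len(s) - k + 1):
        matrix[row, col[s[i:i + k]]] += 1

print(matrix.shape)   # (3, number of distinct 3-mers)
```

Image, numerical or text data would swap in a different per-object feature extractor, but the objects-by-features layout consumed by downstream machine-learning tools is unchanged.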


2020 ◽  
Vol 25 (5) ◽  
pp. 379-390 ◽  
Author(s):  
Adam J. Russak ◽  
Farhan Chaudhry ◽  
Jessica K. De Freitas ◽  
Garrett Baron ◽  
Fayzan F. Chaudhry ◽  
...  

Despite substantial advances in the study, treatment, and prevention of cardiovascular disease, numerous challenges remain in optimally screening, diagnosing, and managing patients. Simultaneous improvements in computing power, data storage, and data analytics have led to the development of new techniques to address these challenges. One powerful tool to this end is machine learning (ML), which aims to algorithmically identify and represent structure within data. Machine learning’s ability to efficiently analyze large and highly complex data sets makes it a desirable investigative approach in modern biomedical research. Despite this potential and enormous public and private sector investment, few prospective studies have demonstrated improved clinical outcomes from this technology. This is particularly true in cardiology, despite its emphasis on objective, data-driven results. This threatens to stifle ML’s growth and use in mainstream medicine. We outline the current state of ML in cardiology and describe methods through which impactful and sustainable ML research can occur. Following these steps can ensure ML reaches its potential as a transformative technology in medicine.


2020 ◽  
Vol 4 (1) ◽  
Author(s):  
Omar Isaac Asensio ◽  
Ximin Mi ◽  
Sameer Dharur

For a growing class of prediction problems, big data and machine learning (ML) analyses can greatly enhance our understanding of the effectiveness of public investments and public policy. However, the outputs of many ML models are often abstract and inaccessible to policy communities or the general public. In this article, we describe a hands-on teaching case that is suitable for use in a graduate or advanced undergraduate public policy, public affairs, or environmental studies classroom. Students engage with increasingly popular ML classification algorithms and cloud-based data visualization tools to support policy and planning on the theme of electric vehicle mobility and connected infrastructure. By using these tools, students critically evaluate and convert large and complex data sets into human-understandable visualizations for communication and decision making. The tools also give users the flexibility to engage creatively with streaming data sources, even with little technical background.

