Using Machine Learning Techniques to Aid Environmental Policy Analysis

2020 ◽  
Vol 4 (1) ◽  
Author(s):  
Omar Isaac Asensio ◽  
Ximin Mi ◽  
Sameer Dharur

For a growing class of prediction problems, big data and machine learning (ML) analyses can greatly enhance our understanding of the effectiveness of public investments and public policy. However, the outputs of many ML models are often abstract and inaccessible to policy communities or the general public. In this article, we describe a hands-on teaching case that is suitable for use in a graduate or advanced undergraduate public policy, public affairs, or environmental studies classroom. Students will engage with increasingly popular ML classification algorithms and cloud-based data visualization tools to support policy and planning on the theme of electric vehicle mobility and connected infrastructure. Using these tools, students critically evaluate large and complex data sets and convert them into human-understandable visualizations for communication and decision making. The tools also give users the flexibility to engage with streaming data sources in new creative designs with little technical background.
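A minimal sketch can make the classification step concrete. The feature names and records below are hypothetical illustrations, not the course's actual data, and a simple nearest-neighbor rule stands in for the more sophisticated classifiers students would use:

```python
# Minimal nearest-neighbor classifier sketch (hypothetical data).
# Each record: (features, label), where features might encode, e.g.,
# charging-session duration and energy delivered at an EV station,
# and the label marks whether users reported a service problem.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query`."""
    _, label = min(train, key=lambda rec: euclidean(rec[0], query))
    return label

# Hypothetical labeled sessions: (duration_hours, energy_kwh) -> label
train = [
    ((0.5, 3.0), "problem"),   # short session, little energy delivered
    ((2.0, 14.0), "ok"),
    ((1.8, 12.5), "ok"),
    ((0.3, 1.0), "problem"),
]

print(nearest_neighbor_predict(train, (1.9, 13.0)))  # prints "ok"
```

The visualization step would then plot such labeled records, which is where the cloud-based tools the case study describes come in.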

2020 ◽  
Vol 18 (3) ◽  
pp. 507-527
Author(s):  
M. Ghorbani ◽  
S. Swift ◽  
S. J. E. Taylor ◽  
A. M. Payne

Abstract The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA, or protein sequences. These matrices contain information for each object that must be extracted by running complex algorithms over the data, and they are normally generated by combining the results of such algorithms across various datasets drawn from different and distributed data sources. For non-computing experts, the generation of such matrices therefore proves a barrier to employing machine learning techniques. Further, as datasets become larger, this barrier is compounded by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable, and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing available in most institutions. The system combines the Grid and Cloud User Support Environment (gUSE) with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical, and text data.
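The kind of feature matrix the system produces can be illustrated with a small sketch: counting fixed-length subsequences (k-mers) across a set of DNA sequences, one row per sequence and one column per k-mer. This is only one of many possible feature extractors, and the sequences below are made up:

```python
from itertools import product

def kmer_feature_matrix(sequences, k=2, alphabet="ACGT"):
    """Build a count matrix: one row per sequence, one column per k-mer."""
    columns = ["".join(p) for p in product(alphabet, repeat=k)]
    matrix = []
    for seq in sequences:
        counts = {c: 0 for c in columns}
        # Slide a window of length k over the sequence and tally each k-mer.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
        matrix.append([counts[c] for c in columns])
    return columns, matrix

columns, matrix = kmer_feature_matrix(["ACGTAC", "TTTT"], k=2)
print(len(columns))                     # 16 possible dinucleotides
print(matrix[1][columns.index("TT")])   # "TTTT" contains "TT" three times
```

In the proposed system, each row of such a matrix would be computed as an independent work unit distributed to volunteer machines, which is what makes the computation scalable.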


Author(s):  
Paul Rippon ◽  
Kerrie Mengersen

Learning algorithms are central to pattern recognition, artificial intelligence, machine learning, data mining, and statistical learning. The term often implies analysis of large and complex data sets with minimal human intervention. Bayesian learning has been variously described as a method of updating opinion based on new experience, updating parameters of a process model based on data, modelling and analysis of complex phenomena using multiple sources of information, posterior probabilistic expectation, and so on. In all of these guises, it has exploded in popularity over recent years.
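The "updating opinion based on new experience" view of Bayesian learning has a classic minimal example: a Beta prior over a coin's bias updated by observed flips, where conjugacy makes the posterior another Beta distribution with the observed counts added to the prior parameters.

```python
def beta_bernoulli_update(alpha, beta, flips):
    """Conjugate update: Beta(alpha, beta) prior + Bernoulli observations.
    Heads add to alpha, tails add to beta."""
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails

# Start from a uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
alpha, beta = beta_bernoulli_update(1, 1, [1] * 7 + [0] * 3)
posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)  # 8 4 0.666...
```

The posterior mean of 8/12 sits between the prior mean (1/2) and the empirical frequency (7/10), illustrating how prior opinion and data are blended.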


2021 ◽  
Vol 921 (2) ◽  
pp. 177
Author(s):  
Regina Sarmiento ◽  
Marc Huertas-Company ◽  
Johan H. Knapen ◽  
Sebastián F. Sánchez ◽  
Helena Domínguez Sánchez ◽  
...  

Abstract As available data sets grow in size and complexity, advanced visualization tools enabling their exploration and analysis become more important. In modern astronomy, integral field spectroscopic galaxy surveys are a clear example of increasing high dimensionality and complex data sets, which challenges the traditional methods used to extract the physical information they contain. We present the use of a novel self-supervised machine-learning method to visualize the multidimensional information on stellar population and kinematics in the MaNGA survey in a 2D plane. Our framework is insensitive to nonphysical properties such as the size of the integral field unit and is therefore able to order galaxies according to their resolved physical properties. Using the extracted representations, we study how galaxies distribute based on their resolved and global physical properties. We show that even when exclusively using information about the internal structure, galaxies naturally cluster into two well-known categories, rotating main-sequence disks and massive slow rotators, from a purely data-driven perspective, hence confirming distinct assembly channels. Low-mass rotation-dominated quenched galaxies appear as a third cluster only if information about the integrated physical properties is preserved, suggesting a mixture of assembly processes for these galaxies without any particular signature in their internal kinematics that distinguishes them from the two main groups. The framework for data exploration is publicly released with this publication, ready to be used with the MaNGA or other integral field data sets.
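The idea of compressing many resolved properties per galaxy into a 2D plane can be sketched with ordinary PCA, used here purely as a linear stand-in for the paper's self-supervised method; the input is random data, not the MaNGA survey.

```python
import numpy as np

def project_to_2d(X):
    """Project rows of X onto their top two principal components (PCA).
    A linear stand-in for the paper's self-supervised representation."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data yields the principal directions in Vt.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))   # 100 "galaxies", 50 resolved features each
embedding = project_to_2d(X)
print(embedding.shape)  # (100, 2)
```

In the 2D embedding, nearby points correspond to objects with similar high-dimensional properties, which is what allows the clusters of rotating disks and slow rotators to emerge visually.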


2016 ◽  
Vol 35 (10) ◽  
pp. 906-909 ◽  
Author(s):  
Brendon Hall

There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist's toolbox, many of which used to be available only in proprietary (and expensive) software platforms.



2018 ◽  
Vol 7 (1.7) ◽  
pp. 201
Author(s):  
K Jayanthi ◽  
C Mahesh

Machine learning enables computers to help humans analyse knowledge from large, complex data sets. One such complex domain is genetic and genomic data, which requires computers to analyse various sets of functions automatically. Machine learning methods hold promise for making these data more useful in tasks such as gene prediction, gene expression analysis, gene ontology annotation, gene finding, and gene editing. The purpose of this study is to explore machine learning applications and algorithms for genetic and genomic data. We conclude with the following topics: the classification of machine learning problems into supervised, unsupervised, and semi-supervised settings; which type of method is suitable for various problems in genomics; applications of machine learning; and future views of machine learning in genomics.
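The supervised/unsupervised distinction can be made concrete with a tiny unsupervised example: clustering made-up gene-expression vectors with a minimal k-means loop (two clusters, squared Euclidean distance). The profiles below are invented for illustration.

```python
def kmeans(points, centers, iters=10):
    """Minimal k-means: assign each point to its nearest center, then
    move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [
            tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters

# Made-up expression profiles (two genes) with two apparent groups.
points = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (5.0, 5.1), (5.2, 4.9)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
print([len(cl) for cl in clusters])  # [3, 2]
```

A supervised counterpart would instead attach known labels (e.g. gene function) to each profile and learn a classifier, as in gene-prediction tasks.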


2020 ◽  
Vol 25 (5) ◽  
pp. 379-390 ◽  
Author(s):  
Adam J. Russak ◽  
Farhan Chaudhry ◽  
Jessica K. De Freitas ◽  
Garrett Baron ◽  
Fayzan F. Chaudhry ◽  
...  

Despite substantial advances in the study, treatment, and prevention of cardiovascular disease, numerous challenges relating to optimally screening, diagnosing, and managing patients remain. Simultaneous improvements in computing power, data storage, and data analytics have led to the development of new techniques to address these challenges. One powerful tool to this end is machine learning (ML), which aims to algorithmically identify and represent structure within data. Machine learning’s ability to efficiently analyze large and highly complex data sets makes it a desirable investigative approach in modern biomedical research. Despite this potential and enormous public and private sector investment, few prospective studies have demonstrated improved clinical outcomes from this technology. This is particularly true in cardiology, despite its emphasis on objective, data-driven results. This threatens to stifle ML’s growth and use in mainstream medicine. We outline the current state of ML in cardiology and describe methods through which impactful and sustainable ML research can occur. Following these steps can ensure ML reaches its potential as a transformative technology in medicine.


2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high-throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including random forests, support vector machines, neural networks, and linear and logistic regression. We will review their mathematical foundations and discuss their limitations and alternative approaches, all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.
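One of the techniques the review covers, logistic regression, can be sketched with a minimal gradient-descent fit on made-up biomarker/response data; the variable names are illustrative, not from any cancer cohort.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit 1-D logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))  # predicted probability
            grad_w += (p - yi) * xi
            grad_b += (p - yi)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical biomarker values and binary treatment response (0/1).
X = [0.2, 0.4, 0.5, 1.4, 1.6, 1.8]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
predict = lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))
print(predict(0.3) < 0.5 < predict(1.5))  # low marker -> predicted non-responder
```

Random forests and support vector machines would be trained on the same kind of (features, response) pairs, trading this model's interpretable coefficients for greater flexibility.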


Author(s):  
Gediminas Adomavicius ◽  
Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
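The core idea, reframing reliability estimation as a second prediction problem whose target is the base model's absolute error, can be sketched with two one-dimensional least-squares fits on made-up data; any regression learner could play either role.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Made-up data whose noise grows with x, so errors are predictable from x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.4, 3.5, 6.0, 4.8]

# Stage 1: base outcome model, then its per-point absolute errors.
a, b = fit_line(xs, ys)
abs_err = [abs(y - (a * x + b)) for x, y in zip(xs, ys)]

# Stage 2: reliability model trained to predict the absolute error.
c, d = fit_line(xs, abs_err)
estimate_error = lambda x: c * x + d  # larger estimate = less reliable
print(estimate_error(6) > estimate_error(1))  # True: error grows with x
```

A decision maker could then flag individual predictions whose estimated absolute error exceeds a chosen threshold, without any distributional assumptions about the base model.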

