Using Machine Learning Techniques to Aid Environmental Policy Analysis

2020 ◽  
Vol 4 (1) ◽  
Author(s):  
Omar Isaac Asensio ◽  
Ximin Mi ◽  
Sameer Dharur

For a growing class of prediction problems, big data and machine learning (ML) analyses can greatly enhance our understanding of the effectiveness of public investments and public policy. However, the outputs of many ML models are often abstract and inaccessible to policy communities or the general public. In this article, we describe a hands-on teaching case that is suitable for use in a graduate or advanced undergraduate public policy, public affairs, or environmental studies classroom. Students will engage with increasingly popular ML classification algorithms and cloud-based data visualization tools to support policy and planning on the theme of electric vehicle mobility and connected infrastructure. Using these tools, students critically evaluate large and complex data sets and convert them into human-understandable visualizations for communication and decision making. The tools also give users the flexibility to engage with streaming data sources in new creative designs with little technical background.
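A minimal sketch can make the classification step concrete. The feature names and records below are hypothetical illustrations, not the course's actual data, and a simple nearest-neighbor rule stands in for the more sophisticated classifiers students would use:

```python
# Minimal nearest-neighbor classifier sketch (hypothetical data).
# Each record: (features, label), where features might encode, e.g.,
# charging-session duration and energy delivered at an EV station,
# and the label marks whether users reported a service problem.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query`."""
    _, label = min(train, key=lambda rec: euclidean(rec[0], query))
    return label

# Hypothetical labeled sessions: (duration_hours, energy_kwh) -> label
train = [
    ((0.5, 3.0), "problem"),   # short session, little energy delivered
    ((2.0, 14.0), "ok"),
    ((1.8, 12.5), "ok"),
    ((0.3, 1.0), "problem"),
]

print(nearest_neighbor_predict(train, (1.9, 13.0)))  # prints "ok"
```

The visualization step would then plot such labeled records, which is where the cloud-based tools the case study describes come in.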

2020 ◽  
Vol 18 (3) ◽  
pp. 507-527
Author(s):  
M. Ghorbani ◽  
S. Swift ◽  
S. J. E. Taylor ◽  
A. M. Payne

Abstract The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA, or protein sequences. These matrices contain information for each object that must be extracted by running complex algorithms over the data, and they are normally generated by combining the results of such algorithms across various datasets drawn from different and distributed data sources. For non-computing experts, the generation of such matrices therefore proves a barrier to employing machine learning techniques. Further, as datasets become larger, this barrier is compounded by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable, and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using the distributed volunteer computing available in most institutions. The system combines the Grid and Cloud User Support Environment (gUSE) with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories; however, we propose that this system could be used to analyse a wide variety of feature sets, including image, numerical, and text data.
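The kind of feature matrix the system produces can be illustrated with a small sketch: counting fixed-length subsequences (k-mers) across a set of DNA sequences, one row per sequence and one column per k-mer. This is only one of many possible feature extractors, and the sequences below are made up:

```python
from itertools import product

def kmer_feature_matrix(sequences, k=2, alphabet="ACGT"):
    """Build a count matrix: one row per sequence, one column per k-mer."""
    columns = ["".join(p) for p in product(alphabet, repeat=k)]
    matrix = []
    for seq in sequences:
        counts = {c: 0 for c in columns}
        # Slide a window of length k over the sequence and tally each k-mer.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
        matrix.append([counts[c] for c in columns])
    return columns, matrix

columns, matrix = kmer_feature_matrix(["ACGTAC", "TTTT"], k=2)
print(len(columns))                     # 16 possible dinucleotides
print(matrix[1][columns.index("TT")])   # "TTTT" contains "TT" three times
```

In the proposed system, each row of such a matrix would be computed as an independent work unit distributed to volunteer machines, which is what makes the computation scalable.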


Author(s):  
Paul Rippon ◽  
Kerrie Mengersen

Learning algorithms are central to pattern recognition, artificial intelligence, machine learning, data mining, and statistical learning. The term often implies analysis of large and complex data sets with minimal human intervention. Bayesian learning has been variously described as a method of updating opinion based on new experience, updating parameters of a process model based on data, modelling and analysis of complex phenomena using multiple sources of information, posterior probabilistic expectation, and so on. In all of these guises, it has exploded in popularity over recent years.
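The "updating opinion based on new experience" view of Bayesian learning has a classic minimal example: a Beta prior over a coin's bias updated by observed flips, where conjugacy makes the posterior another Beta distribution with the observed counts added to the prior parameters.

```python
def beta_bernoulli_update(alpha, beta, flips):
    """Conjugate update: Beta(alpha, beta) prior + Bernoulli observations.
    Heads add to alpha, tails add to beta."""
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails

# Start from a uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
alpha, beta = beta_bernoulli_update(1, 1, [1] * 7 + [0] * 3)
posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)  # 8 4 0.666...
```

The posterior mean of 8/12 sits between the prior mean (1/2) and the empirical frequency (7/10), illustrating how prior opinion and data are blended.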


2021 ◽  
Vol 921 (2) ◽  
pp. 177
Author(s):  
Regina Sarmiento ◽  
Marc Huertas-Company ◽  
Johan H. Knapen ◽  
Sebastián F. Sánchez ◽  
Helena Domínguez Sánchez ◽  
...  

Abstract As available data sets grow in size and complexity, advanced visualization tools enabling their exploration and analysis become more important. In modern astronomy, integral field spectroscopic galaxy surveys are a clear example of increasing high dimensionality and complex data sets, which challenges the traditional methods used to extract the physical information they contain. We present the use of a novel self-supervised machine-learning method to visualize the multidimensional information on stellar population and kinematics in the MaNGA survey in a 2D plane. Our framework is insensitive to nonphysical properties such as the size of the integral field unit and is therefore able to order galaxies according to their resolved physical properties. Using the extracted representations, we study how galaxies distribute based on their resolved and global physical properties. We show that even when exclusively using information about the internal structure, galaxies naturally cluster into two well-known categories, rotating main-sequence disks and massive slow rotators, from a purely data-driven perspective, hence confirming distinct assembly channels. Low-mass rotation-dominated quenched galaxies appear as a third cluster only if information about the integrated physical properties is preserved, suggesting a mixture of assembly processes for these galaxies without any particular signature in their internal kinematics that distinguishes them from the two main groups. The framework for data exploration is publicly released with this publication, ready to be used with the MaNGA or other integral field data sets.
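The idea of compressing many resolved properties per galaxy into a 2D plane can be sketched with ordinary PCA, used here purely as a linear stand-in for the paper's self-supervised method; the input is random data, not the MaNGA survey.

```python
import numpy as np

def project_to_2d(X):
    """Project rows of X onto their top two principal components (PCA).
    A linear stand-in for the paper's self-supervised representation."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data yields the principal directions in Vt.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))   # 100 "galaxies", 50 resolved features each
embedding = project_to_2d(X)
print(embedding.shape)  # (100, 2)
```

In the 2D embedding, nearby points correspond to objects with similar high-dimensional properties, which is what allows the clusters of rotating disks and slow rotators to emerge visually.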


2016 ◽  
Vol 35 (10) ◽  
pp. 906-909 ◽  
Author(s):  
Brendon Hall

There has been much excitement recently about big data and the dire need for data scientists who possess the ability to extract meaning from it. Geoscientists, meanwhile, have been doing science with voluminous data for years, without needing to brag about how big it is. But now that large, complex data sets are widely available, there has been a proliferation of tools and techniques for analyzing them. Many free and open-source packages now exist that provide powerful additions to the geoscientist's toolbox, many of which used to be available only in proprietary (and expensive) software platforms.



2018 ◽  
Vol 7 (1.7) ◽  
pp. 201
Author(s):  
K Jayanthi ◽  
C Mahesh

Machine learning enables computers to help humans analyse knowledge from large, complex data sets. One such complex domain is genetic and genomic data, which requires computers to analyse various sets of functions automatically. Machine learning methods hold promise for making these data more useful in tasks such as gene prediction, gene expression analysis, gene ontology annotation, gene finding, and gene editing. The purpose of this study is to explore machine learning applications and algorithms for genetic and genomic data. We conclude with the following topics: the classification of machine learning problems into supervised, unsupervised, and semi-supervised settings; which type of method is suitable for various problems in genomics; applications of machine learning; and future views of machine learning in genomics.
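The supervised/unsupervised distinction can be made concrete with a tiny unsupervised example: clustering made-up gene-expression vectors with a minimal k-means loop (two clusters, squared Euclidean distance). The profiles below are invented for illustration.

```python
def kmeans(points, centers, iters=10):
    """Minimal k-means: assign each point to its nearest center, then
    move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [
            tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters

# Made-up expression profiles (two genes) with two apparent groups.
points = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (5.0, 5.1), (5.2, 4.9)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
print([len(cl) for cl in clusters])  # [3, 2]
```

A supervised counterpart would instead attach known labels (e.g. gene function) to each profile and learn a classifier, as in gene-prediction tasks.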


2020 ◽  
Vol 25 (5) ◽  
pp. 379-390 ◽  
Author(s):  
Adam J. Russak ◽  
Farhan Chaudhry ◽  
Jessica K. De Freitas ◽  
Garrett Baron ◽  
Fayzan F. Chaudhry ◽  
...  

Despite substantial advances in the study, treatment, and prevention of cardiovascular disease, numerous challenges relating to optimally screening, diagnosing, and managing patients remain. Simultaneous improvements in computing power, data storage, and data analytics have led to the development of new techniques to address these challenges. One powerful tool to this end is machine learning (ML), which aims to algorithmically identify and represent structure within data. Machine learning’s ability to efficiently analyze large and highly complex data sets makes it a desirable investigative approach in modern biomedical research. Despite this potential and enormous public and private sector investment, few prospective studies have demonstrated improved clinical outcomes from this technology. This is particularly true in cardiology, despite its emphasis on objective, data-driven results. This threatens to stifle ML’s growth and use in mainstream medicine. We outline the current state of ML in cardiology and describe methods through which impactful and sustainable ML research can occur. Following these steps can ensure ML reaches its potential as a transformative technology in medicine.


2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high-throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including random forests, support vector machines, neural networks, and linear and logistic regression. We will review their mathematical foundations and discuss their limitations and alternative approaches, all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.
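One of the techniques the review covers, logistic regression, can be sketched with a minimal gradient-descent fit on made-up biomarker/response data; the variable names are illustrative, not from any cancer cohort.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit 1-D logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))  # predicted probability
            grad_w += (p - yi) * xi
            grad_b += (p - yi)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical biomarker values and binary treatment response (0/1).
X = [0.2, 0.4, 0.5, 1.4, 1.6, 1.8]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
predict = lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))
print(predict(0.3) < 0.5 < predict(1.5))  # low marker -> predicted non-responder
```

Random forests and support vector machines would be trained on the same kind of (features, response) pairs, trading this model's interpretable coefficients for greater flexibility.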


Author(s):  
Gediminas Adomavicius ◽  
Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
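The core idea, reframing reliability estimation as a second prediction problem whose target is the base model's absolute error, can be sketched with two one-dimensional least-squares fits on made-up data; any regression learner could play either role.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Made-up data whose noise grows with x, so errors are predictable from x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.4, 3.5, 6.0, 4.8]

# Stage 1: base outcome model, then its per-point absolute errors.
a, b = fit_line(xs, ys)
abs_err = [abs(y - (a * x + b)) for x, y in zip(xs, ys)]

# Stage 2: reliability model trained to predict the absolute error.
c, d = fit_line(xs, abs_err)
estimate_error = lambda x: c * x + d  # larger estimate = less reliable
print(estimate_error(6) > estimate_error(1))  # True: error grows with x
```

A decision maker could then flag individual predictions whose estimated absolute error exceeds a chosen threshold, without any distributional assumptions about the base model.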

