A qualitative study of Machine Learning practices and engineering challenges in Earth Observation

2021
Vol 0 (0)
Author(s):
Sophie Jentzsch
Nico Hochgeschwender

Machine Learning (ML) is advancing rapidly across many fields. Earth Observation (EO), like many other domains, increasingly relies on ML applications, in which ML methods process vast amounts of heterogeneous and continuous data streams to answer socially and environmentally relevant questions. However, developing such ML-based EO systems remains challenging: development processes and the workflows they employ are often barely structured and poorly reported. The application of ML methods and techniques is considered opaque, and this lack of transparency conflicts with the responsible development of ML-based EO applications. Improving this situation requires a better understanding of current practices and engineering-related challenges in developing ML-based EO applications. In this paper, we report observations from an exploratory study in which five experts shared their views on ML engineering in semi-structured interviews. We analysed these interviews with coding techniques commonly applied in empirical software engineering. The interviews provide informative insights into the practical development of ML applications and reveal several engineering challenges. In addition, interviewees took part in a novel workflow-sketching task, which provided a tangible reflection of implicit processes. Overall, the results confirm a gap between theoretical conceptions and actual practice in ML development, even though the workflows were sketched in an abstract, textbook-like way. The results pave the way for a large-scale investigation of requirements for ML engineering in EO.

2020
Author(s):
Rene Orth
Sungmin Oh

Soil moisture plays a key role in land-atmosphere interactions through its influence on the energy and water cycles. Furthermore, its spatiotemporal variations can affect the development and persistence of extreme weather events. Consequently, soil moisture information is required for a wide range of research and applications, such as agricultural monitoring, flood and drought prediction, climate projection, and carbon-cycle modeling. Despite its scientific and societal importance, soil moisture observations are sparse, particularly over long time periods and at large spatial scales. Only models and satellite retrievals can provide global soil moisture information. Because land surface models remain limited in their ability to represent the complex land-atmosphere interplay, satellite-based soil moisture data are a valuable alternative. However, these products rely on a model-based scaling and capture only the top few centimeters of the soil.

In this study, we aim to augment satellite-based soil moisture data using machine learning. To this end, we integrate satellite soil moisture with multiple hydro-meteorological data streams to derive global gridded soil moisture using Long Short-Term Memory (LSTM) neural networks. These networks are trained with in-situ soil moisture measurements as target data. Using the relationships they learn, the LSTMs can produce in-situ-like soil moisture estimates globally. We further analyze the implications of using point-scale target data to infer large-scale information. The new dataset is derived separately for the surface and the deeper soil, thereby extending beyond the depth range covered by satellite-based products. Integrating many data streams and multiple soil moisture observations through this powerful synergistic technique has the potential to yield high accuracy, which is tested through rigorous cross-validation of the derived dataset. Finally, the planned datasets will permit consistent long-term, large-scale analyses to enhance our understanding of the hydrology-biosphere-climate interplay, better constrain models, and support hydrological hazard monitoring and climate projections.
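The abstract names LSTM networks without describing what they compute. The pure-Python cell below is a minimal sketch of the standard LSTM gate equations, not the authors' trained model. It consumes a sequence of daily input vectors and carries a hidden state and a cell state forward. The input variables (hypothetical precipitation, air temperature and satellite soil moisture), the layer sizes, the random weights and the linear readout are all illustrative assumptions. Training them against in-situ targets is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def matvec(W: List[Vector], v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        n_cat = n_in + n_hidden  # every gate sees the concatenation [x_t, h_{t-1}]
        # One weight matrix and bias vector per gate (illustrative random init, untrained).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n_cat)]
                      for _ in range(n_hidden)] for k in "ifog"}
        self.b = {k: [0.0] * n_hidden for k in "ifog"}

    def step(self, x: Sequence[float], h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        if len(x) != self.n_in:
            raise ValueError("input has wrong length")
        z = list(x) + list(h)
        pre = {k: [s + b for s, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]    # how much new information to write
        f = [sigmoid(v) for v in pre["f"]]    # how much old cell state to keep
        o = [sigmoid(v) for v in pre["o"]]    # how much cell state to expose
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell values
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
        h, c = [0.0] * self.n_hidden, [0.0] * self.n_hidden
        for x in sequence:
            h, c = self.step(x, h, c)
        return h, c

def predict_soil_moisture(cell: LSTMCell, readout: Sequence[float],
                          sequence: Sequence[Sequence[float]]) -> float:
    """Hypothetical linear readout from the final hidden state to one soil moisture value."""
    h, _ = cell.run(sequence)
    return sum(w * v for w, v in zip(readout, h))

if __name__ == "__main__":
    # Three hypothetical days of [precipitation_mm, air_temp_C, satellite_sm].
    days = [[2.0, 18.5, 0.21], [0.0, 21.0, 0.19], [5.5, 16.0, 0.27]]
    cell = LSTMCell(n_in=3, n_hidden=4)
    print(predict_soil_moisture(cell, [0.25] * 4, days))
```

The cell state `c` lets information from earlier days persist across the sequence. This memory is the reason LSTMs are a natural fit for soil moisture, which integrates past weather.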


Author(s):  
Jarrett Davis
Glenn Miles
Erika Mosebach-Kornelsen
Sean Blackburn
...  

As the economic center of Cambodia, Phnom Penh has long been a hotspot for street-involved children and families. While violence is a common facet of life on the street, risk and vulnerability among children are notoriously difficult to measure. Most large-scale surveys sample children in homes and schools and therefore overlook street-involved children, who are commonly unregistered, attend school irregularly, and live outside of houses. This research paper is one of a series of studies on such groups in Southeast Asia. The study comprised 94 semi-structured interviews with street-involved children aged eight to 18 in Phnom Penh. The vast majority of respondents (77%) reported physical violence, with significant rates of violence from parents and teachers. Sexual violence is also common: it was reported by one in four respondents (25%) and was nearly twice as prevalent among males. As an exploratory study, this research aims to provide a resource for local practitioners and policymakers and to inform future research.


2020
Author(s):
Jin Soo Lim
Jonathan Vandermause
Matthijs A. van Spronsen
Albert Musaelian
Christopher R. O’Connor
...  

Interface restructuring plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt a very different composition and morphology at the surface than in the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. We uncover the underlying mechanisms by performing fast, large-scale machine-learning molecular dynamics, followed by our newly developed method for the complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables mechanistic investigations of restructuring dynamics that were previously impractical.
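In machine-learning molecular dynamics, a trained model supplies the forces on the atoms, and a conventional integrator advances positions and velocities in time. The sketch below is a minimal illustration under stated assumptions, not the authors' simulation code. It implements the standard velocity Verlet integrator for any force callable. A one-dimensional harmonic force (a hypothetical `harmonic_force`) stands in for the machine-learned potential, which is not reproduced here. Units, masses and the time step are arbitrary.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
ForceFn = Callable[[Vector], Vector]

def velocity_verlet(x: Sequence[float], v: Sequence[float], masses: Sequence[float],
                    force_fn: ForceFn, dt: float, n_steps: int) -> Tuple[Vector, Vector]:
    """Advance positions x and velocities v with the velocity Verlet scheme.

    In ML molecular dynamics, force_fn would be answered by a trained model;
    here it is any callable mapping positions to forces.
    """
    x, v = list(x), list(v)
    f = force_fn(x)
    for _ in range(n_steps):
        # Half-step velocity update, then full-step position update.
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Forces at the new positions complete the velocity update.
        f = force_fn(x)
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v

K_SPRING = 1.0  # stiffness of the stand-in potential (assumption)

def harmonic_force(x: Vector) -> Vector:
    """Stand-in for a machine-learned force field: F = -k x per coordinate."""
    return [-K_SPRING * xi for xi in x]

def total_energy(x: Vector, v: Vector, masses: Sequence[float]) -> float:
    kinetic = sum(0.5 * m * vi * vi for m, vi in zip(masses, v))
    potential = sum(0.5 * K_SPRING * xi * xi for xi in x)
    return kinetic + potential

if __name__ == "__main__":
    masses = [1.0]
    x, v = velocity_verlet([1.0], [0.0], masses, harmonic_force, dt=0.01, n_steps=1000)
    print("energy after 1000 steps:", total_energy(x, v, masses))
```

Each step needs only one force evaluation. That force call is where an ML potential saves time compared with first-principles force calculations, which is what makes long, large-scale trajectories affordable.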


2021
Author(s):
Norberto Sánchez-Cruz
Jose L. Medina-Franco

Epigenetic targets are a significant focus of drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large body of structure-activity relationships that has not yet been exploited to develop predictive models that support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different designs, we built highly accurate predictive models for the epigenetic target profiling of small molecules. Thorough validation showed mean precisions of up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported here have considerable potential to identify small molecules with epigenetic activity. We have therefore implemented them as a freely accessible, easy-to-use web application.
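Molecular fingerprints encode a compound as a bit vector, and the abstract reports precision as its headline metric. The sketch below is a hedged illustration only. The authors compared several model types, and this is not their pipeline. It stores hypothetical fingerprints as sets of "on" bit indices and predicts activity against a single target with a 1-nearest-neighbour rule on Tanimoto similarity. It then computes precision = TP / (TP + FP). The bit indices and activity labels in the usage example are made up.

```python
from typing import FrozenSet, Sequence, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1

def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| (defined here as 0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict_1nn(train: Sequence[Tuple[Fingerprint, int]], query: Fingerprint) -> int:
    """Return the activity label (1 = active) of the most similar training compound."""
    _, best_label = max(train, key=lambda item: tanimoto(item[0], query))
    return best_label

def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted actives that are truly active."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

if __name__ == "__main__":
    train = [
        (frozenset({1, 4, 7, 9}), 1),
        (frozenset({1, 4, 8}), 1),
        (frozenset({2, 3, 5}), 0),
        (frozenset({2, 5, 6, 10}), 0),
    ]
    test = [(frozenset({1, 4, 7}), 1), (frozenset({2, 3, 6}), 0), (frozenset({1, 5, 8}), 0)]
    y_pred = [predict_1nn(train, fp) for fp, _ in test]
    y_true = [label for _, label in test]
    print(y_pred, precision(y_true, y_pred))
```

In a multi-target profiling setting, one such prediction (or one trained model) would be made per protein target. Precision would then be averaged across targets.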


2019
Author(s):
Ryther Anderson
Achay Biong
Diego Gómez-Gualdrón

Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of large sets of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time-consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network that is the first machine learning model capable of predicting full adsorption isotherms of different molecules not included in its training. To achieve this, we trained the network only on alchemical species, represented solely by their geometry and force field parameters, and used it to predict the loadings of real adsorbates. We focused on predicting room-temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations, namely argon, krypton, xenon, methane, ethane, and nitrogen. Beyond these, we also observed surprisingly promising predictions for more complex molecules whose properties lie outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF descriptors (e.g., geometric properties and chemical moieties) and adsorbate descriptors (e.g., force field parameters and geometry). Our results illustrate a new training philosophy that opens the path toward machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.
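The key idea in the abstract is that an adsorbate enters the model only through its descriptors. The same network can therefore be queried for molecules it never saw during training. The sketch below illustrates that input design under explicit assumptions. A small fully connected network takes the concatenation of hypothetical MOF descriptors, hypothetical adsorbate descriptors (a Lennard-Jones well depth, a size parameter and a bond length) and the logarithm of pressure. Evaluating it over a pressure grid traces a full predicted isotherm. The weights are random placeholders, not a trained model, so the predicted numbers are meaningless. The descriptor names are illustrative, not the authors' feature set.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x); keeps predicted loadings non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

class TinyMLP:
    """One tanh hidden layer followed by a softplus output (untrained placeholder)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.3) for _ in range(n_hidden)]
        self.b2 = 0.0

    def __call__(self, x: Sequence[float]) -> float:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        return softplus(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)

def features(mof: Dict[str, float], adsorbate: Dict[str, float], pressure_bar: float) -> List[float]:
    """Concatenate hypothetical MOF descriptors, adsorbate descriptors and log10(pressure)."""
    return [
        mof["void_fraction"],
        mof["pore_diameter_A"],
        mof["surface_area_m2_g"] / 1000.0,  # crude rescaling to order one
        adsorbate["epsilon_K"] / 100.0,
        adsorbate["sigma_A"],
        adsorbate["bond_length_A"],         # 0.0 for a single-site adsorbate
        math.log10(pressure_bar),
    ]

def predict_isotherm(model: TinyMLP, mof: Dict[str, float], adsorbate: Dict[str, float],
                     pressures: Sequence[float]) -> List[Tuple[float, float]]:
    """Query the model once per pressure to trace a full (pressure, loading) isotherm."""
    return [(p, model(features(mof, adsorbate, p))) for p in pressures]

if __name__ == "__main__":
    mof = {"void_fraction": 0.7, "pore_diameter_A": 8.0, "surface_area_m2_g": 2000.0}
    adsorbate = {"epsilon_K": 120.0, "sigma_A": 3.4, "bond_length_A": 0.0}  # hypothetical
    model = TinyMLP(n_in=len(features(mof, adsorbate, 1.0)), n_hidden=8)
    for p, q in predict_isotherm(model, mof, adsorbate, [0.1, 1.0, 10.0]):
        print(p, q)
```

Training such a network only on "alchemical" adsorbates would mean sampling `epsilon_K`, `sigma_A` and `bond_length_A` freely and labeling each sample with simulated loadings. That training loop is not shown here.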


2019
Vol 19 (1)
pp. 4-16
Author(s):
Qihui Wu
Hanzhong Ke
Dongli Li
Qi Wang
Jiansong Fang
...  

Over the past decades, peptides have received increasing attention as therapeutic candidates in drug discovery, especially antimicrobial peptides (AMPs), anticancer peptides (ACPs) and anti-inflammatory peptides (AIPs). Peptides are thought to be able to regulate various complex diseases that were previously out of reach. In recent years, the critical problem of antimicrobial resistance has driven the pharmaceutical industry to look for new therapeutic agents. Compared with small-molecule drugs, peptide-based therapies exhibit high specificity and minimal toxicity. Thus, peptides are widely used in the design and discovery of potent new drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, have been introduced to predict peptide activity because of their accuracy and effectiveness. In this review, we document recent progress in machine learning-based prediction of peptide activity, which should greatly benefit the discovery of potentially active AMPs, ACPs and AIPs.
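Machine learning models for peptide activity need a fixed-length numeric representation of sequences whose lengths vary. A simple, commonly used descriptor is amino acid composition (AAC), the fraction of each of the 20 standard residues. The pure-Python sketch below computes AAC plus a crude net-charge count. The review does not prescribe these particular features. The charge rule (+1 per K/R, -1 per D/E, ignoring histidine and the termini) is a deliberate simplification. The example sequence is an arbitrary string, not a verified peptide.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(sequence: str) -> Dict[str, float]:
    """Amino acid composition: fraction of each standard residue in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}

def crude_net_charge(sequence: str) -> int:
    """Rough charge near neutral pH: +1 per K/R, -1 per D/E (ignores His and termini)."""
    seq = sequence.upper()
    return sum(seq.count(r) for r in "KR") - sum(seq.count(r) for r in "DE")

def feature_vector(sequence: str) -> List[float]:
    """20 AAC fractions followed by the crude net charge, ready for any classifier."""
    comp = aac(sequence)
    return [comp[aa] for aa in AMINO_ACIDS] + [float(crude_net_charge(sequence))]

if __name__ == "__main__":
    print(feature_vector("KKLLKKLLKK"))  # arbitrary cationic example sequence
```

Because every peptide maps to the same 21 numbers, sequences of different lengths can be fed to a standard classifier that labels them, for example, as AMP or non-AMP.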


Author(s):  
Mark Endrei
Chao Jin
Minh Ngoc Dinh
David Abramson
Heidi Poxon
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and control. Settings for optimum performance and for optimum energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate predictions of Pareto-optimal trade-offs, using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that the models can be trained with as few as 12 runs, with a prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
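A configuration is Pareto-optimal when no other configuration is both faster and more energy-efficient. Given predicted (runtime, energy) pairs for candidate settings, the non-dominated set can be extracted as sketched below. The configuration labels and numbers are hypothetical, and the paper's prediction models are not reproduced. The `trade_off` helper measures energy saving as a simple fractional reduction in energy to solution. That is one possible definition, assumed here, and not necessarily the metric behind the reported 45% figure.

```python
from typing import List, NamedTuple, Tuple

class Config(NamedTuple):
    label: str        # hypothetical setting, e.g. thread count or CPU frequency
    runtime_s: float  # predicted time to solution (lower is better)
    energy_j: float   # predicted energy to solution (lower is better)

def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return (a.runtime_s <= b.runtime_s and a.energy_j <= b.energy_j
            and (a.runtime_s < b.runtime_s or a.energy_j < b.energy_j))

def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the non-dominated configurations, sorted from fastest to slowest."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: c.runtime_s)

def trade_off(fastest: Config, candidate: Config) -> Tuple[float, float]:
    """Fractional energy saving and slowdown of candidate relative to the fastest setting."""
    saving = 1.0 - candidate.energy_j / fastest.energy_j
    slowdown = candidate.runtime_s / fastest.runtime_s - 1.0
    return saving, slowdown

if __name__ == "__main__":
    configs = [Config("A", 100, 500), Config("B", 110, 300), Config("C", 120, 350),
               Config("D", 95, 700), Config("E", 130, 280)]
    front = pareto_front(configs)
    for c in front:
        saving, slowdown = trade_off(front[0], c)
        print(f"{c.label}: {saving:.0%} less energy, {slowdown:.0%} slower")
```

The quadratic dominance check is adequate for the handful of settings a user can control. Walking along the returned front from fastest to slowest shows how much energy each additional percent of slowdown buys.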

