Orchestrating Heterogeneous Devices and AI Services as Virtual Sensors for Secure Cloud-Based IoT Applications

Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7509
Author(s):  
Sebastian Alberternst ◽  
Alexander Anisimov ◽  
Andre Antakli ◽  
Benjamin Duppe ◽  
Hilko Hoffmann ◽  
...  

The concept of the cloud-to-thing continuum addresses advancements made possible by the widespread adoption of cloud, edge, and IoT resources. It opens the possibility of combining classical symbolic AI with advanced machine learning approaches in a meaningful way. In this paper, we present a thing registry and an agent-based orchestration framework, which we combine to support semantic orchestration of IoT use cases across several federated cloud environments. We use the concept of virtual sensors based on machine learning (ML) services as abstraction, mediating between the instance level and the semantic level. We present examples of virtual sensors based on ML models for activity recognition and describe an approach to remedy the problem of missing or scarce training data. We illustrate the approach with a use case from an assisted living scenario.
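The virtual-sensor idea described above can be sketched in a few lines: an ML inference service is wrapped so that it registers and resolves like any other sensor. This is an illustrative sketch only; the class and method names (`VirtualSensor`, `ThingRegistry`, the `sosa:Activity` term) are assumptions, not the authors' actual framework or API.

```python
# Hypothetical sketch of a "virtual sensor": an ML model wrapped with a
# semantic description so a registry can resolve it like a physical sensor.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class VirtualSensor:
    """Wraps an ML inference function as a semantically described sensor."""
    name: str
    semantic_type: str          # e.g. an ontology term for the observed property
    infer: Callable[[Dict[str, float]], str]

class ThingRegistry:
    """Minimal registry resolving semantic types to registered (virtual) sensors."""
    def __init__(self) -> None:
        self._sensors: Dict[str, List[VirtualSensor]] = {}

    def register(self, sensor: VirtualSensor) -> None:
        self._sensors.setdefault(sensor.semantic_type, []).append(sensor)

    def lookup(self, semantic_type: str) -> List[VirtualSensor]:
        return self._sensors.get(semantic_type, [])

# Toy activity-recognition "model": thresholds an accelerometer magnitude.
def activity_model(readings: Dict[str, float]) -> str:
    return "walking" if readings["accel_magnitude"] > 1.2 else "resting"

registry = ThingRegistry()
registry.register(VirtualSensor("act-1", "sosa:Activity", activity_model))
sensor = registry.lookup("sosa:Activity")[0]
print(sensor.infer({"accel_magnitude": 1.5}))
```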

2019 ◽  
Vol 11 (3) ◽  
pp. 284 ◽  
Author(s):  
Linglin Zeng ◽  
Shun Hu ◽  
Daxiang Xiang ◽  
Xiang Zhang ◽  
Deren Li ◽  
...  

Soil moisture mapping at a regional scale is commonplace, since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data for the estimation of deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map the 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, a machine learning approach. In order to investigate the estimation accuracy of the RF method at both spatial and temporal scales, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m3/m3) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m3/m3). The data requirements, importance factors, and spatial and temporal variations in estimation accuracy were then discussed based on results obtained using training data selected by iterated random sampling. The highly accurate estimates of both surface and deep soil moisture for the study area reveal the potential of RF methods for mapping soil moisture at a regional scale, especially considering the high heterogeneity of land-cover types and topography in the study area.
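A minimal sketch of the random forest setup described above, using scikit-learn on synthetic stand-in predictors; the feature set and the station-based hold-out are illustrative assumptions, not the study's actual data or pipeline.

```python
# Sketch: random forest regression of soil moisture from remotely sensed
# predictors, evaluated with a "station-to-station" style split.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(size=(n, 4))    # stand-ins for remotely sensed predictors
# Synthetic soil moisture (m3/m3) driven by two of the predictors plus noise.
y = 0.1 + 0.3 * X[:, 0] - 0.1 * X[:, 1] + 0.02 * rng.standard_normal(n)

# Station-to-station split: hold out whole pseudo-stations, not random rows.
station = rng.integers(0, 10, size=n)
train, test = station < 8, station >= 8

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[train], y[train])
rmse = float(np.sqrt(mean_squared_error(y[test], rf.predict(X[test]))))
print(f"RMSE: {rmse:.3f} m3/m3")
```

Holding out whole stations (rather than random rows) is what makes the evaluation test spatial generalization instead of interpolation.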


Author(s):  
Julien Siebert ◽  
Lisa Joeckel ◽  
Jens Heidrich ◽  
Adam Trendowicz ◽  
Koji Nakamichi ◽  
...  

Nowadays, systems containing components based on machine learning (ML) methods are becoming more widespread. In order to ensure the intended behavior of a software system, there are standards that define the necessary qualities of the system and its components (such as ISO/IEC 25010). Due to the different nature of ML, we have to re-interpret existing qualities for ML systems or add new ones (such as trustworthiness). We have to be very precise about which quality property is relevant for which entity of interest (such as completeness of training data or correctness of the trained model), and how to objectively evaluate adherence to quality requirements. In this article, we present how to systematically construct quality models for ML systems based on an industrial use case. This quality model enables practitioners to specify and assess qualities for ML systems objectively. In addition to the overall construction process described, the main outcomes include a meta-model for specifying quality models for ML systems; reference elements regarding relevant views, entities, quality properties, and measures for ML systems based on existing research; an example instantiation of a quality model for a concrete industrial use case; and lessons learned from applying the construction process. We found that it is crucial to follow a systematic process in order to come up with measurable quality properties that can be evaluated in practice. In the future, we want to learn how the term quality differs between different types of ML systems and come up with reference quality models for evaluating the qualities of ML systems.
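The core idea of attaching measurable quality properties to entities of interest can be illustrated with a small sketch. The class design, measure, and threshold below are assumptions for illustration, not the article's actual meta-model.

```python
# Illustrative sketch: a quality property bound to an entity of interest,
# with an objective measure and an acceptance threshold.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class QualityProperty:
    entity: str                              # e.g. "training data", "trained model"
    name: str                                # e.g. "completeness", "correctness"
    measure: Callable[[Dict[str, float]], float]  # objective measure on evidence
    threshold: float                         # acceptance criterion

    def evaluate(self, evidence: Dict[str, float]) -> bool:
        """Check adherence: measured value must meet the threshold."""
        return self.measure(evidence) >= self.threshold

completeness = QualityProperty(
    entity="training data",
    name="completeness",
    measure=lambda e: e["labeled_rows"] / e["total_rows"],
    threshold=0.95,
)
print(completeness.evaluate({"labeled_rows": 980, "total_rows": 1000}))
```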


2020 ◽  
Author(s):  
Paul Francoeur ◽  
Tomohide Masuda ◽  
David R. Koes

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard dataset of sufficient size to compare performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more, lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best-performing model, an ensemble of 5 densely connected convolutional networks, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. In order to facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
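The three reported metrics (RMSE, Pearson R, AUC) and the 5-member ensemble averaging can be sketched on toy data. This is not the gnina/CrossDocked2020 code; all numbers below are synthetic stand-ins.

```python
# Sketch: compute RMSE, Pearson R, and pose-classification AUC on toy data,
# with an "ensemble" formed by averaging 5 noisy predictors.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
true_affinity = rng.uniform(2, 10, size=200)      # pK-like scale, synthetic

# "Ensemble of 5 CNNs": here, 5 noisy predictors whose outputs are averaged.
members = [true_affinity + rng.normal(0, 1.5, size=200) for _ in range(5)]
pred_affinity = np.mean(members, axis=0)

rmse = float(np.sqrt(np.mean((pred_affinity - true_affinity) ** 2)))
pearson_r, _ = pearsonr(true_affinity, pred_affinity)

# Pose classification: poses within 2 A RMSD of the reference count as good;
# the classifier's score here is a noisy function of the true pose RMSD.
pose_rmsd = rng.uniform(0, 8, size=200)
labels = (pose_rmsd <= 2.0).astype(int)
auc = roc_auc_score(labels, -pose_rmsd + rng.normal(0, 0.5, size=200))
print(rmse, pearson_r, auc)
```

Averaging the members shrinks the prediction noise by roughly the square root of the ensemble size, which is the usual motivation for ensembling independent models.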


Science ◽  
2021 ◽  
Vol 371 (6535) ◽  
pp. eabe8628
Author(s):  
Marshall Burke ◽  
Anne Driscoll ◽  
David B. Lobell ◽  
Stefano Ermon

Accurate and comprehensive measurements of a range of sustainable development outcomes are fundamental inputs into both research and policy. We synthesize the growing literature that uses satellite imagery to understand these outcomes, with a focus on approaches that combine imagery with machine learning. We quantify the paucity of ground data on key human-related outcomes and the growing abundance and improving resolution (spatial, temporal, and spectral) of satellite imagery. We then review recent machine learning approaches to model-building in the context of scarce and noisy training data, highlighting how this noise often leads to incorrect assessment of model performance. We quantify recent model performance across multiple sustainable development domains, discuss research and policy applications, explore constraints to future progress, and highlight research directions for the field.



Author(s):  
Gebreab K. Zewdie ◽  
David J. Lary ◽  
Estelle Levetin ◽  
Gemechu F. Garuma

Allergies to airborne pollen are a significant issue affecting millions of Americans. Consequently, accurately predicting the daily concentration of airborne pollen is of significant public benefit in providing timely alerts. This study presents a method for the robust estimation of the concentration of airborne Ambrosia pollen using a suite of machine learning approaches, including deep learning and ensemble learners. Each of these approaches utilizes atmospheric weather and land surface reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF). The machine learning approaches used to develop the suite of empirical models are deep neural networks, extreme gradient boosting, random forests, and Bayesian ridge regression. The training data, comprising twenty-four years of daily pollen concentration measurements together with ECMWF weather and land surface reanalysis data from 1987 to 2011, are used to develop the predictive models. The last six years of the dataset, from 2012 to 2017, are used to independently test the performance of the models. The correlation coefficients between the estimated and actual pollen abundance on the independent validation dataset for the deep neural networks, random forest, extreme gradient boosting, and Bayesian ridge models were 0.82, 0.81, 0.81, and 0.75, respectively, showing that machine learning can be used to effectively forecast the concentrations of airborne pollen.
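A hedged sketch of the four-model comparison described above, using scikit-learn stand-ins (`GradientBoostingRegressor` in place of extreme gradient boosting, a small MLP in place of the deep neural network) on synthetic weather-like features rather than ECMWF reanalysis data:

```python
# Sketch: train four regressors on the earlier "years", test on held-out later
# "years", and compare correlation between predicted and actual pollen counts.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(600, 5))                 # stand-ins for weather/land features
y = 50 * X[:, 0] + 30 * X[:, 1] ** 2 + rng.normal(0, 3, size=600)  # pollen grains/m^3

# Chronological split: earlier samples for training, later ones for testing.
Xtr, Xte, ytr, yte = X[:480], X[480:], y[:480], y[480:]
ym, ys = ytr.mean(), ytr.std()                 # standardize targets (helps the MLP)

models = {
    "neural network": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                                   random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
    "bayesian ridge": BayesianRidge(),
}
correlations = {}
for name, model in models.items():
    model.fit(Xtr, (ytr - ym) / ys)
    pred = model.predict(Xte) * ys + ym
    correlations[name] = np.corrcoef(yte, pred)[0, 1]
print(correlations)
```

The chronological split mirrors the paper's design of training on 1987-2011 and validating independently on 2012-2017.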


2020 ◽  
Vol 2020 (14) ◽  
pp. 341-1-341-10
Author(s):  
Han Hu ◽  
Yang Lei ◽  
Daisy Xin ◽  
Viktor Shkolnikov ◽  
Steven Barcelo ◽  
...  

Separation and isolation of living cells plays an important role in the fields of medicine and biology with label-free imaging often used for isolating cells. The analysis of label-free cell images has many challenges when examining the behavior of cells. This paper presents methods to analyze label-free cells. Many of the tools we describe are based on machine learning approaches. We also investigate ways of augmenting limited availability of training data. Our results demonstrate that our proposed methods are capable of successfully segmenting and classifying label-free cells.
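One common way to augment scarce image training data, as the paper investigates, is to generate flipped and rotated variants of each labeled patch. The specific transforms below are generic assumptions, not the authors' exact pipeline.

```python
# Sketch: expand one labeled patch into 8 variants (the dihedral group of the
# square: 4 rotations, each with and without a horizontal flip).
import numpy as np

def augment(image: np.ndarray) -> list:
    """Return 8 flipped/rotated variants of `image` (including the original)."""
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # plus a horizontal flip of each
    return variants

cell_patch = np.arange(16, dtype=float).reshape(4, 4)  # stand-in label-free patch
augmented = augment(cell_patch)
print(len(augmented))
```

These geometric transforms are label-preserving for segmentation and classification of cells, which have no preferred orientation, so each labeled patch yields eight training samples.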


Author(s):  
Kai Hu ◽  
Zhaodi Zhou ◽  
Liguo Weng ◽  
Jia Liu ◽  
Lihua Wang ◽  
...  

Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous experience. Among the many machine learning algorithms, the Weighted Extreme Learning Machine (WELM) is a prominent recent example. It not only retains the Extreme Learning Machine (ELM)'s extremely fast training speed and better generalization performance than a traditional neural network (NN), but also handles imbalanced data well by assigning more weight to the minority class and less weight to the majority class. However, its weights are generated from the class distribution of the training data, which creates a dependency on the input data [R. Sharma and A. S. Bist, Genetic algorithm based weighted extreme learning machine for binary imbalance learning, 2015 Int. Conf. Cognitive Computing and Information Processing (CCIP) (IEEE, 2015), pp. 1–6; N. Koutsouleris, Classification/machine learning approaches, Annu. Rev. Clin. Psychol. 13(1) (2016); G. Dudek, Extreme learning machine for function approximation–interval problem of input weights and biases, 2015 IEEE 2nd Int. Conf. Cybernetics (CYBCONF) (IEEE, 2015), pp. 62–67; N. Zhang, Y. Qu and A. Deng, Evolutionary extreme learning machine based weighted nearest-neighbor equality classification, 2015 7th Int. Conf. Intelligent Human-Machine Systems and Cybernetics (IHMSC), Vol. 2 (IEEE, 2015), pp. 274–279]. As a result, WELM may fail to find the optimal weights at which good generalization performance can be achieved [Sharma and Bist (2015); Koutsouleris (2016); Dudek (2015); Zhang, Qu and Deng (2015)]. To solve this, we propose a hybrid algorithm composed of WELM and Particle Swarm Optimization (PSO). First, it distributes the weights according to the number of samples in each class, determining the weighting method; then, it combines the ELM model with this weighting method to establish the WELM model; finally, it uses PSO to optimize WELM's three parameters (input weights, biases, and the weights of the imbalanced training data). Experiments on both prediction and recognition tasks show that it outperforms classical WELM algorithms.
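The PSO component of the hybrid algorithm can be sketched as a compact loop. The objective below is a stand-in quadratic for WELM's validation error (training a real WELM is out of scope here), and the inertia/acceleration hyperparameters are common defaults, not the paper's values.

```python
# Sketch: global-best particle swarm optimization over a stand-in objective
# representing WELM validation error as a function of its tunable parameters.
import numpy as np

def pso(objective, dim, n_particles=30, iters=100, seed=0):
    """Minimize `objective` over R^dim with a standard global-best PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                   # personal bests
    pbest_val = np.apply_along_axis(objective, 1, pos)
    gbest = pbest[np.argmin(pbest_val)].copy()           # global best
    for _ in range(iters):
        r1, r2 = rng.uniform(size=(2, n_particles, dim))
        # inertia + cognitive pull toward pbest + social pull toward gbest
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.apply_along_axis(objective, 1, pos)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Stand-in for WELM validation error over its three parameters
# (input weight, bias, imbalance weight); the true optimum is known here.
def welm_loss(p):
    return float(np.sum((p - np.array([0.5, -1.0, 2.0])) ** 2))

best, best_val = pso(welm_loss, dim=3)
print(best, best_val)
```

In the hybrid method, evaluating a particle would mean training a WELM with that particle's parameters and measuring its validation performance; the swarm mechanics are unchanged.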


2021 ◽  
Vol 13 (2) ◽  
pp. 275
Author(s):  
Michael Meadows ◽  
Matthew Wilson

Given the high financial and institutional cost of collecting and processing accurate topography data, many large-scale flood hazard assessments continue to rely instead on freely-available global Digital Elevation Models, despite the significant vertical biases known to affect them. To predict (and thereby reduce) these biases, we apply a fully-convolutional neural network (FCN), a form of artificial neural network originally developed for image segmentation which is capable of learning from multi-variate spatial patterns at different scales. We assess its potential by training such a model on a wide variety of remote-sensed input data (primarily multi-spectral imagery), using high-resolution, LiDAR-derived Digital Terrain Models published by the New Zealand government as the reference topography data. In parallel, two more widely used machine learning models are also trained, in order to provide benchmarks against which the novel FCN may be assessed. We find that the FCN outperforms the other models (reducing root mean square error in the testing dataset by 71%), likely due to its ability to learn from spatial patterns at multiple scales, rather than only a pixel-by-pixel basis. Significantly for flood hazard modelling applications, corrections were found to be especially effective along rivers and their floodplains. However, our results also suggest that models are likely to be biased towards the land cover and relief conditions most prevalent in their training data, with further work required to assess the importance of limiting training data inputs to those most representative of the intended application area(s).
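The pixel-wise benchmark setup that the FCN is compared against can be sketched as follows: predict the DEM's vertical error at each pixel from co-located remote-sensing features with a standard regressor, then subtract the prediction. The features, coefficients, and regressor choice below are illustrative assumptions, not the study's data or models.

```python
# Sketch: learn a per-pixel DEM vertical-error model and apply it as a
# correction, comparing RMSE before and after.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n = 2000
features = rng.uniform(size=(n, 3))    # e.g. spectral bands, slope (synthetic)
# Synthetic DEM vertical error with a learnable component plus noise (metres).
dem_error = 2.0 * features[:, 0] - 1.0 * features[:, 2] + rng.normal(0, 0.3, size=n)

tr = slice(0, 1600)
te = slice(1600, None)
model = GradientBoostingRegressor(random_state=0).fit(features[tr], dem_error[tr])

raw_rmse = float(np.sqrt(np.mean(dem_error[te] ** 2)))
corrected = dem_error[te] - model.predict(features[te])   # subtract predicted bias
corrected_rmse = float(np.sqrt(np.mean(corrected ** 2)))
print(raw_rmse, corrected_rmse)
```

A per-pixel model like this can only exploit the feature values at each location; the FCN's advantage in the study comes from additionally learning spatial patterns across neighbouring pixels at multiple scales.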


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 546
Author(s):  
Omer Mujahid ◽  
Ivan Contreras ◽  
Josep Vehi

(1) Background: The use of machine learning techniques for anticipating hypoglycemia has increased considerably in the past few years. Hypoglycemia is the drop in blood glucose below critical levels in diabetic patients. It may cause loss of cognitive ability, seizures, and, in extreme cases, death. In almost half of all severe cases, hypoglycemia arrives unannounced and is essentially asymptomatic. The inability of a diabetic patient to anticipate and intervene in the occurrence of a hypoglycemic event often results in crisis. Hence, the prediction of hypoglycemia is a vital step in improving a diabetic patient's quality of life. The objective of this paper is to review work performed in the domain of hypoglycemia prediction using machine learning, and to explore the latest trends and challenges that researchers face in this area. (2) Methods: Literature obtained from PubMed and Google Scholar was reviewed, covering manuscripts from the last five years. A total of 903 papers were initially selected, of which 57 were eventually shortlisted for detailed review. (3) Results: A thorough dissection of the shortlisted manuscripts revealed an interesting split between works in two categories: hypoglycemia prediction and hypoglycemia detection. The entire review was carried out with this categorical distinction in perspective, while providing a thorough overview of the machine learning approaches used to anticipate hypoglycemia, the types of training data, and the prediction horizons.
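The notion of a prediction horizon mentioned above can be made concrete with a small sketch: turning a continuous glucose monitoring (CGM) series into supervised training windows labeled by future hypoglycemia. The 70 mg/dL threshold is a widely used hypoglycemia cutoff, and the window sizes are illustrative conventions, not values from any single reviewed paper.

```python
# Sketch: build (features, labels) from a glucose trace, where the label says
# whether glucose falls below the threshold `horizon` steps in the future.
import numpy as np

def make_dataset(glucose, window, horizon, threshold=70.0):
    """Return (X, y): past `window` readings, hypoglycemia `horizon` steps ahead."""
    X, y = [], []
    for t in range(window, len(glucose) - horizon):
        X.append(glucose[t - window:t])                  # input: recent history
        y.append(int(glucose[t + horizon] < threshold))  # label: future hypo event
    return np.array(X), np.array(y)

# Toy CGM trace (mg/dL), one reading every 5 minutes; it dips below 70 briefly.
trace = np.array([110, 105, 100, 95, 90, 85, 80, 75, 68, 65, 72, 80], dtype=float)
X, y = make_dataset(trace, window=4, horizon=2)          # 10-minute horizon
print(X.shape, y)
```

This framing is what separates the review's two categories: prediction uses a positive horizon (anticipating the event), while detection labels the current reading (horizon of zero).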

