Land Subsidence Susceptibility Mapping Using Bayesian, Functional, and Meta-Ensemble Machine Learning Models

To effectively prevent land subsidence over abandoned coal mines, it is necessary to quantitatively identify vulnerable areas. In this study, we evaluated the performance of predictive Bayesian, functional, and meta-ensemble machine learning models in generating land subsidence susceptibility (LSS) maps. All models were trained using half of a land subsidence inventory, and validated using the other half of the dataset. The model performance was evaluated by comparing the area under the receiver operating characteristic (ROC) curve of the resulting LSS map for each model. Among all models tested, the logit boost, which is a meta-ensemble machine leaning model, generated LSS maps with the highest accuracy (91.44%), i.e., higher than that of the other Bayesian and functional machine learning models, including the Bayes net (86.42%), naïve Bayes (85.39%), logistic (88.92%), and multilayer perceptron models (86.76%). The LSS maps produced in this study can be used to mitigate subsidence risk for people and important facilities within the study area, and as a foundation for further studies in other regions.

Download Full-text

Prediction of Head Movement in 360-Degree Videos Using Attention Model

Sensors ◽

10.3390/s21113678 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3678

Author(s):

Dongwon Lee ◽

Minji Choi ◽

Joohyun Lee

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Moving Average ◽

The Other ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Attention Model ◽

Long Short Term Memory ◽

Machine Learning Models

In this paper, we propose a prediction algorithm, the combination of Long Short-Term Memory (LSTM) and attention model, based on machine learning models to predict the vision coordinates when watching 360-degree videos in a Virtual Reality (VR) or Augmented Reality (AR) system. Predicting the vision coordinates while video streaming is important when the network condition is degraded. However, the traditional prediction models such as Moving Average (MA) and Autoregression Moving Average (ARMA) are linear so they cannot consider the nonlinear relationship. Therefore, machine learning models based on deep learning are recently used for nonlinear predictions. We use the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural network methods, originated in Recurrent Neural Networks (RNN), and predict the head position in the 360-degree videos. Therefore, we adopt the attention model to LSTM to make more accurate results. We also compare the performance of the proposed model with the other machine learning models such as Multi-Layer Perceptron (MLP) and RNN using the root mean squared error (RMSE) of predicted and real coordinates. We demonstrate that our model can predict the vision coordinates more accurately than the other models in various videos.

Download Full-text

O-203 Application of machine learning to predict aneuploidy and mosaicism in embryos from in vitro fertilization (IVF) cycles

Human Reproduction ◽

10.1093/humrep/deab128.014 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

J A Ortiz ◽

R Morales ◽

B Lledo ◽

E Garcia-Hernandez ◽

A Cascales ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Predictive Models ◽

Maternal Age ◽

The Other ◽

Predictor Variables ◽

Learning Models ◽

Male Factor ◽

Factors Associated ◽

Machine Learning Models

Abstract Study question Is it possible to predict the likelihood of an IVF embryo being aneuploid and/or mosaic using a machine learning algorithm? Summary answer There are paternal, maternal, embryonic and IVF-cycle factors that are associated with embryonic chromosomal status that can be used as predictors in machine learning models. What is known already The factors associated with embryonic aneuploidy have been extensively studied. Mostly maternal age and to a lesser extent male factor and ovarian stimulation have been related to the occurrence of chromosomal alterations in the embryo. On the other hand, the main factors that may increase the incidence of embryo mosaicism have not yet been established. The models obtained using classical statistical methods to predict embryonic aneuploidy and mosaicism are not of high reliability. As an alternative to traditional methods, different machine and deep learning algorithms are being used to generate predictive models in different areas of medicine, including human reproduction. Study design, size, duration The study design is observational and retrospective. A total of 4654 embryos from 1558 PGT-A cycles were included (January-2017 to December-2020). The trophoectoderm biopsies on D5, D6 or D7 blastocysts were analysed by NGS. Embryos with ≤25% aneuploid cells were considered euploid, between 25-50% were classified as mosaic and aneuploid with >50%. The variables of the PGT-A were recorded in a database from which predictive models of embryonic aneuploidy and mosaicism were developed. Participants/materials, setting, methods The main indications for PGT-A were advanced maternal age, abnormal sperm FISH and recurrent miscarriage or implantation failure. Embryo analysis were performed using Veriseq-NGS (Illumina). The software used to carry out all the analysis was R (RStudio). The library used to implement the different algorithms was caret. In the machine learning models, 22 predictor variables were introduced, which can be classified into 4 categories: maternal, paternal, embryonic and those specific to the IVF cycle. Main results and the role of chance The different couple, embryo and stimulation cycle variables were recorded in a database (22 predictor variables). Two different predictive models were performed, one for aneuploidy and the other for mosaicism. The predictor variable was of multi-class type since it included the segmental and whole chromosome alteration categories. The dataframe were first preprocessed and the different classes to be predicted were balanced. A 80% of the data were used for training the model and 20% were reserved for further testing. The classification algorithms applied include multinomial regression, neural networks, support vector machines, neighborhood-based methods, classification trees, gradient boosting, ensemble methods, Bayesian and discriminant analysis-based methods. The algorithms were optimized by minimizing the Log_Loss that measures accuracy but penalizing misclassifications. The best predictive models were achieved with the XG-Boost and random forest algorithms. The AUC of the predictive model for aneuploidy was 80.8% (Log_Loss 1.028) and for mosaicism 84.1% (Log_Loss: 0.929). The best predictor variables of the models were maternal age, embryo quality, day of biopsy and whether or not the couple had a history of pregnancies with chromosomopathies. The male factor only played a relevant role in the mosaicism model but not in the aneuploidy model. Limitations, reasons for caution Although the predictive models obtained can be very useful to know the probabilities of achieving euploid embryos in an IVF cycle, increasing the sample size and including additional variables could improve the models and thus increase their predictive capacity. Wider implications of the findings Machine learning can be a very useful tool in reproductive medicine since it can allow the determination of factors associated with embryonic aneuploidies and mosaicism in order to establish a predictive model for both. To identify couples at risk of embryo aneuploidy/mosaicism could benefit them of the use of PGT-A. Trial registration number Not Applicable

Download Full-text

Ensemble machine learning models for aviation incident risk prediction

Decision Support Systems ◽

10.1016/j.dss.2018.10.009 ◽

2019 ◽

Vol 116 ◽

pp. 48-63 ◽

Cited By ~ 21

Author(s):

Xiaoge Zhang ◽

Sankaran Mahadevan

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Learning Models ◽

Ensemble Machine Learning ◽

Machine Learning Models

Download Full-text

A machine learning approach to inform developmental milestone achievement for children with autism (Preprint)

10.2196/preprints.29242 ◽

2021 ◽

Author(s):

Munirul M. Haque ◽

Masud Rabbani ◽

Dipranjan Das Dipal ◽

Md Ishrak Islam Zarif ◽

Anik Iqbal ◽

...

Keyword(s):

Machine Learning ◽

Autism Spectrum Disorder ◽

Children With Autism ◽

Autism Spectrum ◽

The Other ◽

Supervised Machine Learning ◽

Learning Models ◽

Children With Asd ◽

Socio Demographic Factors ◽

Machine Learning Models

BACKGROUND Care for children with autism spectrum disorder (ASD) can be challenging for families and medical care systems. This is especially true in Low-and-Middle-Income-countries (LMIC) like Bangladesh. To improve family-practitioner communication and developmental monitoring of children with ASD, [spell out] (mCARE) was developed. Within this study, mCARE was used to track child milestone achievement and family socio-demographic assets to inform mCARE feasibility/scalability and family-asset informed practitioner recommendations. OBJECTIVE The objectives of this paper are three-fold. First, document how mCARE can be used to monitor child milestone achievement. Second, demonstrate how advanced machine learning models can inform our understanding of milestone achievement in children with ASD. Third, describe family/child socio-demographic factors that are associated with earlier milestone achievement in children with ASD (across five machine learning models). METHODS Using mCARE collected data, this study assessed milestone achievement in 300 children with ASD from Bangladesh. In this study, we used four supervised machine learning (ML) algorithms (Decision Tree, Logistic Regression, k-Nearest Neighbors, Artificial Neural Network) and one unsupervised machine learning (K-means Clustering) to build models of milestone achievement based on family/child socio-demographic details. For analyses, the sample was randomly divided in half to train the ML models and then their accuracy was estimated based on the other half of the sample. Each model was specified for the following milestones: Brushes teeth, Asks to use the toilet, Urinates in the toilet or potty, and Buttons large buttons. RESULTS This study aimed to find a suitable machine learning algorithm for milestone prediction/achievement for children with ASD using family/child socio-demographic characteristics. For, Brushes teeth, the three supervised machine learning models met or exceeded an accuracy of 95% with Logistic Regression, KNN, and ANN as the most robust socio-demographic predictors. For Asks to use toilet, 84.00% accuracy was achieved with the KNN and ANN models. For these models, the family socio-demographic predictors of “family expenditure” and “parents’ age” accounted for most of the model variability. The last two parameters, Urinates in toilet or potty and Buttons large buttons had an accuracy of 91.00% and 76.00%, respectively, in ANN. Overall, the ANN had a higher accuracy (Above ~80% on average) among the other algorithms for all the parameters. Across the models and milestones, “family expenditure”, “family size/ type”, “living places” and “parent’s age and occupation” were the most influential family/child socio-demographic factors. CONCLUSIONS mCARE was successfully deployed in an LMIC (i.e., Bangladesh), allowing parents and care-practitioners a mechanism to share detailed information on child milestones achievement. Using advanced modeling techniques this study demonstrates how family/child socio-demographic elements can inform child milestone achievement. Specifically, families with fewer socio-demographic resources reported later milestone attainment. Developmental science theories highlight how family/systems can directly influence child development and this study provides a clear link between family resources and child developmental progress. Clinical implications for this work could include supporting the larger family system to improve child milestone achievement. CLINICALTRIAL We took the IRB from Marquette University Institutional Review Board on July 9, 2020, with the protocol number HR-1803022959, and titled “MOBILE-BASED CARE FOR CHILDREN WITH AUTISM SPECTRUM DISORDER USING REMOTE EXPERIENCE SAMPLING METHOD (MCARE)” for recruiting a total of 316 subjects, of which we recruited 300. (Details description of participants in Methods section)

Download Full-text

Data-Driven Approach for Predicting and Explaining the Risk of Long-Term Unemployment

E3S Web of Conferences ◽

10.1051/e3sconf/202021401023 ◽

2020 ◽

Vol 214 ◽

pp. 01023

Author(s):

Linan (Frank) Zhao

Keyword(s):

Machine Learning ◽

Age Groups ◽

Learning Models ◽

Public Authorities ◽

Ensemble Machine Learning ◽

European Public ◽

Data Driven Approach ◽

Using Data ◽

Machine Learning Models

Long-term unemployment has significant societal impact and is of particular concerns for policymakers with regard to economic growth and public finances. This paper constructs advanced ensemble machine learning models to predict citizens’ risks of becoming long-term unemployed using data collected from European public authorities for employment service. The proposed model achieves 81.2% accuracy on identifying citizens with high risks of long-term unemployment. This paper also examines how to dissect black-box machine learning models by offering explanations at both a local and global level using SHAP, a state-of-the-art model-agnostic approach to explain factors that contribute to long-term unemployment. Lastly, this paper addresses an under-explored question when applying machine learning in the public domain, that is, the inherent bias in model predictions. The results show that popular models such as gradient boosted trees may produce unfair predictions against senior age groups and immigrants. Overall, this paper sheds light on the recent increasing shift for governments to adopt machine learning models to profile and prioritize employment resources to reduce the detrimental effects of long-term unemployment and improve public welfare.

Download Full-text

Intra-domain and cross-domain transfer learning for time series

10.5194/egusphere-egu21-12142 ◽

2021 ◽

Author(s):

Erik Otović ◽

Marko Njirjak ◽

Dario Jozinović ◽

Goran Mauša ◽

Alberto Michelini ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Transfer Learning ◽

Time Series Data ◽

The Other ◽

Series Data ◽

Sound Recognition ◽

Transfer Of Knowledge ◽

Learning Models ◽

Machine Learning Models

In this study, we compared the performance of machine learning models trained using transfer learning and those that were trained from scratch - on time series data. Four machine learning models were used for the experiment. Two models were taken from the field of seismology, and the other two are general-purpose models for working with time series data. The accuracy of selected models was systematically observed and analyzed when switching within the same domain of application (seismology), as well as between mutually different domains of application (seismology, speech, medicine, finance). In seismology, we used two databases of local earthquakes (one in counts, and the other with the instrument response removed) and a database of global earthquakes for predicting earthquake magnitude; other datasets targeted classifying spoken words (speech), predicting stock prices (finance) and classifying muscle movement from EMG signals (medicine). In practice, it is very demanding and sometimes impossible to collect datasets of tagged data large enough to successfully train a machine learning model. Therefore, in our experiment, we use reduced data sets of 1,500 and 9,000 data instances to mimic such conditions. Using the same scaled-down datasets, we trained two sets of machine learning models: those that used transfer learning for training and those that were trained from scratch. We compared the performances between pairs of models in order to draw conclusions about the utility of transfer learning. In order to confirm the validity of the obtained results, we repeated the experiments several times and applied statistical tests to confirm the significance of the results. The study shows when, within the set experimental framework, the transfer of knowledge brought improvements in terms of model accuracy and in terms of model convergence rate. Our results show that it is possible to achieve better performance and faster convergence by transferring knowledge from the domain of global earthquakes to the domain of local earthquakes; sometimes also vice versa. However, improvements in seismology can sometimes also be achieved by transferring knowledge from medical and audio domains. The results show that the transfer of knowledge between other domains brought even more significant improvements, compared to those within the field of seismology. For example, it has been shown that models in the field of sound recognition have achieved much better performance compared to classical models and that the domain of sound recognition is very compatible with knowledge from other domains. We came to similar conclusions for the domains of medicine and finance. Ultimately, the paper offers suggestions when transfer learning is useful, and the explanations offered can provide a good starting point for knowledge transfer using time series data.

Download Full-text

Efficient Breast Cancer Prediction Using Ensemble Machine Learning Models

2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT) ◽

10.1109/rteict46194.2019.9016968 ◽

2019 ◽

Cited By ~ 1

Author(s):

Naveen ◽

R. K. Sharma ◽

Anil Ramachandran Nair

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Learning Models ◽

Cancer Prediction ◽

Ensemble Machine Learning ◽

Machine Learning Models

Download Full-text

Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach

Human Factors The Journal of the Human Factors and Ergonomics Society ◽

10.1177/0018720816672308 ◽

2017 ◽

Vol 59 (1) ◽

pp. 134-146 ◽

Cited By ~ 8

Author(s):

Brett J. Borghetti ◽

Joseph J. Giametta ◽

Christina F. Rusnock

Keyword(s):

Machine Learning ◽

Adaptive Systems ◽

Model Performance ◽

Machine Learning Algorithms ◽

Training Data ◽

State Assessments ◽

Learning Models ◽

Dynamic Task ◽

Operator Workload ◽

Machine Learning Models

Objective: We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Background: Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. Method: We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Results: Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. Conclusion: We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. Application: These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.

Download Full-text

Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction

Journal of Computer-Aided Molecular Design ◽

10.1007/s10822-019-00274-0 ◽

2020 ◽

Vol 34 (7) ◽

pp. 717-730 ◽

Cited By ~ 9

Author(s):

Matthew C. Robinson ◽

Robert C. Glen ◽

Alpha A. Lee

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Numerical Experiments ◽

Large Scale ◽

Operating Characteristic ◽

Characteristic Curve ◽

Learning Models ◽

Bioactivity Prediction ◽

Operating Characteristic Curve ◽

Machine Learning Models

Abstract Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision–recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.

Download Full-text

Forecasting admissions in psychiatric hospitals before and during Covid-19

10.1101/2021.07.16.21260200 ◽

2021 ◽

Author(s):

Jan Wolff ◽

Ansgar Klimke ◽

Michael Marschollek ◽

Tim Kacprowski

Keyword(s):

Machine Learning ◽

Time Series ◽

Hospital Admissions ◽

Model Performance ◽

Psychiatric Hospitals ◽

Time Series Models ◽

Learning Models ◽

One Step ◽

Machine Learning Models ◽

Better Than

Introduction The COVID-19 pandemic has strong effects on most health care systems and individual services providers. Forecasting of admissions can help for the efficient organisation of hospital care. We aimed to forecast the number of admissions to psychiatric hospitals before and during the COVID-19 pandemic and we compared the performance of machine learning models and time series models. This would eventually allow to support timely resource allocation for optimal treatment of patients. Methods We used admission data from 9 psychiatric hospitals in Germany between 2017 and 2020. We compared machine learning models with time series models in weekly, monthly and yearly forecasting before and during the COVID-19 pandemic. Our models were trained and validated with data from the first two years and tested in prospectively sliding time-windows in the last two years. Results A total of 90,686 admissions were analysed. The models explained up to 90% of variance in hospital admissions in 2019 and 75% in 2020 with the effects of the COVID-19 pandemic. The best models substantially outperformed a one-step seasonal naive forecast (seasonal mean absolute scaled error (sMASE) 2019: 0.59, 2020: 0.76). The best model in 2019 was a machine learning model (elastic net, mean absolute error (MAE): 7.25). The best model in 2020 was a time series model (exponential smoothing state space model with Box-Cox transformation, ARMA errors and trend and seasonal components, MAE: 10.44), which adjusted more quickly to the shock effects of the COVID-19 pandemic. Models forecasting admissions one week in advance did not perform better than monthly and yearly models in 2019 but they did in 2020. The most important features for the machine learning models were calendrical variables. Conclusion Model performance did not vary much between different modelling approaches before the COVID-19 pandemic and established forecasts were substantially better than one-step seasonal naive forecasts. However, weekly time series models adjusted quicker to the COVID-19 related shock effects. In practice, different forecast horizons could be used simultaneously to allow both early planning and quick adjustments to external effects.

Download Full-text