A Meta-Learning Approach of Optimisation for Spatial Prediction of Landslides

2021 ◽  
Vol 13 (22) ◽  
pp. 4521
Author(s):  
Biswajeet Pradhan ◽  
Maher Ibrahim Sameen ◽  
Husam A. H. Al-Najjar ◽  
Daichao Sheng ◽  
Abdullah M. Alamri ◽  
...  

Optimisation plays a key role in the application of machine learning to the spatial prediction of landslides. The common practice in optimising landslide prediction models is to search for optimal or suboptimal hyperparameter values among a number of predetermined hyperparameter configurations based on an objective function such as k-fold cross-validation accuracy. However, the overhead of hyperparameter optimisation can be prohibitive, especially for computationally expensive algorithms. This paper introduces an optimisation approach based on meta-learning for the spatial prediction of landslides. The proposed approach is tested in a dense tropical forested area of Cameron Highlands, Malaysia. Instead of optimising prediction models with a large number of hyperparameter configurations, the proposed approach begins with promising configurations based on several basic and statistical meta-features. The proposed meta-learning approach was tested with Bayesian optimisation as the hyperparameter tuning algorithm and random forest (RF) as the prediction model. The spatial database was established with a total of 63 historical landslides and 15 conditioning factors. Three RF models were constructed based on (1) default parameters as suggested by the sklearn library, (2) parameters suggested by Bayesian optimisation (BO), and (3) parameters suggested by the proposed meta-learning approach (BO-ML). Based on five-fold cross-validation accuracy, the Bayesian method achieved the best performance for both the training (0.810) and test (0.802) datasets. The meta-learning approach achieved slightly lower accuracies than the Bayesian method for the training (0.769) and test (0.800) datasets. Similarly, based on F1-score and area under the receiver operating characteristic curve (AUROC), the models with parameters optimised by either the Bayesian or the meta-learning method produced more accurate landslide susceptibility assessments than the model with the default parameters.
In the present approach, instead of learning from scratch, the meta-learning begins with hyperparameter configurations that were optimal for the most similar previous datasets, which can be considerably helpful and time-saving for landslide modelling.
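The warm-start idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the knowledge base `PREVIOUS_DATASETS`, the three meta-features (sample count, feature count, class balance), the RF settings, and the helper names are all hypothetical and are not taken from the paper. The new dataset is described as 126 samples with 15 factors, which assumes the 63 landslides were paired with an equal number of non-landslide points.

```python
import math

# Hypothetical knowledge base: meta-features of previously tuned datasets
# (n_samples, n_features, class_balance) and the RF settings that worked best.
PREVIOUS_DATASETS = [
    {"meta": (120.0, 15.0, 0.50), "best_config": {"n_estimators": 300, "max_depth": 8}},
    {"meta": (5000.0, 40.0, 0.10), "best_config": {"n_estimators": 800, "max_depth": 20}},
    {"meta": (200.0, 12.0, 0.45), "best_config": {"n_estimators": 200, "max_depth": 6}},
]


def normalise(vectors):
    """Min-max scale each meta-feature column to [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [tuple((v - lo) / sp for v, lo, sp in zip(vec, lows, spans)) for vec in vectors]


def warm_start_configs(new_meta, knowledge_base, k=2):
    """Return the best configurations of the k most similar previous datasets."""
    scaled = normalise([d["meta"] for d in knowledge_base] + [new_meta])
    target = scaled[-1]
    ranked = sorted(range(len(knowledge_base)), key=lambda i: math.dist(scaled[i], target))
    return [knowledge_base[i]["best_config"] for i in ranked[:k]]


# Assumed description of the new dataset: 126 samples, 15 factors, balanced classes.
configs = warm_start_configs((126.0, 15.0, 0.5), PREVIOUS_DATASETS, k=2)
```

The returned configurations would then seed the tuning algorithm (Bayesian optimisation in the paper) instead of a random or exhaustive start.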

2019 ◽  
Vol 15 ◽  
pp. 117693431987129 ◽  
Author(s):  
Yiyou Song ◽  
Qingru Xu ◽  
Zhen Wei ◽  
Di Zhen ◽  
Jionglong Su ◽  
...  

Currently, although many successful bioinformatics efforts have been reported in the epitranscriptomics field for N6-methyladenosine (m6A) site identification, none has focused on the substrate specificity of different m6A-related enzymes, i.e., the methyltransferases (writers) and demethylases (erasers). In this work, to untangle the target specificity and the regulatory functions of different RNA m6A writers (METTL3-METTL14 and METTL16) and erasers (ALKBH5 and FTO), we extracted 49 genomic features along with the conventional sequence features and used the machine learning approach of random forest to predict their epitranscriptome substrates. Our method achieved reasonable performance on both the writer target prediction (as high as 0.918) and the eraser target prediction (as high as 0.888) in 5-fold cross-validation, and the gene ontology analysis of their preferential targets further revealed the functional relevance of different RNA methylation writers and erasers.


Teknika ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 18-26
Author(s):  
Hendry Cipta Husada ◽  
Adi Suryaputra Paramita

Current technological developments have made it easy for many people to obtain and share information on various social media platforms. Twitter is one of the media frequently used to express opinions as a reaction to something. Opinions posted on Twitter can be used by airlines as a key parameter for gauging public satisfaction and as evaluation material for the company. A method is therefore needed that can automatically classify opinions into positive, negative, or neutral categories through sentiment analysis. The sentiment analysis process consists of data preprocessing, term weighting using the TF-IDF method, application of the algorithm, and discussion of the classification results. Opinions are classified with a machine learning approach using the multi-class Support Vector Machine (SVM) algorithm. The data used in this study are English-language opinions from Twitter users about airlines. Based on the tests carried out, the best classification results were obtained using an SVM with an RBF kernel at parameter values C (complexity) = 10 and γ (gamma) = 1, with an accuracy of 84.37%, and 80.41% when using 10-fold cross-validation.
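The TF-IDF weighting step can be illustrated with a minimal pure-Python sketch. It assumes the plain variant, term frequency times ln(N / document frequency); the abstract does not say which variant the authors used, and common libraries apply smoothed versions. The example tweets and the function name `tf_idf` are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up, already preprocessed tweets about an airline.
tweets = [
    "great flight great crew".split(),
    "flight delayed again".split(),
    "lost luggage terrible flight".split(),
]
weights = tf_idf(tweets)
```

A term such as "flight" that occurs in every tweet receives weight zero, while "great", which is frequent in one tweet and absent elsewhere, is weighted highly. The resulting vectors are what a classifier such as an SVM would consume.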


2020 ◽  
Author(s):  
Rafael Massahiro Yassue ◽  
José Felipe Gonzaga Sabadin ◽  
Giovanni Galli ◽  
Filipe Couto Alves ◽  
Roberto Fritsche-Neto

Abstract: Usually, the comparison among genomic prediction models is based on validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of the training and validation sets strongly affects how, and how subjectively, we compare models. The procedures cited above overlap across replicates, which might cause overestimation and a lack of residual independence due to resampling issues, and might therefore yield less accurate results. Furthermore, post hoc tests such as ANOVA are not recommended because the assumption of residual independence is not fulfilled. Thus, we propose a new way to sample observations to build training and validation sets, based on a cross-validation alpha-based design (CV-α). The CV-α was meant to create several validation scenarios (replicates × folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as a proof of concept, we could compare the proposed methodology to RRS and K-fold via ANOVA, applying four genomic prediction models to a simulated and a real dataset. Concerning predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds no cost or complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
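The overlap across replicates mentioned in the abstract can be made concrete with a short sketch. It does not implement the CV-α design; it only counts, for ordinary repeated random K-fold, how many pairs of items share a fold in more than one replicate. The function names, the 20 items, 5 folds and 4 replicates are illustrative assumptions.

```python
import random
from itertools import combinations


def k_fold_assignment(n_items, k, rng):
    """Randomly assign items 0..n_items-1 to k folds of near-equal size."""
    order = list(range(n_items))
    rng.shuffle(order)
    return {item: position % k for position, item in enumerate(order)}


def repeated_pairs(assignments):
    """Count item pairs that share a fold in more than one replicate."""
    n_items = len(assignments[0])
    repeated = 0
    for a, b in combinations(range(n_items), 2):
        together = sum(1 for folds in assignments if folds[a] == folds[b])
        if together > 1:
            repeated += 1
    return repeated


rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [k_fold_assignment(20, 5, rng) for _ in range(4)]
overlap = repeated_pairs(replicates)
```

A design that keeps this count low, as the abstract reports for CV-α, leaves the replicates closer to independent.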


SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A236-A236
Author(s):  
A Guillot ◽  
T Moutakanni ◽  
M Harris ◽  
P J Arnal ◽  
V Thorey

Abstract. Introduction: Polysomnography (PSG) is the gold standard for diagnosing obstructive sleep apnea (OSA). OSA severity is defined by the apnea-hypopnea index (AHI), the number of apnea and hypopnea events measured per hour of sleep. The Dreem2 headband (DH) is a self-administered, easy-to-use device that measures EEG, breathing frequency, heart rate and sound at home. In our study, we assessed the performance of the DH in automatically detecting OSA compared with the scoring of 3 sleep experts on PSG. Methods: 41 subjects (8 females, 42.6 ± 13.7 y.o.) with suspected OSA spent a night at home wearing both a PSG and the DH. Each PSG record was scored for apnea and hypopnea events by 3 independent trained sleep experts following AASM guidelines. The deep learning approach DOSED was trained on the DH signals using the manual apnea scoring. 10-fold cross-validation was used to provide predictions for each of the 41 subjects with the DH. Results: We observed an average expert-scored AHI of 13.6 ± 10.1 CI[10.5, 16.5] compared to 12.9 ± 10.3 CI[9.6, 15.8] for the DH. Both the correlation between the 3 scorers (r = 0.88, p < 0.001) and the correlation between the DH and the scorers (r = 0.79, p < 0.001) were significant. The specificity and sensitivity to detect mild OSA (AHI ≤ 5) were 84.4% and 96.4% for the DH and 86.5% and 86.0% for the scorers. Conclusion: The results show that the DH using deep learning can detect OSA with an accuracy similar to that of the sleep experts. The use of the DH paves the way for longitudinal monitoring of patients with suspected OSA, and its accessibility could lead to better screening of the general population. Support: This study has been supported by Dreem sas.
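The two quantities reported in this abstract follow from simple counts. The sketch below computes the AHI as events per hour of sleep, and sensitivity and specificity from paired reference and predicted labels. The function names and the example numbers are illustrative only and do not come from the study.

```python
def apnea_hypopnea_index(n_apneas, n_hypopneas, sleep_hours):
    """AHI = (apnea events + hypopnea events) per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return (n_apneas + n_hypopneas) / sleep_hours


def sensitivity_specificity(reference, predicted):
    """Both arguments are equal-length lists of booleans (True = OSA detected)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up night: 30 apneas and 61 hypopneas over 7 hours of sleep.
ahi = apnea_hypopnea_index(30, 61, 7.0)

# Made-up labels for five subjects: expert reference vs. automatic detection.
sens, spec = sensitivity_specificity(
    [True, True, True, False, False],
    [True, True, False, False, True],
)
```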


AI ◽  
2020 ◽  
Vol 1 (4) ◽  
pp. 539-557 ◽  
Author(s):  
Barath Narayanan ◽  
Russell Hardie ◽  
Vignesh Krishnaraja ◽  
Christina Karam ◽  
Venkata Davuluru

The coronavirus disease 2019 (COVID-19) global pandemic has severely impacted lives across the globe. Respiratory disorders in COVID-19 patients are caused by lung opacities similar to viral pneumonia. A Computer-Aided Detection (CAD) system for the detection of COVID-19 using chest radiographs would provide a second opinion for radiologists. For this research, we utilize publicly available datasets that have been marked by radiologists into two classes (COVID-19 and non-COVID-19). We address the class imbalance problem associated with the training dataset by proposing a novel transfer-to-transfer learning approach, where we break a highly imbalanced training dataset into a group of balanced mini-sets and apply transfer learning between these. We demonstrate the efficacy of the method using well-established deep convolutional neural networks. Our proposed training mechanism is more robust to limited training data and class imbalance. We study the performance of our algorithm(s) based on 10-fold cross-validation and two hold-out validation experiments to demonstrate its efficacy. We achieved an overall sensitivity of 0.94 for the hold-out validation experiments containing 2265 and 2139 marked as COVID-19 chest radiographs, respectively. For the 10-fold cross-validation experiment, we achieve an overall Area under the Receiver Operating Characteristic curve (AUC) value of 0.996 for COVID-19 detection. This paper serves as a proof of concept that an automated detection approach can be developed with a limited set of COVID-19 images, and in areas with a scarcity of trained radiologists.
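One way to break an imbalanced training set into balanced mini-sets is sketched below. This is an assumption about how such a split could be done, not the authors' procedure: the majority class is shuffled and cut into chunks the size of the minority class, and each chunk is paired with all minority samples. The function name and the sample identifiers are hypothetical.

```python
import random


def balanced_mini_sets(minority, majority, seed=0):
    """Pair every minority sample with successive minority-sized majority chunks."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    size = len(minority)
    # The final chunk is smaller when len(majority) is not a multiple of size.
    chunks = [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
    return [(minority[:], chunk) for chunk in chunks]


# Made-up identifiers: 3 COVID-19 images and 9 non-COVID-19 images.
minority = ["c1", "c2", "c3"]
majority = [f"n{i}" for i in range(9)]
mini_sets = balanced_mini_sets(minority, majority)
```

In a transfer-to-transfer scheme, a network trained on the first mini-set would serve as the starting point for training on the next one, and so on through the list.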


2021 ◽  
Vol 11 ◽  
Author(s):  
Feng Teng ◽  
Wenjun Fan ◽  
Yanrong Luo ◽  
Shouping Xu ◽  
Hanshun Gong ◽  
...  

Objective: This study aimed to develop a least absolute shrinkage and selection operator (LASSO)-based multivariable normal tissue complication probability (NTCP) model to predict radiation-induced xerostomia in patients with nasopharyngeal carcinoma (NPC) treated with comprehensive salivary gland–sparing helical tomotherapy technique. Methods and Materials: LASSO with the extended bootstrapping technique was used to build multivariable NTCP models to predict factors of patient-reported xerostomia relieved by 50% and 80% compared with the level at the end of radiation therapy within 1 year and 2 years, R50-1year and R80-2years, in 203 patients with NPC. The model assessment was based on 10-fold cross-validation and the area under the receiver operating characteristic curve (AUC). Results: The prediction model by LASSO with 10-fold cross-validation showed that radiation-induced xerostomia recovery could be predicted by prognostic factors of R50-1year (age, gender, T stage, UICC/AJCC stage, parotid Dmean, oral cavity Dmean, and treatment options) and R80-2years (age, gender, T stage, UICC/AJCC stage, oral cavity Dmean, N stage, and treatment options). These prediction models also demonstrated a good performance by the AUC. Conclusion: The prediction models of R50-1year and R80-2years by LASSO with 10-fold cross-validation were recommended to validate the NTCP model before comprehensive salivary gland–sparing radiation therapy in patients with NPC.
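For orientation, a LASSO-penalised multivariable NTCP model is commonly written as a logistic regression with an L1 penalty. The abstract does not state the model form, so the logistic link below is an assumption. The symbols are introduced here for illustration: $x_{ij}$ is predictor $j$ for patient $i$, $y_i \in \{0, 1\}$ is the outcome, and $\lambda$ is the penalty weight, typically chosen by cross-validation.

```latex
\mathrm{NTCP}(x_i) = \frac{1}{1 + e^{-s_i}},
\qquad
s_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}

\hat{\beta} = \arg\min_{\beta}
\left\{
 -\sum_{i=1}^{n} \Big[ y_i \log \mathrm{NTCP}(x_i)
 + (1 - y_i) \log\big(1 - \mathrm{NTCP}(x_i)\big) \Big]
 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

The L1 term drives some coefficients $\beta_j$ exactly to zero, which is how LASSO selects a subset of predictors such as those listed in the results.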


2022 ◽  
Vol 8 ◽  
Author(s):  
Bin Wang ◽  
Xiong Han ◽  
Zongya Zhao ◽  
Na Wang ◽  
Pan Zhao ◽  
...  

Objective: Antiseizure medicine (ASM) is the first choice for patients with epilepsy. The choice of ASM is determined by the type of epilepsy or epileptic syndrome, which may not be suitable for certain patients. This initial choice of a particular drug affects the long-term prognosis of patients, so it is critical to select the appropriate ASMs based on the individual characteristics of a patient at the early stage of the disease. The purpose of this study is to develop a personalized prediction model to predict the probability of achieving seizure control in patients with focal epilepsy, which will help in providing a more precise initial medication to patients. Methods: Based on response to oxcarbazepine (OXC), enrolled patients were divided into two groups: seizure-free (52 patients) and not seizure-free (NSF) (22 patients). We created models to predict patients' response to OXC monotherapy by combining electroencephalogram (EEG) complexities and 15 clinical features. The prediction models were gradient boosting decision tree-Kolmogorov complexity (GBDT-KC) and gradient boosting decision tree-Lempel-Ziv complexity (GBDT-LZC). We also constructed two additional prediction models, support vector machine-Kolmogorov complexity (SVM-KC) and SVM-LZC, and these two models were compared with the GBDT models. The performance of the models was evaluated by calculating the accuracy, precision, recall, F1-score, sensitivity, specificity, and area under the curve (AUC) of these models. Results: The mean accuracy, precision, recall, F1-score, sensitivity, specificity, and AUC of the GBDT-LZC model after five-fold cross-validation were 81%, 84%, 91%, 87%, 91%, 64%, and 81%, respectively. The average accuracy, precision, recall, F1-score, sensitivity, specificity, and AUC of the GBDT-KC model with five-fold cross-validation were 82%, 84%, 92%, 88%, 83%, 92%, and 83%, respectively.
We used the rank of absolute weights to separately calculate the features that have the most significant impact on the classification of the two models. Conclusion: (1) The GBDT-KC model has the potential to be used in the clinic to predict seizure freedom with OXC monotherapy. (2) Electroencephalogram complexity, especially Kolmogorov complexity (KC), may be a potential biomarker in predicting the treatment efficacy of OXC in newly diagnosed patients with focal epilepsy.
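As an illustration of the Lempel-Ziv complexity feature, the sketch below counts the phrases in the classic Lempel-Ziv (1976) parsing of a binary string, after binarising a signal around its median. The median threshold and the function names are assumptions; the abstract does not describe the authors' preprocessing or normalisation.

```python
import statistics


def binarise(signal):
    """Map samples above the median to '1' and the rest to '0'."""
    threshold = statistics.median(signal)
    return "".join("1" if value > threshold else "0" for value in signal)


def lempel_ziv_complexity(sequence):
    """Number of phrases in the Lempel-Ziv (1976) parsing of a string."""
    i, phrases, n = 0, 0, len(sequence)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from the text seen so far
        # (the searched text excludes the candidate phrase's final symbol).
        while i + length <= n and sequence[i:i + length] in sequence[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


# Textbook example string, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
complexity = lempel_ziv_complexity("0001101001000101")
```

For EEG, the value is usually normalised by sequence length before being used as a feature alongside clinical variables.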


2021 ◽  
Vol 11 (21) ◽  
pp. 10264
Author(s):  
Haohan Xiao ◽  
Bo Xing ◽  
Yujie Wang ◽  
Peng Yu ◽  
Lipeng Liu ◽  
...  

The shield machine attitude (SMA) is the most important parameter in the process of tunnel construction. To prevent the shield machine from deviating from the design axis (DTA) of the tunnel, it is of great significance to accurately predict the dynamic characteristics of SMA. We establish eight SMA prediction models based on the data of five earth pressure balance (EPB) shield machines. The algorithms adopted in the models are four machine learning (ML) algorithms (KNN, SVR, RF, AdaBoost) and four deep learning (DL) algorithms (BPNN, CNN, LSTM, GRU). This paper obtains the hyperparameters of the models by utilizing grid search and K-fold cross-validation techniques and uses EVS and RMSE to verify and evaluate the prediction performances of the models. The prediction results reveal that the two best algorithms are the LSTM and GRU with EVS > 0.98 and RMSE < 1.5. Then, integrating ML algorithms and DL algorithms, we design a warning predictor for SMA. Through the historical 5-cycle data, the predictor can give a warning in advance if the SMA deviates significantly from DTA. This study indicates that AI technologies have considerable promise in the field of SMA dynamic prediction.
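The two evaluation metrics named above, explained variance score (EVS) and root mean squared error (RMSE), can be written out directly. The sketch uses the usual definitions, EVS = 1 − Var(y − ŷ) / Var(y) and RMSE = sqrt(mean((y − ŷ)²)); the example values are made up.

```python
import math
from statistics import fmean, pvariance


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def explained_variance_score(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true), using population variances."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


# Made-up attitude readings and predictions with a constant offset of 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
error = rmse(y_true, y_pred)
evs = explained_variance_score(y_true, y_pred)
```

In this example the EVS is perfect while the RMSE is not, because EVS ignores a constant bias. That is one reason to report both metrics together.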


Author(s):  
Jung Soo Nam ◽  
Cho Rok Na ◽  
Hyoung Han Jo ◽  
Jun Yeob Song ◽  
Tae Ho Ha ◽  
...  

This article discusses the development of lens form error prediction models using in-process cavity pressure and temperature signals based on a k-fold cross-validation method. In a series of lens injection moulding experiments, the built-in-sensor mould is used, the in-process cavity pressure and temperature signals are captured and the lens form errors are measured. Then, three features including maximum pressure, holding pressure and maximum temperature are identified from the measured cavity pressure and temperature profiles, and the lens form error prediction models are formulated based on a response surface methodology. In particular, the k-fold cross-validation approach is adopted in order to improve the prediction accuracy. It is demonstrated that the lens form error prediction models can be practically used for diagnosing the quality of injection-moulded lenses in an industrial site.
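The k-fold cross-validation procedure can be sketched with a deliberately simple stand-in model. The article fits response surface models on three features; the example below fits a one-feature straight line by least squares and is only meant to show how the folds are held out and scored. The function names and data values are hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mse(xs, ys, k):
    """Mean over folds of the held-out mean squared error."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(fmean(errors))
    return fmean(fold_errors)


# Made-up data: a feature such as maximum cavity pressure vs. form error.
pressure = [float(v) for v in range(1, 11)]
form_error = [2.0 * p + 1.0 for p in pressure]
cv_mse = k_fold_mse(pressure, form_error, k=5)
```

Each observation is predicted exactly once by a model that never saw it, so the averaged error estimates how the model would behave on new lenses.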


Author(s):  
Tomislav Hengl ◽  
Madlene Nussbaum ◽  
Marvin N Wright ◽  
Gerard B.M. Heuvelink

Random forest and similar Machine Learning techniques are already used to generate spatial predictions, but the spatial location of points (geography) is often ignored in the modeling process. Spatial auto-correlation, especially if still present in the cross-validation residuals, indicates that the predictions may be biased, which is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. Performance of the RFsp framework is compared with state-of-the-art kriging techniques using 5-fold cross-validation with refitting. The results show that RFsp can obtain predictions as accurate and unbiased as those of different versions of kriging. Advantages of using RFsp over kriging are that it needs no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible towards incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as "knowledge engines" in various geoscience fields. Some disadvantages of RFsp are the exponentially growing computational intensity with increase of calibration data and covariates and the high sensitivity of predictions to input data quality. For many data sets, especially those with a lower number of points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.
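The buffer-distance covariates at the core of RFsp can be sketched as follows: each prediction location receives one feature per observation point, holding the Euclidean distance to that point. The function name and coordinates are illustrative assumptions; the framework itself works with gridded maps and geospatial tooling rather than plain tuples.

```python
import math


def buffer_distance_features(observations, targets):
    """For each target location, the distance to every observation point."""
    return [[math.dist(t, o) for o in observations] for t in targets]


# Made-up planar coordinates for two observation points and two targets.
observations = [(0.0, 0.0), (3.0, 4.0)]
targets = [(0.0, 0.0), (3.0, 0.0)]
features = buffer_distance_features(observations, targets)
```

These columns would be passed to the random forest alongside any other covariates, which is how geographical proximity enters the model.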

