Detecting Plasma Detachment in the Wendelstein 7-X Stellarator Using Machine Learning

The detachment regime has a high potential to play an important role in fusion devices on the road to a fusion power plant. Complete power detachment has been observed several times during the experimental campaigns of the Wendelstein 7-X (W7-X) stellarator. Automatic observation and signaling of such events could help scientists to better understand these phenomena. With the growing discharge times in fusion devices, machine learning models and algorithms are a powerful tool to process the increasing amount of data. We investigate several classical supervised machine learning models to detect complete power detachment in the images captured by the Event Detection Intelligent Camera System (EDICAM) at the W7-X at each given image frame. In the dedicated detached state the plasma is stable despite its reduced contact with the machine walls and the radiation belt stays close to the separatrix, without exhibiting significant heat load onto the divertor. To decrease computational time and resources needed we propose certain pixel intensity profiles (or intensity values along lines) as the input to these models. After finding the profile that describes the images best in terms of detachment, we choose the best performing machine learning algorithm. It achieves an F1 score of 0.9836 on the training dataset and 0.9335 on the test set. Furthermore, we investigate its predictions in other scenarios, such as plasmas with substantially decreased minor radius and several magnetic configurations.

Download Full-text

Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance

AJIT-e Online Academic Journal of Information Technology ◽

10.5824/ajite.2020.01.001.x ◽

2020 ◽

Vol 11 (40) ◽

pp. 8-23

Author(s):

Pius MARTHIN ◽

Duygu İÇEN

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Semantic Analysis ◽

Classification Tree ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Online product reviews have become a valuable source of information which facilitate customer decision with respect to a particular product. With the wealthy information regarding user's satisfaction and experiences about a particular drug, pharmaceutical companies make the use of online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models which facilitate decision making in various fields. In this manuscript we applied a drug review dataset used by (Gräβer, Kallumadi, Malberg,& Zaunseder, 2018), available freely from machine learning repository website of the University of California Irvine (UCI) to identify best machine learning model which provide a better prediction of the overall drug performance with respect to users' reviews. Apart from several manipulations done to improve model accuracy, all necessary procedures required for text analysis were followed including text cleaning and transformation of texts to numeric format for easy training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customer's reviews were summarized and visualized using a bar plot and word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only the sample of the dataset. We randomly sampled 15000 observations from the 161297 training dataset and 10000 observations were randomly sampled from the 53766 testing dataset. Several machine learning models were trained using 10 folds cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Spline (MARS), Support vector machine (SVM) with both radial and linear kernels and a classification tree using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. Support vector machine (SVM) with linear kernel was significantly best with an accuracy of 83% compared to the rest. Using only a small portion of the dataset, we managed to attain reasonable accuracy in our models by applying the TF-IDF transformation and Latent Semantic Analysis (LSA) technique to our TDM.

Download Full-text

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cellcycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand- binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine- learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and Autodock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.

Download Full-text

On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn’t

Genes ◽

10.3390/genes12040527 ◽

2021 ◽

Vol 12 (4) ◽

pp. 527

Author(s):

Eran Elhaik ◽

Dan Graur

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

A Priori ◽

Neutral Theory ◽

Dominant Mode ◽

Supervised Machine Learning ◽

Training Dataset ◽

Selective Sweeps ◽

Two Factors ◽

Negative Controls

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.

Download Full-text

A study of micromanufacturing process fingerprints in micro-injection moulding for machine learning and Industry 4.0 applications

The International Journal of Advanced Manufacturing Technology ◽

10.1007/s00170-021-07252-7 ◽

2021 ◽

Author(s):

Mert Gülçür ◽

Ben Whiteside

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Multiple Linear Regression ◽

Injection Moulding ◽

Industry 4.0 ◽

Supervised Machine Learning ◽

Process Conditions ◽

Learning Models ◽

Micro Injection Moulding ◽

Machine Learning Models

AbstractThis paper discusses micromanufacturing process quality proxies called “process fingerprints” in micro-injection moulding for establishing in-line quality assurance and machine learning models for Industry 4.0 applications. Process fingerprints that we present in this study are purely physical proxies of the product quality and need tangible rationale regarding their selection criteria such as sensitivity, cost-effectiveness, and robustness. Proposed methods and selection reasons for process fingerprints are also justified by analysing the temporally collected data with respect to the microreplication efficiency. Extracted process fingerprints were also used in a multiple linear regression scenario where they bring actionable insights for creating traceable and cost-effective supervised machine learning models in challenging micro-injection moulding environments. Multiple linear regression model demonstrated %84 accuracy in predicting the quality of the process, which is significant as far as the extreme process conditions and product features are concerned.

Download Full-text

Prediction of Graduate Admission using Multiple Supervised Machine Learning Models

2020 SoutheastCon ◽

10.1109/southeastcon44009.2020.9249747 ◽

2020 ◽

Author(s):

Zain Bitar ◽

Amjed Al-Mousa

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Learning Models ◽

Machine Learning Models

Download Full-text

Comparison of the Performance of Machine Learning Algorithms in Predicting Heart Disease

Frontiers in Health Informatics ◽

10.30699/fhi.v10i1.349 ◽

2021 ◽

Vol 10 (1) ◽

pp. 99

Author(s):

Sajad Yousefi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Decision Tree ◽

Roc Curve ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Learning Models ◽

Algorithm Performance ◽

Machine Learning Models

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.

Download Full-text

Uncovering and Correcting Shortcut Learning in Machine Learning Models for Skin Cancer Diagnosis

Diagnostics ◽

10.3390/diagnostics12010040 ◽

2021 ◽

Vol 12 (1) ◽

pp. 40

Author(s):

Meike Nauta ◽

Ricky Walsh ◽

Adam Dubowski ◽

Christin Seifert

Keyword(s):

Machine Learning ◽

Clinical Practice ◽

Skin Cancer ◽

Cancer Diagnosis ◽

Image Inpainting ◽

Relevant Information ◽

Black Box ◽

Training Dataset ◽

Learning Models ◽

Machine Learning Models

Machine learning models have been successfully applied for analysis of skin images. However, due to the black box nature of such deep learning models, it is difficult to understand their underlying reasoning. This prevents a human from validating whether the model is right for the right reasons. Spurious correlations and other biases in data can cause a model to base its predictions on such artefacts rather than on the true relevant information. These learned shortcuts can in turn cause incorrect performance estimates and can result in unexpected outcomes when the model is applied in clinical practice. This study presents a method to detect and quantify this shortcut learning in trained classifiers for skin cancer diagnosis, since it is known that dermoscopy images can contain artefacts. Specifically, we train a standard VGG16-based skin cancer classifier on the public ISIC dataset, for which colour calibration charts (elliptical, coloured patches) occur only in benign images and not in malignant ones. Our methodology artificially inserts those patches and uses inpainting to automatically remove patches from images to assess the changes in predictions. We find that our standard classifier partly bases its predictions of benign images on the presence of such a coloured patch. More importantly, by artificially inserting coloured patches into malignant images, we show that shortcut learning results in a significant increase in misdiagnoses, making the classifier unreliable when used in clinical practice. With our results, we, therefore, want to increase awareness of the risks of using black box machine learning models trained on potentially biased datasets. Finally, we present a model-agnostic method to neutralise shortcut learning by removing the bias in the training dataset by exchanging coloured patches with benign skin tissue using image inpainting and re-training the classifier on this de-biased dataset.

Download Full-text

A machine learning approach to inform developmental milestone achievement for children with autism (Preprint)

10.2196/preprints.29242 ◽

2021 ◽

Author(s):

Munirul M. Haque ◽

Masud Rabbani ◽

Dipranjan Das Dipal ◽

Md Ishrak Islam Zarif ◽

Anik Iqbal ◽

...

Keyword(s):

Machine Learning ◽

Autism Spectrum Disorder ◽

Children With Autism ◽

Autism Spectrum ◽

The Other ◽

Supervised Machine Learning ◽

Learning Models ◽

Children With Asd ◽

Socio Demographic Factors ◽

Machine Learning Models

BACKGROUND Care for children with autism spectrum disorder (ASD) can be challenging for families and medical care systems. This is especially true in Low-and-Middle-Income-countries (LMIC) like Bangladesh. To improve family-practitioner communication and developmental monitoring of children with ASD, [spell out] (mCARE) was developed. Within this study, mCARE was used to track child milestone achievement and family socio-demographic assets to inform mCARE feasibility/scalability and family-asset informed practitioner recommendations. OBJECTIVE The objectives of this paper are three-fold. First, document how mCARE can be used to monitor child milestone achievement. Second, demonstrate how advanced machine learning models can inform our understanding of milestone achievement in children with ASD. Third, describe family/child socio-demographic factors that are associated with earlier milestone achievement in children with ASD (across five machine learning models). METHODS Using mCARE collected data, this study assessed milestone achievement in 300 children with ASD from Bangladesh. In this study, we used four supervised machine learning (ML) algorithms (Decision Tree, Logistic Regression, k-Nearest Neighbors, Artificial Neural Network) and one unsupervised machine learning (K-means Clustering) to build models of milestone achievement based on family/child socio-demographic details. For analyses, the sample was randomly divided in half to train the ML models and then their accuracy was estimated based on the other half of the sample. Each model was specified for the following milestones: Brushes teeth, Asks to use the toilet, Urinates in the toilet or potty, and Buttons large buttons. RESULTS This study aimed to find a suitable machine learning algorithm for milestone prediction/achievement for children with ASD using family/child socio-demographic characteristics. For, Brushes teeth, the three supervised machine learning models met or exceeded an accuracy of 95% with Logistic Regression, KNN, and ANN as the most robust socio-demographic predictors. For Asks to use toilet, 84.00% accuracy was achieved with the KNN and ANN models. For these models, the family socio-demographic predictors of “family expenditure” and “parents’ age” accounted for most of the model variability. The last two parameters, Urinates in toilet or potty and Buttons large buttons had an accuracy of 91.00% and 76.00%, respectively, in ANN. Overall, the ANN had a higher accuracy (Above ~80% on average) among the other algorithms for all the parameters. Across the models and milestones, “family expenditure”, “family size/ type”, “living places” and “parent’s age and occupation” were the most influential family/child socio-demographic factors. CONCLUSIONS mCARE was successfully deployed in an LMIC (i.e., Bangladesh), allowing parents and care-practitioners a mechanism to share detailed information on child milestones achievement. Using advanced modeling techniques this study demonstrates how family/child socio-demographic elements can inform child milestone achievement. Specifically, families with fewer socio-demographic resources reported later milestone attainment. Developmental science theories highlight how family/systems can directly influence child development and this study provides a clear link between family resources and child developmental progress. Clinical implications for this work could include supporting the larger family system to improve child milestone achievement. CLINICALTRIAL We took the IRB from Marquette University Institutional Review Board on July 9, 2020, with the protocol number HR-1803022959, and titled “MOBILE-BASED CARE FOR CHILDREN WITH AUTISM SPECTRUM DISORDER USING REMOTE EXPERIENCE SAMPLING METHOD (MCARE)” for recruiting a total of 316 subjects, of which we recruited 300. (Details description of participants in Methods section)

Download Full-text

Using Supervised Machine Learning Algorithms for Automated Lithology Prediction from Wireline Log Data

10.2118/208559-ms ◽

2021 ◽

Author(s):

Marian Popescu ◽

Rebecca Head ◽

Tim Ferriday ◽

Kate Evans ◽

Jose Montero ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Training Dataset ◽

Depth Interval ◽

Log Data ◽

Machine Learning Approach ◽

Lithology Prediction ◽

Logging While Drilling

Abstract This paper presents advancements in machine learning and cloud deployment that enable rapid and accurate automated lithology interpretation. A supervised machine learning technique is described that enables rapid, consistent, and accurate lithology prediction alongside quantitative uncertainty from large wireline or logging-while-drilling (LWD) datasets. To leverage supervised machine learning, a team of geoscientists and petrophysicists made detailed lithology interpretations of wells to generate a comprehensive training dataset. Lithology interpretations were based on applying determinist cross-plotting by utilizing and combining various raw logs. This training dataset was used to develop a model and test a machine learning pipeline. The pipeline was applied to a dataset previously unseen by the algorithm, to predict lithology. A quality checking process was performed by a petrophysicist to validate new predictions delivered by the pipeline against human interpretations. Confidence in the interpretations was assessed in two ways. The prior probability was calculated, a measure of confidence in the input data being recognized by the model. Posterior probability was calculated, which quantifies the likelihood that a specified depth interval comprises a given lithology. The supervised machine learning algorithm ensured that the wells were interpreted consistently by removing interpreter biases and inconsistencies. The scalability of cloud computing enabled a large log dataset to be interpreted rapidly; >100 wells were interpreted consistently in five minutes, yielding >70% lithological match to the human petrophysical interpretation. Supervised machine learning methods have strong potential for classifying lithology from log data because: 1) they can automatically define complex, non-parametric, multi-variate relationships across several input logs; and 2) they allow classifications to be quantified confidently. Furthermore, this approach captured the knowledge and nuances of an interpreter's decisions by training the algorithm using human-interpreted labels. In the hydrocarbon industry, the quantity of generated data is predicted to increase by >300% between 2018 and 2023 (IDC, Worldwide Global DataSphere Forecast, 2019–2023). Additionally, the industry holds vast legacy data. This supervised machine learning approach can unlock the potential of some of these datasets by providing consistent lithology interpretations rapidly, allowing resources to be used more effectively.

Download Full-text

Classification and Success Investigation of Biomedical Data Sets Using Supervised Machine Learning Models

2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT) ◽

10.1109/ismsit.2019.8932734 ◽

2019 ◽

Author(s):

Sarmad N. Mohammed ◽

Mehmet Serdar Guzel ◽

Erkan Bostanci

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Data Sets ◽

Biomedical Data ◽

Learning Models ◽

Machine Learning Models

Download Full-text