Identification of abnormal tribological regimes using a microphone and semi-supervised machine-learning algorithm

Friction ◽  
2021 ◽  
Author(s):  
Vigneashwara Pandiyan ◽  
Josef Prost ◽  
Georg Vorlaufer ◽  
Markus Varga ◽  
Kilian Wasmer

Abstract
Functional surfaces in relative contact and motion are prone to wear and tear, resulting in loss of efficiency and performance of the workpieces/machines. Wear occurs between contacts in the form of adhesion, abrasion, scuffing, galling, and scoring. The rate of wear, however, depends primarily on the physical properties of the surfaces and the surrounding environment. Monitoring the integrity of surfaces by offline inspection leads to significant wasted machine time. A potential alternative to the offline inspection currently practiced in industry is the analysis of sensor signatures capable of capturing the wear state and correlating it with the wear phenomenon, followed by in situ classification using a state-of-the-art machine learning (ML) algorithm. Though this technique is better than offline inspection, it has inherent disadvantages for training the ML models. Ideally, supervised training of ML models requires the classes in the classification dataset to be of equal weight to avoid bias. Collecting such a dataset is very cumbersome and expensive in practice, as in real industrial applications the malfunction period is minimal compared to normal operation. Furthermore, classification models cannot separate new, unfamiliar wear phenomena from the normal regime. As a promising alternative, in this work we propose a methodology able to differentiate the abnormal regimes, i.e., wear phenomenon regimes, from the normal regime. This is carried out by familiarizing the ML algorithms only with the distribution of the acoustic emission (AE) signals, captured using a microphone, related to the normal regime. As a result, the ML algorithms can detect whether a new, unseen signal overlaps with the learnt distributions. To achieve this goal, a generative convolutional neural network (CNN) architecture based on a variational autoencoder (VAE) is built and trained.
During validation of the proposed CNN architecture, we identified acoustic signals corresponding to the normal and abnormal wear regimes with accuracies of 97% and 80%, respectively. Hence, our approach shows very promising results for in situ, real-time condition monitoring and even wear prediction in tribological applications.
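The decision rule described above (learn only the normal-regime distribution, then flag signals that fall outside it) can be sketched with a reconstruction error and a threshold set purely on normal data. The moving-average smoother below is a hypothetical stand-in for the trained VAE decoder, and the signals are synthetic, not the paper's AE recordings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained VAE: here the "reconstruction" is a simple
# moving-average smoother; a real model would use the trained decoder output.
def reconstruct(signal):
    kernel = np.ones(5) / 5.0
    return np.convolve(signal, kernel, mode="same")

def reconstruction_error(signal):
    return float(np.mean((signal - reconstruct(signal)) ** 2))

# "Normal regime" training signals: smooth, low-noise waveforms.
normal = [np.sin(np.linspace(0, 4 * np.pi, 256)) + 0.05 * rng.normal(size=256)
          for _ in range(50)]

# Threshold: 99th percentile of errors observed on normal data only.
threshold = np.percentile([reconstruction_error(s) for s in normal], 99)

def is_abnormal(signal):
    return reconstruction_error(signal) > threshold

# A high-noise signal, standing in for an abnormal wear regime, should
# exceed the threshold learnt from normal data alone.
abnormal = np.sin(np.linspace(0, 4 * np.pi, 256)) + 0.8 * rng.normal(size=256)
print(is_abnormal(abnormal))
```

The key property mirrored here is that no abnormal example is needed at training time; only the normal distribution sets the decision boundary.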

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1777
Author(s):  
Muhammad Ali ◽  
Stavros Shiaeles ◽  
Gueltoum Bendiab ◽  
Bogdan Ghita

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to attacker techniques such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats and for ensuring security. In this paper, we investigate an alternative method for malware detection based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel method to identify the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy compared to the other classifiers used.
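A minimal sketch of the N-gram/TF-IDF pipeline with a Logistic Regression classifier, assuming scikit-learn is available; the toy API-call traces below stand in for the IOCs extracted by dynamic analysis and are not from the paper's dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical API-call traces: benign file operations vs. typical
# malicious behaviour (registry tampering, injection, exfiltration).
benign = ["open read close", "open write close", "read write close"]
malicious = ["reg_set inject exec", "inject exec net_send", "reg_set net_send exec"]

X = benign + malicious
y = [0, 0, 0, 1, 1, 1]  # 0 = benign, 1 = malicious

# Word N-grams (unigrams and bigrams) weighted by TF-IDF, so the most
# discriminative call sequences dominate, then a logistic classifier.
model = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(X, y)

print(model.predict(["open read write close", "inject net_send exec"]))
```

In a real setting the traces would come from sandboxed execution and the vocabulary would be far larger, but the pipeline shape is the same.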


2021 ◽  
Vol 11 (12) ◽  
pp. 5599
Author(s):  
Kristen Jaskie ◽  
Joshua Martin ◽  
Andreas Spanias

Solar array management and photovoltaic (PV) fault detection are critical for optimal and robust performance of solar plants. PV faults cause substantial power reduction along with health and fire hazards. Traditional machine learning solutions require large labeled datasets, which are often expensive and/or difficult to obtain; such data can be location- and sensor-specific, noisy, and resource-intensive to collect. In this paper, we develop and demonstrate new semi-supervised solutions for PV fault detection. More specifically, we demonstrate that a little-known area of semi-supervised machine learning called positive unlabeled learning can effectively learn solar fault detection models using only a fraction of the labeled data required by traditional techniques. We further introduce a new feedback-enhanced positive unlabeled learning algorithm that can increase model accuracy and performance in situations, such as solar fault detection, where few sensor features are available. Using these algorithms, we create a positive unlabeled solar fault detection model that can match and even exceed the performance of a fully supervised fault classifier using only 5% of the total labeled data.
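The positive unlabeled setting can be illustrated with the classic Elkan-Noto calibration trick: train a classifier on "labeled vs. unlabeled", then rescale its scores by the mean score on labeled positives. This is a generic PU method, not the authors' feedback-enhanced algorithm, and the data below are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic "sensor" data: faulty arrays (positives) shifted away from
# normal operation (negatives) in feature space.
pos = rng.normal(loc=2.0, size=(200, 2))   # faulty
neg = rng.normal(loc=-2.0, size=(800, 2))  # healthy

# Only a small fraction of faults is labeled; the rest is unlabeled.
labeled_pos = pos[:10]
unlabeled = np.vstack([pos[10:], neg])

# Elkan-Noto style PU step: classify "labeled vs unlabeled", then rescale
# probabilities by c = mean score on the labeled positives.
X = np.vstack([labeled_pos, unlabeled])
s = np.r_[np.ones(len(labeled_pos)), np.zeros(len(unlabeled))]
clf = LogisticRegression().fit(X, s)
c = clf.predict_proba(labeled_pos)[:, 1].mean()

def p_faulty(x):
    # Calibrated probability of being a true positive, clipped to [0, 1].
    return np.clip(clf.predict_proba(x)[:, 1] / c, 0.0, 1.0)

print(p_faulty(pos[10:]).mean(), p_faulty(neg).mean())
```

Despite labeling only 10 of 200 faults (5%, echoing the fraction cited in the abstract), the calibrated scores separate faulty from healthy arrays.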


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model to predict employee attrition and give organizations the opportunity to address issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the human resource databases of three IT companies in India, including each employee's employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will not.
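A sketch of this SVM workflow, including the confusion-matrix-derived metrics the abstract reports, assuming scikit-learn; the 22 features are synthetic stand-ins since the archival HR data are not public:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic stand-in for 22 HR features (tenure, salary band, ...);
# label 1 = employee left, 0 = employee stayed.
X_stay = rng.normal(0.0, 1.0, size=(300, 22))
X_leave = rng.normal(1.0, 1.0, size=(100, 22))
X = np.vstack([X_stay, X_leave])
y = np.r_[np.zeros(300), np.ones(100)]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model = SVC(kernel="rbf").fit(X_tr, y_tr)

# Confusion matrix: rows = true class, columns = predicted class.
cm = confusion_matrix(y_te, model.predict(X_te))
accuracy = cm.trace() / cm.sum()
recall_leave = cm[1, 1] / cm[1].sum()  # how well leavers are caught
print(cm, accuracy, recall_leave)
```

Comparing per-class recalls from the confusion matrix is exactly how one checks the abstract's claim that leavers are predicted better than stayers.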


2021 ◽  
Vol 13 (7) ◽  
pp. 1250
Author(s):  
Yanxing Hu ◽  
Tao Che ◽  
Liyun Dai ◽  
Lin Xiao

In this study, a machine learning algorithm was introduced to fuse gridded snow depth datasets. The input variables of the machine learning method included geolocation (latitude and longitude), topographic data (elevation), gridded snow depth datasets and in situ observations. A total of 29,565 in situ observations were used to train and optimize the machine learning algorithm. Five gridded snow depth datasets were used as input variables: Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) snow depth, Global Snow Monitoring for Climate Research (GlobSnow) snow depth, Long time series of daily snow depth over the Northern Hemisphere (NHSD) snow depth, ERA-Interim snow depth, and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) snow depth. The first three snow depth datasets are retrieved from passive microwave brightness temperature or assimilation with in situ observations, while the last two are obtained from meteorological reanalysis data with a land surface model and data assimilation system. Three machine learning methods, i.e., Artificial Neural Networks (ANN), Support Vector Regression (SVR), and Random Forest Regression (RFR), were then used to produce a fused snow depth dataset from 2002 to 2004. The RFR model performed best and was thus used to produce a new snow depth product from the fusion of the five snow depth datasets and auxiliary data over the Northern Hemisphere from 2002 to 2011. The fused snow depth product was verified at five well-known snow observation sites. The R2 values at Sodankylä, Old Aspen, and Reynolds Mountains East were 0.88, 0.69, and 0.63, respectively. At the Swamp Angel Study Plot and Weissfluhjoch observation sites, which have an average snow depth exceeding 200 cm, the fused snow depth did not perform well.
The spatial patterns of the average snow depth were analyzed seasonally, and the average snow depths of autumn, winter, and spring were 5.7, 25.8, and 21.5 cm, respectively. In the future, random forest regression will be used to produce a long time series of a fused snow depth dataset over the Northern Hemisphere or other specific regions.
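The fusion step can be sketched as a Random Forest Regression mapping location, elevation, and several biased, noisy "product" estimates to an in situ target. All data below are synthetic stand-ins for the gridded products and station observations:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 2000

# Synthetic stand-ins: latitude, longitude, elevation, and a "true"
# in situ snow depth (cm) that depends on latitude and elevation.
lat = rng.uniform(35, 75, n)
lon = rng.uniform(-180, 180, n)
elev = rng.uniform(0, 3000, n)
true_depth = np.clip(0.5 * (lat - 35) + 0.01 * elev + rng.normal(0, 2, n), 0, None)

# Five gridded "products", each a differently biased, noisy view of truth,
# mimicking AMSR-E, GlobSnow, NHSD, ERA-Interim, and MERRA-2.
products = np.column_stack([
    true_depth * b + rng.normal(0, s, n)
    for b, s in [(0.8, 3), (1.1, 4), (0.9, 5), (1.0, 6), (1.2, 4)]
])

X = np.column_stack([lat, lon, elev, products])
train, test = slice(0, 1500), slice(1500, None)

rfr = RandomForestRegressor(n_estimators=100, random_state=0)
rfr.fit(X[train], true_depth[train])
r2 = rfr.score(X[test], true_depth[test])  # R2 on held-out stations
print(round(r2, 2))
```

The forest learns how much to trust each product as a function of location, which is the essence of the fusion described above.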


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 527
Author(s):  
Eran Elhaik ◽  
Dan Graur

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.


Hypertension ◽  
2021 ◽  
Vol 78 (5) ◽  
pp. 1595-1604
Author(s):  
Fabrizio Buffolo ◽  
Jacopo Burrello ◽  
Alessio Burrello ◽  
Daniel Heinrich ◽  
Christian Adolf ◽  
...  

Primary aldosteronism (PA) is the cause of arterial hypertension in 4% to 6% of patients, and 30% of patients with PA are affected by unilateral and surgically curable forms. Current guidelines recommend screening ≈50% of patients with hypertension for PA on the basis of individual factors, while some experts suggest screening all patients with hypertension. To define the risk of PA and tailor the diagnostic workup to the individual risk of each patient, we developed a conventional scoring system and supervised machine learning algorithms using a retrospective cohort of 4059 patients with hypertension. On the basis of 6 widely available parameters, we developed a numerical score and 308 machine learning-based models, selecting the one with the highest diagnostic performance. After validation, we obtained high predictive performance with our score (optimized sensitivity of 90.7% for PA and 92.3% for unilateral PA [UPA]). The machine learning-based model provided the highest performance, with an area under the curve of 0.834 for PA and 0.905 for diagnosis of UPA, and optimized sensitivity of 96.6% for PA and 100.0% for UPA at validation. The application of these predictive tools allowed the identification of a subgroup of patients with very low risk of PA (0.6% for both models) and null probability of having UPA. In conclusion, the score and the machine learning algorithm can accurately predict the individual pretest probability of PA in patients with hypertension and circumvent screening in up to 32.7% of patients using the machine learning-based model, without omitting patients with surgically curable UPA.
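The core trade-off above, choosing a decision threshold that preserves a target sensitivity while sparing as many screenings as possible, can be sketched on toy predicted probabilities (not the cohort's actual scores):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy predicted probabilities of PA from a hypothetical classifier:
# true cases score higher than non-cases on average.
p_cases = rng.beta(5, 2, size=200)      # patients with PA
p_controls = rng.beta(2, 5, size=1800)  # patients without PA

# Pick the largest threshold that still keeps sensitivity >= 90%,
# then count how many screenings that threshold would avoid.
target_sensitivity = 0.90
threshold = np.quantile(p_cases, 1 - target_sensitivity)

sensitivity = (p_cases >= threshold).mean()
screened_out = (np.r_[p_cases, p_controls] < threshold).mean()
print(round(sensitivity, 3), round(screened_out, 3))
```

This mirrors the abstract's "optimized sensitivity" framing: the threshold is fixed by the sensitivity requirement, and the fraction of patients below it is the screening that can be safely skipped.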


2017 ◽  
Author(s):  
Aymen A. Elfiky ◽  
Maximilian J. Pany ◽  
Ravi B. Parikh ◽  
Ziad Obermeyer

Abstract
Background: Cancer patients who die soon after starting chemotherapy incur costs of treatment without benefits. Accurately predicting mortality risk from chemotherapy is important, but few patient data-driven tools exist. We sought to create and validate a machine learning model predicting mortality for patients starting new chemotherapy.
Methods: We obtained electronic health records for patients treated at a large cancer center (26,946 patients; 51,774 new regimens) over 2004-14, linked to Social Security data for date of death. The model was derived using 2004-11 data, and performance was measured on non-overlapping 2012-14 data.
Findings: 30-day mortality from chemotherapy start was 2.1%. Common cancers included breast (21.1%), colorectal (19.3%), and lung (18.0%). Model predictions were accurate for all patients (AUC 0.94). Predictions for patients starting palliative chemotherapy (46.6% of regimens), for whom prognosis is particularly important, remained highly accurate (AUC 0.92). To illustrate model discrimination, we ranked patients initiating palliative chemotherapy by model-predicted mortality risk and calculated observed mortality by risk decile. 30-day mortality in the highest-risk decile was 22.6%; in the lowest-risk decile, no patients died. Predictions remained accurate across all primary cancers, stages, and chemotherapies, even for clinical trial regimens that first appeared in years after the model was trained (AUC 0.94). The model also performed well for prediction of 180-day mortality (AUC 0.87; mortality 74.8% in the highest-risk decile vs. 0.2% in the lowest). Predictions were more accurate than data from randomized trials of individual chemotherapies or SEER estimates.
Interpretation: A machine learning algorithm accurately predicted short-term mortality in patients starting chemotherapy using EHR data. Further research is necessary to determine generalizability and the feasibility of applying this algorithm in clinical settings.
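The risk-decile analysis described in the Findings can be sketched as follows, with simulated risk scores and outcomes in place of the EHR-derived predictions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10000

# Toy setup: model risk scores in [0, 1], with deaths simulated so that
# true mortality rises with the score, as it would for a calibrated model.
risk = rng.uniform(0, 1, n)
died = rng.uniform(0, 1, n) < 0.4 * risk ** 2

# Rank patients by predicted risk and compute observed mortality per decile.
order = np.argsort(risk)
deciles = np.array_split(died[order], 10)
mortality_by_decile = [d.mean() for d in deciles]
print([round(m, 3) for m in mortality_by_decile])
```

A monotone rise in observed mortality across deciles is the discrimination pattern the abstract reports (near-zero in the lowest decile, far higher in the highest).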


Author(s):  
Sachin Dev Suresh ◽  
Ali Qasim ◽  
Bhajan Lal ◽  
Syed Muhammad Imran ◽  
Khor Siak Foo

The production of oil and natural gas contributes a significant amount of revenue in Malaysia, thereby strengthening the country's economy. The flow assurance industry faces impediments to the smooth operation of transmission pipelines, of which gas hydrate formation is the most important, as it plugs the pipeline and disrupts normal operation. Under high-pressure and low-temperature conditions, gas hydrates form crystalline structures consisting of a network of hydrogen-bonded host water molecules enclosing guest molecules of the incoming gases. Industry uses different types of chemical inhibitors in pipelines to suppress hydrate formation. To complement this, machine learning has been introduced as part of risk management strategies. The objective of this paper is to apply a Machine Learning (ML) model, Gaussian Process Regression (GPR), as a new approach to predicting and mitigating gas hydrate growth. The input parameters are the concentration and pressure of Carbon Dioxide (CO2) and Methane (CH4) gas hydrates, whereas the output parameter is the Average Depression Temperature (ADT). The parameter values are taken from available datasets, enabling GPR to predict results accurately in terms of the Coefficient of Determination (R2) and Mean Squared Error (MSE). The outcome of the research showed that the GPR model achieved the highest R2 values of 97.25% and 96.71% for the training and testing data, respectively, and the lowest MSE values of 0.019 and 0.023 for the training and testing data, respectively.
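A minimal sketch of the GPR setup, assuming scikit-learn; the functional form linking concentration, pressure, and ADT below is invented for illustration and is not the paper's data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(6)

# Hypothetical inputs: inhibitor/gas concentration (wt%) and pressure
# (MPa); hypothetical output: average depression temperature (K).
conc = rng.uniform(0.5, 5.0, 120)
pressure = rng.uniform(2.0, 10.0, 120)
adt = 0.8 * conc + 0.15 * pressure + rng.normal(0, 0.05, 120)

X = np.column_stack([conc, pressure])
train, test = slice(0, 90), slice(90, None)

# RBF kernel for the smooth trend plus WhiteKernel for observation noise.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X[train], adt[train])

pred = gpr.predict(X[test])
r2 = r2_score(adt[test], pred)
mse = mean_squared_error(adt[test], pred)
print(round(r2, 3), round(mse, 4))
```

R2 and MSE on held-out data are the same two metrics the abstract uses to report the GPR model's performance.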

