Alignment-free machine learning approaches for the lethality prediction of potential novel human-adapted coronavirus using genomic nucleotide

2020 ◽  
Author(s):  
Rui Yin ◽  
Zihan Luo ◽  
Chee Keong Kwoh

Abstract: A newly emerging novel coronavirus appeared and rapidly spread worldwide, and the World Health Organization declared a pandemic on March 11, 2020. The roles and characteristics of coronaviruses have captured much attention due to their ability to cause a wide variety of infectious diseases in humans, ranging from mild to severe. Detecting the lethality of a human coronavirus is key to estimating viral toxicity and providing perspective for treatment. We developed alignment-free machine learning approaches for ultra-fast and highly accurate prediction of the lethality of potential human-adapted coronaviruses using genomic nucleotide sequences. We performed extensive experiments with six different feature transformations and machine learning algorithms, in combination with digital signal processing, to infer the lethality of possible future novel coronaviruses from previously existing strains. The results, tested on SARS-CoV, MERS-CoV and SARS-CoV-2 datasets, show an average prediction accuracy of 96.7%. We also provide preliminary analysis validating the effectiveness of our models on other human coronaviruses. Our study achieves high prediction performance based on raw RNA sequences alone, without genome annotations or specialized biological knowledge. The results demonstrate that, for any novel human coronavirus strain, this alignment-free machine learning-based approach can offer a reliable real-time estimate of its viral lethality.
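The abstract does not include code; as an illustration of what an alignment-free, signal-processing feature pipeline of this kind can look like, the sketch below maps a raw nucleotide string to a numeric signal with an EIIP encoding, takes the FFT magnitude spectrum, and interpolates it to a fixed length so genomes of different sizes yield comparable feature vectors. The EIIP values and the 4096-point target length are illustrative assumptions, not details taken from the paper.

```python
# Minimal alignment-free feature sketch (not the authors' code):
# nucleotide string -> numeric signal -> FFT magnitude spectrum -> fixed length.
import numpy as np

EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}  # assumed mapping

def sequence_to_spectrum(seq: str, n_points: int = 4096) -> np.ndarray:
    """Convert an RNA/DNA string into a fixed-length magnitude spectrum."""
    signal = np.array([EIIP.get(base, 0.0) for base in seq.upper()])
    spectrum = np.abs(np.fft.fft(signal))          # discrete Fourier transform
    # Linear interpolation to a common length so all genomes are comparable.
    x_old = np.linspace(0.0, 1.0, num=len(spectrum))
    x_new = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(x_new, x_old, spectrum)

# Example: two toy "genomes" of different lengths become equal-length vectors.
features = np.vstack([
    sequence_to_spectrum("ATGCGTACGTTAGC" * 100),
    sequence_to_spectrum("ATGCCCGGGTTTAA" * 120),
])
print(features.shape)  # (2, 4096)
```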

2021 ◽  
Vol 23 ◽  
Author(s):  
Rui Yin ◽  
Zihan Luo ◽  
Chee Keong Kwoh

Background: A newly emerging novel coronavirus appeared and rapidly spread worldwide, and the World Health Organization declared a pandemic on March 11, 2020. The roles and characteristics of coronaviruses have captured much attention due to their ability to cause a wide variety of infectious diseases in humans, ranging from mild to severe. Detecting the lethality of a human coronavirus is key to estimating viral toxicity and providing perspectives for treatment. Methods: We developed an alignment-free framework that utilizes machine learning approaches for ultra-fast and highly accurate prediction of the lethality of human-adapted coronaviruses using genomic sequences. We performed extensive experiments with six different feature transformations and machine learning algorithms, combined with digital signal processing, to identify the lethality of possible future novel coronaviruses from existing strains. Results: The results, tested on SARS-CoV, MERS-CoV and SARS-CoV-2 datasets, show an average prediction accuracy of 96.7%. We also provide preliminary analysis validating the effectiveness of our models on other human coronaviruses. Our framework achieves high prediction performance while being alignment-free and based on RNA sequences alone, without genome annotations or specialized biological knowledge. Conclusion: The results demonstrate that, for any novel human coronavirus strain, this study can offer a reliable real-time estimate of its viral lethality.
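Building on the feature sketch above, the following hedged sketch illustrates the evaluation stage such a framework performs: several off-the-shelf classifiers compared with stratified cross-validation on spectral features. The feature matrix, the lethality labels, and the three classifiers shown are placeholders, not the exact six algorithms used in the study.

```python
# Hedged sketch of the model-comparison stage; data and models are placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4096))          # placeholder spectral features
y = rng.integers(0, 2, size=60)          # placeholder binary lethality labels

models = {
    "logistic_regression": LogisticRegression(max_iter=2000),
    "svm_rbf": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```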


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Tahia Tazin ◽  
Md Nur Alam ◽  
Nahian Nakiba Dola ◽  
Mohammad Sajibul Bari ◽  
Sami Bourouis ◽  
...  

Stroke is a medical disorder in which blood vessels in the brain rupture, causing damage to the brain. Symptoms may develop when the supply of blood and other nutrients to the brain is interrupted. According to the World Health Organization (WHO), stroke is the leading cause of death and disability globally. Early recognition of the various warning signs of a stroke can help reduce its severity. Different machine learning (ML) models have been developed to predict the likelihood of a stroke occurring in the brain. This research uses a range of physiological parameters and machine learning algorithms, such as Logistic Regression (LR), Decision Tree (DT) classification, Random Forest (RF) classification, and a Voting Classifier, to train four different models for reliable prediction. Random Forest was the best-performing algorithm for this task, with an accuracy of approximately 96 percent. The dataset used in developing the method was the open-access Stroke Prediction dataset. The accuracy of the models used in this investigation is significantly higher than in previous studies, indicating that these models are more reliable. Numerous model comparisons have established their robustness, and the proposed scheme follows from the study's analysis.
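A minimal sketch of the kind of comparison described in this abstract, assuming the open-access Stroke Prediction dataset is available locally as "healthcare-dataset-stroke-data.csv" with "stroke" and "id" columns (file and column names are assumptions). It trains the four models named above and reports test accuracy; the preprocessing details are illustrative, not the paper's exact pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("healthcare-dataset-stroke-data.csv")   # assumed file name
y = df["stroke"]
X = df.drop(columns=["stroke", "id"])

numeric = X.select_dtypes("number").columns
categorical = X.columns.difference(numeric)
prep = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

lr = LogisticRegression(max_iter=2000)
dt = DecisionTreeClassifier(random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0)
vote = VotingClassifier([("lr", lr), ("dt", dt), ("rf", rf)], voting="hard")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
for name, clf in [("LR", lr), ("DT", dt), ("RF", rf), ("Voting", vote)]:
    model = Pipeline([("prep", prep), ("clf", clf)])
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```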


2020 ◽  
Vol 25 (40) ◽  
pp. 4296-4302 ◽  
Author(s):  
Yuan Zhang ◽  
Zhenyan Han ◽  
Qian Gao ◽  
Xiaoyi Bai ◽  
Chi Zhang ◽  
...  

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of, or defects in, β-globin, which reduces synthesis of the β-globin chain and results in a relative excess of α-chains. Inclusion bodies deposited on the cell membrane reduce the ability of red blood cells to deform, producing a group of hereditary haemolytic diseases in which red blood cells are destroyed in large numbers in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using AdaBoost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicates that AdaBoost can be applied to build a learning model for predicting inhibitors against K562 cells.
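A hedged sketch of the modelling protocol described above: AdaBoost evaluated with 10-fold cross-validation and on a held-out independent set. The molecular descriptors in X are simulated placeholders; only the 117/190 class split is taken from the abstract.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(307, 50))                 # placeholder molecular descriptors
y = np.array([1] * 117 + [0] * 190)            # 117 inhibitors, 190 non-inhibitors

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

model = AdaBoostClassifier(n_estimators=200, random_state=1)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
cv_acc = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

model.fit(X_train, y_train)
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"10-fold CV accuracy: {cv_acc.mean():.3f}, independent test: {test_acc:.3f}")
```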


Author(s):  
Magdalena Kukla-Bartoszek ◽  
Paweł Teisseyre ◽  
Ewelina Pośpiech ◽  
Joanna Karłowska-Pik ◽  
Piotr Zieliński ◽  
...  

Abstract: Increasing understanding of human genome variability allows for better use of the predictive potential of DNA. An obvious direct application is the prediction of physical phenotypes. Significant success has been achieved, especially in predicting pigmentation characteristics, but the inference of some phenotypes is still challenging. In search of further improvements in predicting human eye colour, we conducted whole-exome (enriched in regulome) sequencing of 150 Polish samples to discover new markers. For this, we adopted quantitative characterization of eye colour phenotypes using high-resolution photographic images of the iris in combination with DIAT software analysis. An independent set of 849 samples was used for subsequent predictive modelling. Newly identified candidates, 114 additional literature-based SNPs previously associated with pigmentation, and advanced machine learning algorithms were used. Whole-exome sequencing analysis found 27 previously unreported candidate SNP markers for eye colour. The highest overall prediction accuracies were achieved with LASSO-regularized and BIC-based selected regression models. A new candidate variant, rs2253104, located in the ARFIP2 gene and identified with the HyperLasso method, revealed predictive potential and was included in the best-performing regression models. Advanced machine learning approaches showed a significant increase in the sensitivity of intermediate eye colour prediction (up to 39%) compared with 0% obtained for the original IrisPlex model. We identified a new potential predictor of eye colour and evaluated several widely used advanced machine learning algorithms in predictive analysis of this trait. Our results provide useful hints for developing future predictive models for eye colour in forensic and anthropological studies.
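A minimal sketch, not the authors' pipeline: an L1-regularized (LASSO-style) multinomial logistic regression over SNP genotypes coded 0/1/2, with the penalty strength chosen by cross-validation and per-class sensitivity reported. The genotype matrix and the three eye-colour classes are simulated placeholders; only the sample and marker counts echo the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(2)
n_samples, n_snps = 849, 141            # 114 literature SNPs + 27 new candidates
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)  # 0/1/2 genotypes
y = rng.choice(["blue", "intermediate", "brown"], size=n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2)

# L1 penalty performs LASSO-like marker selection inside the classifier.
model = LogisticRegressionCV(Cs=10, penalty="l1", solver="saga",
                             max_iter=5000, cv=5)
model.fit(X_tr, y_tr)

# Per-class recall corresponds to the sensitivity discussed for the
# hard "intermediate" category.
print(classification_report(y_te, model.predict(X_te)))
```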


2021 ◽  
Vol 35 (1) ◽  
pp. 11-21
Author(s):  
Himani Tyagi ◽  
Rajendra Kumar

IoT is characterized by communication between things (devices) that constantly share data, analyze it, and make decisions while connected to the internet. This interconnected architecture attracts cyber criminals seeking to expose the IoT system to failure. It therefore becomes imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. In this paper, an Intrusion Detection System (IDS) based on a novel feature set extracted from the BoT-IoT dataset is developed that can swiftly, accurately and automatically differentiate benign and malicious traffic. Instead of using available feature-reduction techniques such as PCA, which can change the core meaning of variables, a unique feature set consisting of only seven lightweight features is developed that is IoT-specific and independent of attack traffic. The results shown in the study demonstrate the effectiveness of the seven engineered features in detecting four varied attack types, namely DDoS, DoS, Reconnaissance, and Information Theft. Furthermore, this study also demonstrates the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) in IoT security. The performance of the proposed system is validated using metrics such as accuracy, precision, recall, F-score and ROC. Although the Decision Tree (99.9%) and Random Forest (99.9%) classifiers achieve the same accuracy, other metrics such as training and testing time show Random Forest to be comparatively better.
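A hedged sketch of the comparison described above: the six named supervised classifiers trained on a seven-feature representation of network flows, reporting accuracy and macro F1 together with training and testing time. The feature matrix and attack labels are simulated placeholders standing in for the BoT-IoT-derived data used in the paper.

```python
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 7))                    # seven lightweight features
y = rng.integers(0, 5, size=5000)                 # benign + four attack classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

models = {
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=2000),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500),
    "DT": DecisionTreeClassifier(random_state=3),
    "RF": RandomForestClassifier(n_estimators=200, random_state=3),
}
for name, clf in models.items():
    t0 = time.perf_counter(); clf.fit(X_tr, y_tr); train_t = time.perf_counter() - t0
    t0 = time.perf_counter(); pred = clf.predict(X_te); test_t = time.perf_counter() - t0
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"f1={f1_score(y_te, pred, average='macro'):.3f} "
          f"train={train_t:.2f}s test={test_t:.2f}s")
```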


The World Health Organization's (WHO) 2018 report on diabetes states that the number of diabetic cases has increased from 108 million to 422 million since 1980. The fact sheet shows a major increase in diabetes prevalence among adults (over 18 years of age), from 4.7% to 8.5%. Major health hazards caused by diabetes include kidney failure, heart disease, blindness, stroke, and lower-limb amputation. This article applies supervised machine learning algorithms to the Pima Indian Diabetes dataset to explore various patterns of risk using predictive models. Predictive model construction is based on supervised machine learning algorithms: Naïve Bayes, Decision Tree, Random Forest, Gradient Boosted Tree, and Tree Ensemble. Further, analytical patterns for these predictive models are presented based on performance parameters including accuracy, precision, recall, and F-measure.
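A minimal sketch of this kind of comparison, assuming the Pima Indians Diabetes data is available as "pima-indians-diabetes.csv" with an "Outcome" label column (file and column names are assumptions). It reports the same performance parameters named in the abstract for Naïve Bayes, Decision Tree, Random Forest and Gradient Boosted Trees.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

df = pd.read_csv("pima-indians-diabetes.csv")      # assumed file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=4)

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=4),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=4),
    "Gradient Boosted Tree": GradientBoostingClassifier(random_state=4),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    p, r, f, _ = precision_recall_fscore_support(y_te, pred, average="binary")
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```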


Energies ◽  
2022 ◽  
Vol 15 (2) ◽  
pp. 582
Author(s):  
Holger Behrends ◽  
Dietmar Millinger ◽  
Werner Weihs-Sedivy ◽  
Anže Javornik ◽  
Gerold Roolfs ◽  
...  

Faults and unintended conditions in grid-connected photovoltaic systems often cause a change in the residual current. This article describes a novel machine learning-based approach to detecting anomalies in the residual current of a photovoltaic system. It can be used to detect faults or critical states at an early stage and extends conventional threshold-based detection methods. For this study, a power-hardware-in-the-loop approach was used, in which typical faults were injected under ideal and realistic operating conditions. The investigation shows that faults in a photovoltaic converter system cause a unique behaviour of the residual current, and that fault patterns can be detected and identified using pattern recognition and variational autoencoder machine learning algorithms. In this context, it was found that the residual current is affected not only by malfunctions of the system but also by volatile external influences. One of the main challenges is to separate the regular residual currents caused by these interferences from those caused by faults. Compared to conventional methods, which respond only to absolute changes in residual current, the two machine learning models detect faults that do not affect the absolute value of the residual current.
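A hedged PyTorch sketch of the variational-autoencoder idea described above: the model is trained on residual-current windows from normal operation, and windows whose reconstruction error exceeds a threshold derived from the training data are flagged as anomalous. Window length, layer sizes, and the three-sigma threshold rule are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ResidualCurrentVAE(nn.Module):
    def __init__(self, window: int = 128, latent: int = 8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(window, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent)
        self.logvar = nn.Linear(64, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, window))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    recon_err = ((recon - x) ** 2).sum(dim=1)                    # reconstruction
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)  # KL divergence
    return (recon_err + kl).mean()

# Train on windows of normal residual current (placeholder data here).
normal = torch.randn(2000, 128)
model = ResidualCurrentVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    recon, mu, logvar = model(normal)
    loss = vae_loss(recon, normal, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

# Anomaly score = reconstruction error; threshold learned from normal data.
with torch.no_grad():
    recon, _, _ = model(normal)
    err = ((recon - normal) ** 2).sum(dim=1)
    threshold = (err.mean() + 3 * err.std()).item()
print(f"anomaly threshold: {threshold:.3f}")
```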


Author(s):  
Sheela Rani P ◽  
Dhivya S ◽  
Dharshini Priya M ◽  
Dharmila Chowdary A

Machine learning is an analysis discipline that uses data to improve learning, optimizing the training process and the environment in which learning happens. There are two types of machine learning approaches, supervised and unsupervised, which are used to extract the knowledge that helps decision-makers take the correct interventions in the future. This paper introduces a model for predicting the factors that influence students' academic performance, using supervised machine learning algorithms such as support vector machine (SVM), k-nearest neighbors (KNN), Naïve Bayes and logistic regression. The results of the various algorithms are compared, and it is shown that the support vector machine and Naïve Bayes perform well, achieving improved accuracy compared with the other algorithms. The final prediction model in this paper can achieve fairly high prediction accuracy. The objective is not only to predict the future performance of students but also to provide the best technique for finding the most impactful features that influence students while studying.
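A minimal sketch, not the paper's code: the four classifiers named above compared on a placeholder student-performance table, followed by permutation importance on one model to rank the most influential features, mirroring the stated objective. The column names and data are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(5)
X = pd.DataFrame(rng.normal(size=(400, 5)),
                 columns=["attendance", "assignment_score", "quiz_score",
                          "study_hours", "prior_gpa"])   # assumed feature names
y = rng.integers(0, 2, size=400)                          # pass/fail placeholder
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=5)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=2000)),
}
for name, clf in models.items():
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))

# Rank features by how much shuffling each one hurts the SVM's accuracy.
imp = permutation_importance(models["SVM"], X_te, y_te, n_repeats=20,
                             random_state=5)
print(X.columns[np.argsort(imp.importances_mean)[::-1]].tolist())
```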


Author(s):  
Lokesh Kola

Abstract: Diabetes is one of the deadliest chronic diseases in the world. According to the World Health Organization (WHO), around 422 million people currently suffer from diabetes, particularly in low- and middle-income countries, and the number of deaths due to diabetes is close to 1.6 million. Recent research shows that the prevalence of diabetes among people aged over 18 rose from 4.7% to 8.5% between 1980 and 2014. Early diagnosis is necessary so that the disease does not progress to advanced stages, which are quite difficult to cure. Significant research has been performed on diabetes prediction, and as time passes the challenges of building a system to detect diabetes systematically keep increasing. Interest in machine learning for analysing medical data to diagnose disease is growing day by day. Previous research has focused on identifying diabetes without specifying its type. In this paper, we predict gestational diabetes (Type-3) by comparing various supervised and semi-supervised machine learning algorithms on two datasets, binned and non-binned, and compare their performance based on evaluation metrics. Keywords: Gestational diabetes, Machine Learning, Supervised Learning, Semi-Supervised Learning, Diabetes Prediction
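A hedged sketch of the two ideas mentioned in this abstract: a "binned" variant of the features built with KBinsDiscretizer, and a semi-supervised self-training classifier that learns from a mix of labelled and unlabelled records (unlabelled rows carry the label -1). The data is simulated; the specific estimators, bin counts, and label-hiding fraction are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.normal(size=(600, 8))                       # placeholder clinical features
y = rng.integers(0, 2, size=600)                    # gestational-diabetes label

# Binned variant: each continuous feature is discretised into 5 ordinal bins.
X_binned = KBinsDiscretizer(n_bins=5, encode="ordinal",
                            strategy="quantile").fit_transform(X)

for name, data in [("non-binned", X), ("binned", X_binned)]:
    X_tr, X_te, y_tr, y_te = train_test_split(data, y, test_size=0.25,
                                              random_state=6)
    # Hide 60% of the training labels to simulate the semi-supervised setting.
    y_semi = y_tr.copy()
    mask = rng.random(len(y_semi)) < 0.6
    y_semi[mask] = -1
    model = SelfTrainingClassifier(LogisticRegression(max_iter=2000))
    model.fit(X_tr, y_semi)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```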

