Machine Learning Based Toxicity Prediction: From Chemical Structural Description to Transcriptome Analysis

Toxicity prediction is very important to public health. Among its many applications, toxicity prediction is essential to reduce the cost and labor of a drug’s preclinical and clinical trials, because a lot of drug evaluations (cellular, animal, and clinical) can be spared due to the predicted toxicity. In the era of Big Data and artificial intelligence, toxicity prediction can benefit from machine learning, which has been widely used in many fields such as natural language processing, speech recognition, image recognition, computational chemistry, and bioinformatics, with excellent performance. In this article, we review machine learning methods that have been applied to toxicity prediction, including deep learning, random forests, k-nearest neighbors, and support vector machines. We also discuss the input parameter to the machine learning algorithm, especially its shift from chemical structural description only to that combined with human transcriptome data analysis, which can greatly enhance prediction accuracy.

Predicting Student’s Performance Using Machine Learning Algorithm

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1209 ◽

2021 ◽

pp. 53-58

Author(s):

Sheela Rani P ◽

Dhivya S ◽

Dharshini Priya M ◽

Dharmila Chowdary A

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Prediction Model ◽

Naive Bayes ◽

Learning Algorithm ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbors

Machine learning is a relatively new research discipline that uses data to improve learning, optimizing the training process and the environment in which learning happens. There are two types of machine learning approaches, supervised and unsupervised, which are used to extract knowledge that helps decision-makers intervene correctly in the future. This paper introduces a model for predicting students' academic performance, and the factors that influence it, using supervised machine learning algorithms: support vector machine (SVM), k-nearest neighbors (KNN), Naïve Bayes, and logistic regression. The results of the various algorithms are compared, and it is shown that the support vector machine and Naïve Bayes perform well, achieving better accuracy than the other algorithms. The final prediction model in this paper may have fairly high prediction accuracy. The objective is not just to predict the future performance of students but also to provide the best technique for finding the most impactful features that influence students while studying.
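To make one of the compared algorithms concrete, the following is a minimal sketch of k-nearest-neighbors classification. It is not the authors' implementation: the function name `knn_predict`, the two features (hours studied, attendance fraction), and the toy data are all assumptions made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: (hours studied per week, attendance fraction) -> outcome.
points = [(2.0, 0.50), (3.0, 0.60), (4.0, 0.55),
          (9.0, 0.90), (10.0, 0.95), (8.0, 0.85)]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

prediction = knn_predict(points, labels, query=(8.5, 0.80), k=3)
```

In this sketch the features are left unscaled, so the feature with the larger numeric range dominates the distance; a real pipeline would normally standardize the features first.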

Feature-Based Opinion Mining and Managed Machine Learning with Sentiment Classification Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4555.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3992-3998

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Intensive ◽

Learning Tasks ◽

Feature Based

Sentiment analysis is the study of individuals' opinions and feedback about an entity, which can be an item, service, movie, person, or event. The opinions are mostly expressed as remarks or reviews. With social networks, forums, and websites, these reviews have become a significant factor in a client's decision to buy something or not. These days, vast scalable computing environments provide very sophisticated ways of carrying out data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One such task is text classification, an effective method for predicting clients' sentiment. In this paper, we center our sentiment analysis work on a movie review database. We examine the sentiment expressed in order to classify the polarity of the movie reviews on a scale of 0 (highly disliked) to 4 (highly preferred), perform feature extraction and ranking, and use these features to train our multilabel classifier to assign each movie review its correct rating. This paper incorporates sentiment analysis using feature-based opinion mining and managed (supervised) machine learning. The principal focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem space. In our study, we used six distinct machine learning algorithms: Naïve Bayes, Logistic Regression, SVM (Support Vector Machine), RF (Random Forest), KNN (K-nearest neighbors), and SoftMax Regression.
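As an illustration of the last classifier in that list, the sketch below shows the softmax step that turns one review's raw class scores into probabilities over the five ratings 0 to 4. The scores are hypothetical and the helper name `softmax` is an assumption; the abstract does not describe how the scores are produced.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the ratings 0..4 of a single review.
scores = [0.1, 0.3, 0.2, 1.5, 2.4]
probs = softmax(scores)

# The predicted rating is the index with the highest probability.
predicted_rating = max(range(len(probs)), key=probs.__getitem__)
```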

Detection of Loss Zones while Drilling Using Different Machine Learning Techniques

Journal of Energy Resources Technology ◽

10.1115/1.4051553 ◽

2021 ◽

pp. 1-29

Author(s):

Ahmed Alsaihati ◽

Mahmoud Abughaban ◽

Salaheldin Elkatatny ◽

Abdulazeez Abdulraheem

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Random Forests ◽

Nearest Neighbors ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Learning Techniques ◽

Vector Machines ◽

Testing Set

Fluid loss into formations is a common operational issue that is frequently encountered when drilling across formations with natural or induced fractures. This can pose significant operational risks, such as well-control incidents, stuck pipe, and wellbore instability, which in turn increase well time and cost. This research aims to use and evaluate different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, in detecting loss circulation occurrences while drilling using solely drilling surface parameters. Actual field data from seven wells that had suffered partial or severe loss circulation were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Recall, precision, and F1-score were used to evaluate the ability of the developed models to detect loss circulation occurrences. The results showed that the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrences in the testing set, while the random forests classifier was second best with almost the same F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 in predicting loss circulation occurrences in the testing set. The K-nearest neighbors classifier outperformed the other models in detecting loss circulation occurrences in Well-8, with an F1-score of 0.80. The main contribution of this research compared with previous studies is that it identifies loss events based on real-time measurements of the active pit volume.
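The three metrics named in this abstract can be sketched as follows. This is a generic illustration, not the authors' evaluation code; the function name `precision_recall_f1` and the label vectors are made up, with 1 standing for "loss event" and 0 for "no loss".

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels: 1 = loss event, 0 = no loss.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

F1 is often preferred over plain accuracy when the positive class is rare, because a model that never predicts the positive class can still score high accuracy while its recall, and therefore its F1, is zero.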

Applying Machine Learning for Healthcare: A Case Study on Cervical Pain Assessment with Motion Capture

Applied Sciences ◽

10.3390/app10175942 ◽

2020 ◽

Vol 10 (17) ◽

pp. 5942 ◽

Cited By ~ 2

Author(s):

Juan de la Torre ◽

Javier Marin ◽

Sergio Ilarri ◽

Jose J. Marin

Keyword(s):

Machine Learning ◽

Predictive Models ◽

Gradient Boosting ◽

Support Vector ◽

Cervical Pain ◽

K Nearest Neighbors ◽

Network Algorithms ◽

Vector Machines ◽

Real Scenario

Given the exponential availability of data in health centers and the massive sensorization that is expected, there is an increasing need to manage and analyze these data in an effective way. For this purpose, data mining (DM) and machine learning (ML) techniques would be helpful. However, due to the specific characteristics of the field of healthcare, a suitable DM and ML methodology adapted to these particularities is required. The applied methodology must structure the different stages needed for data-driven healthcare, from the acquisition of raw data to decision-making by clinicians, considering the specific requirements of this field. In this paper, we focus on a case study of cervical assessment, where the goal is to predict the potential presence of cervical pain in patients affected with whiplash diseases, which is important for example in insurance-related investigations. By analyzing in detail this case study in a real scenario, we show how taking care of those particularities enables the generation of reliable predictive models in the field of healthcare. Using a database of 302 samples, we have generated several predictive models, including logistic regression, support vector machines, k-nearest neighbors, gradient boosting, decision trees, random forest, and neural network algorithms. The results show that it is possible to reliably predict the presence of cervical pain (accuracy, precision, and recall above 90%). We expect that the procedure proposed to apply ML techniques in the field of healthcare will help technologists, researchers, and clinicians to create more objective systems that provide support to objectify the diagnosis, improve test treatment efficacy, and save resources.

A systematic review of the machine learning algorithms for the computational analysis in different domains

International Journal of Advanced Technology and Engineering Exploration ◽

10.19101/ijatee.2020.762057 ◽

2020 ◽

Vol 7 (71) ◽

pp. 147-164

Author(s):

Ravita Chahar ◽

Deepinder Kaur

Keyword(s):

Machine Learning ◽

Computational Analysis ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Second Phase ◽

Paper Machine ◽

K Nearest Neighbors ◽

Systematic Analysis

In this paper, machine learning algorithms are discussed and analyzed with respect to their computational aspects in different domains. These algorithms are capable of building mathematical and analytical models, which may be helpful in the decision-making process. The paper develops the computational analysis in three phases. In the first phase, the background and analytical aspects are presented along with learning applications. In the second phase, the literature is explored in detail, along with the pros and cons of the techniques applied in different domains. In the third phase, gaps and limitations identified from the literature are discussed and highlighted. Finally, the computational analysis is presented along with machine learning results in terms of accuracy. The results mainly focus on exploratory data analysis, domain applicability, and predictive problems. Our systematic analysis shows that the applicability of machine learning is wide and that results may be improved based on these algorithms. The literature analysis also indicates that applying machine learning algorithms can improve performance. The main methods discussed here are classification and regression trees (CART), logistic regression, naïve Bayes (NB), k-nearest neighbors (KNN), support vector machine (SVM), and decision tree (DT). The domains mainly covered are disease detection, business intelligence, industry automation, and sentiment analysis.

PREDICTION OF FATIGUE CRACK GROWTH DIAGRAMS BY METHODS OF MACHINE LEARNING UNDER CONSTANT AMPLITUDE LOADING

Acta Metallurgica Slovaca ◽

10.36547/ams.26.1.346 ◽

2020 ◽

Vol 26 (1) ◽

pp. 31-33

Author(s):

Oleh Yasniy ◽

Iryna Didych ◽

Yuri Lapusta

Keyword(s):

Machine Learning ◽

Fatigue Crack ◽

Fatigue Crack Growth ◽

Crack Growth ◽

Support Vector ◽

Structural Elements ◽

Constant Amplitude ◽

K Nearest Neighbors ◽

Vector Machines ◽

Stress Ratios

Important structural elements are often subjected to constant amplitude loading. Increasing their lifetime is a pressing task of great economic importance. To evaluate the lifetime of structural elements, it is necessary to be able to predict the fatigue crack growth (FCG) rate. This task can be solved effectively by machine learning methods, in particular neural networks, boosted trees, support vector machines, and k-nearest neighbors. The aim of the present work was to build the fatigue crack growth diagrams of 0.45% C steel subjected to constant amplitude loading at stress ratios R = 0 and R = –1 using machine learning methods. The obtained results are in good agreement with the experimental data.

APTITUDE Framework for Learning Data Classification Based on Machine Learning

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2020.14.51 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Data Classification ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Course Content ◽

Applied Model ◽

Vector Machines ◽

Learning Data

Learning analytics refers to the use of machine learning to provide predictions of learner success and prescriptions to learners and teachers. The main goal of the paper is to propose the APTITUDE framework for learning data classification in order to adapt and recommend course content or the flow of course activities. The framework applies a model for student learning prediction based on machine learning. Five machine learning algorithms are used for learning data classification: random forest, Naïve Bayes, k-nearest neighbors, logistic regression, and support vector machines.

Abstract 14742: Deep Learning of Intracardiac Electrograms in Atrial Arrhythmia

Circulation ◽

10.1161/circ.142.suppl_3.14742 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Miguel Rodrigo ◽

Albert J Rogers ◽

Prasanth Ganesan ◽

Mahmood Alhusseini ◽

Sanjiv M Narayan

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Dominant Frequency ◽

Support Vector ◽

K Nearest Neighbors ◽

Female Age ◽

Inappropriate Use ◽

C Statistic ◽

Vector Machines ◽

Combined Features

Introduction: Intracardiac devices detect atrial fibrillation (AF) by rate and regularity, but any inaccuracies may cause inappropriate use of anticoagulants or anti-arrhythmic medications. Hypothesis: Machine learning of raw intracardiac electrograms can identify AF from other atrial arrhythmias better than traditional measures of rate or regularity and without using specific electrophysiological analyses such as dominant frequency (DF). Methods: In 86 persistent AF patients (25 female, age 65±11) we recorded 64 unipolar intracardiac electrograms over 60 seconds prior to ablation (fig A). We trained deep learning models (comprising 2X1D-convolutional layers and 2 dense layers) on successive 4-sec segments labelled AF or Flutter/tachycardia (AFL), using 10-fold cross-validation with 80% of patients for training and an independent 20% for testing. We compared results to classical statistical and machine learning (ML) analyses of electrograms featurized by 30 metrics of cycle length (CL), DF and autocorrelation-based metrics (AC; fig B). Results: Identification of AF varied between methods, but was modest for features of CL (c-statistic 0.70), DF (0.67) and AC (0.75). ML that combined features improved results: linear combination (c-statistic 0.95 ± 0.04), Bagged trees (0.92 ± 0.06), k-nearest neighbors (0.92 ± 0.06) and support vector machines (0.95 ± 0.04). Deep learning using raw electrograms as input (no featurization) provided AUC of 0.95 ± 0.05 (fig C). Conclusions: Detailed machine learning of raw intracardiac electrograms identified AF more accurately than traditional indices of rate, regularity, and dominant frequency. This approach could reclassify AF detection from devices to improve management, and may reveal novel AF phenotypes with distinct clinical courses.
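The c-statistic (equivalently, the area under the ROC curve, AUC) reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pair counting. The function name `c_statistic`, the labels (1 = AF, 0 = other arrhythmia), and the scores are assumptions made up for illustration, not data from the study.

```python
def c_statistic(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # positive ranked above negative
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Made-up example: 1 = AF, 0 = other atrial arrhythmia.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = c_statistic(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is fine for a sketch but not for large datasets.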

Comparison of Machine Learning Algorithms for Cardiovascular Disease Prediction

Computational Methodologies for Electrical and Electronics Engineers - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-7998-3327-7.ch009 ◽

2021 ◽

pp. 111-126

Author(s):

Stuti Pandey ◽

Abhay Kumar Agarwal

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Learning Algorithm ◽

Learning Algorithms ◽

Research Field ◽

Machine Learning Algorithms ◽

Support Vector ◽

Disease Prediction ◽

K Nearest Neighbors ◽

The University

Cardiovascular disease prediction is a healthcare research field that depends on a large volume of data to make effective and accurate predictions. These predictions can be more effective and accurate when made with machine learning algorithms, because such algorithms can uncover concealed patterns that are helpful in making decisions. Machine learning algorithms also process data at a speed that is almost infeasible for human beings. Therefore, the work presented in this research focuses on identifying the best machine learning algorithm by comparing the algorithms' performance in predicting cardiovascular diseases in a reasonable time. The machine learning algorithms used in the presented work are naïve Bayes, support vector machine, k-nearest neighbors, and random forest. The dataset used for this comparison is the “Heart Disease Data Set” from the University of California, Irvine (UCI) machine learning repository.

Forecasting of Breast Cancer and Diabetes Using Ensemble Learning

International Journal of Computer Communication and Informatics ◽

10.34256/ijcci1911 ◽

2019 ◽

Vol 1 (1) ◽

pp. 1-5 ◽

Cited By ~ 1

Author(s):

Shraboni Rudra ◽

Minhaz Uddin ◽

Mohammed Minhajul Alam

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Learning Algorithm ◽

Combine Method ◽

Diabetes Patient ◽

Support Vector ◽

Classification Of Diseases ◽

The Cost ◽

Very High

Machine learning algorithms play an important role in our lives. Machine learning is a subset of artificial intelligence. Recently, many people have been trying to use AI, or to invent something related to AI, to make life easier. In the medical field, machine learning is used for the recognition and classification of diseases. It can classify cancer, diabetes, or other diseases more accurately from datasets. We therefore propose a model that combines a support vector machine (SVM) with AdaBoost. This combined method is known as an ensemble learner. In this paper, we predict diabetes and breast cancer. We use an SVM for classification and then apply AdaBoost for boosting. The number of diabetes patients is increasing very rapidly. Diabetes causes many other conditions, such as kidney failure and eye disorders. No medicine has been invented that fully prevents diabetes. Breast cancer is increasing very rapidly among women. The cost of breast cancer treatment is very high. More research is under way on diabetes and breast cancer. We propose our model to predict these diseases more accurately than previous models.
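The boosting idea can be sketched as follows. Note the substitution: the paper boosts SVMs, whereas this sketch uses one-dimensional decision stumps as the weak learner to stay short and dependency-free. The function names and the toy dataset are assumptions made up for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict `polarity` at or above the threshold, `-polarity` below it."""
    return polarity if x >= threshold else -polarity


def best_stump(xs, ys, weights):
    """Find the stump with the lowest weighted error on 1-D data."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_fit(xs, ys, rounds=3):
    """Train an AdaBoost ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Up-weight misclassified points, down-weight the rest, renormalise.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Made-up 1-D data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
```

On this toy data a single stump misclassifies three of the ten points, while the weighted vote of three stumps fits all of them, which is the effect boosting is meant to deliver. Performance on training data says nothing about generalization, so a real study would evaluate on held-out data.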
