A Novel Ensemble Stacking Classification of Genetic Variations Using Machine Learning Algorithms

Author(s):  
Jahnavi Yeturu ◽  
Poongothai Elango ◽  
S. P. Raja ◽  
P. Nagendra Kumar

Genetics is the clinical study of congenital mutation. The principal advantage of analyzing human genetic mutations is the exploration, analysis, interpretation and description of the genetically transmitted and inherited effects of diseases such as cancer, diabetes and heart disease. Cancer is among the most troublesome of these, and the proportion of cancer sufferers is growing rapidly. Distinguishing the mutations that contribute to tumor growth from neutral mutations is difficult, as the majority of tumors carry genetic mutations. Genetic mutations are categorized by way of medical observations and clinical studies in order to classify the cancer. At present, genetic mutations are annotated either manually or with existing basic algorithms. Evaluation and classification of each individual genetic mutation has been based on evidence from the documented medical literature. Consequently, classifying genetic mutations on the basis of clinical evidence remains a challenging task. Several feature-extraction techniques exist: one-hot encoding is used to derive features from genes and their variations, and TF-IDF is used to extract features from the clinical text data. To increase classification accuracy, machine learning algorithms such as support vector machine, logistic regression and Naive Bayes are tried. A stacking model classifier has been developed to increase the accuracy further. The proposed stacking model classifier obtained a log loss of 0.8436 and 0.8572 on the cross-validation data set and the test data set, respectively. The experiments show that the proposed stacking model classifier outperforms the existing algorithms in terms of log loss.
A lower log loss indicates a better model. Here the log loss has been reduced to less than 1 by using the proposed stacking model classifier. The performance of these algorithms can be gauged with measures such as multi-class log loss.
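To make the metric reported above concrete, the following is a minimal sketch of multi-class log loss in plain Python. The function name, the three-class labels and the probability rows are hypothetical illustrations; they are not the data or the code used in the paper.

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)

# Hypothetical three-class example: true labels and predicted probabilities.
y_true = [0, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]

loss = multiclass_log_loss(y_true, probs)

# A classifier that always predicts a uniform distribution over K classes
# scores ln(K), which is a useful baseline when reading log-loss values.
uniform = multiclass_log_loss(y_true, [[1 / 3] * 3] * 3)

print(f"model log loss:   {loss:.4f}")
print(f"uniform baseline: {uniform:.4f}")
```

Lower values are better, and a confident wrong prediction is penalized much more heavily than an uncertain one, which is why log loss is stricter than plain accuracy.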

Author(s):  
Angana Saikia ◽  
Vinayak Majhi ◽  
Masaraf Hussain ◽  
Sudip Paul ◽  
Amitava Datta

Tremor is an involuntary quivering movement or shake. Characteristically occurring at rest, the classic slow, rhythmic tremor of Parkinson's disease (PD) typically starts in one hand, foot, or leg and can eventually affect both sides of the body. The resting tremor of PD can also occur in the jaw, chin, mouth, or tongue. Loss of dopamine leads to the symptoms of Parkinson's disease, which may include a tremor. For some people, a tremor might be the first symptom of PD. Various studies have proposed measurement technologies and analyzed the characteristics of Parkinsonian tremors using different techniques. Various machine learning algorithms, such as a support vector machine (SVM) with three kernels, a discriminant analysis, a random forest, and a kNN algorithm, are also used to classify and identify various kinds of tremors. This chapter presents an in-depth review of the identification and classification of various Parkinsonian tremors using machine learning algorithms.


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two more subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy classification and in multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than other well-known machine learning algorithms.
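The 5-fold cross-validation mentioned above can be sketched as follows. This is an illustrative index splitter only: the function name and the sample count of 12 are assumptions, and the abstract does not say whether the authors shuffled or stratified their folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.

    Folds differ in size by at most one sample. No shuffling is done here;
    real experiments usually shuffle (and often stratify by class) first.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical data set of 12 samples split into 5 folds.
folds = list(k_fold_indices(12, 5))
for fold_number, (train, test) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train on {len(train)} samples, test on {test}")
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every sample for evaluation once.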


Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 581
Author(s):  
Guadalupe Obdulia Gutiérrez-Esparza ◽  
Oscar Infante Vázquez ◽  
Maite Vallejo ◽  
José Hernández-Torruco

Metabolic syndrome is a health condition that increases the risk of heart disease, diabetes, and stroke. The prognostic variables that identify this syndrome have already been defined by the World Health Organization (WHO), the National Cholesterol Education Program Third Adult Treatment Panel (ATP III), and the International Diabetes Federation. According to these guidelines, there is some symmetry among the anthropometric prognostic variables used to classify abdominal obesity in people with metabolic syndrome. Some variables appear to be more sensitive than others; nevertheless, the proposed definitions have failed to appropriately classify a specific population or ethnic group. In this work, we used the ATP III criteria as the framework to rank the health parameters (clinical and anthropometric measurements, lifestyle data, and blood tests) from a data set of 2942 participants in the Mexico City Tlalpan 2020 cohort, applying machine learning algorithms. We aimed to find the most appropriate prognostic variables to classify Mexicans with metabolic syndrome. Sensitivity, specificity, and balanced accuracy were used as validation criteria. The ATP III criteria using Waist-to-Height-Ratio (WHtR) as the anthropometric index for the diagnosis of abdominal obesity achieved better classification performance than waist circumference or body mass index. Further work is needed to assess its precision as a classification tool for metabolic syndrome in a Mexican population.
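The three validation criteria named above can be computed from binary predictions as in the sketch below. The function name and the ten labels are hypothetical and are not drawn from the Tlalpan 2020 cohort.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity and balanced accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives detected
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy

# Hypothetical labels: 1 = has the condition, 0 = does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sensitivity, specificity, balanced_accuracy = binary_metrics(y_true, y_pred)
print(f"sensitivity:       {sensitivity:.4f}")
print(f"specificity:       {specificity:.4f}")
print(f"balanced accuracy: {balanced_accuracy:.4f}")
```

Balanced accuracy averages the two per-class rates, so it is less flattering than plain accuracy when one class is much larger than the other.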


Author(s):  
Baban. U. Rindhe ◽  
Nikita Ahire ◽  
Rupali Patil ◽  
Shweta Gagare ◽  
Manisha Darade

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. So, there is a need for a reliable, accurate, and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and its professionals in the diagnosis of heart-related diseases. The heart is the next major organ after the brain in terms of priority in the human body. It pumps blood and supplies it to all organs of the body. Predicting the occurrence of heart disease is significant work in the medical field. Data analytics is useful for making predictions from large amounts of information, and it helps medical centers to predict various diseases. A huge amount of patient-related data is maintained on a monthly basis. The stored data can be a useful source for predicting the occurrence of future diseases. Data mining and machine learning techniques such as Artificial Neural Network (ANN), Random Forest, and Support Vector Machine (SVM) are used to predict heart diseases. Prediction and diagnosis of heart disease have become a challenge faced by doctors and hospitals both in India and abroad. To reduce the large number of deaths from heart diseases, a quick and efficient detection technique needs to be found. Data mining techniques and machine learning algorithms play a very important role in this area. Researchers are accelerating their work to develop software, with the help of machine learning algorithms, that can help doctors with both the prediction and the diagnosis of heart disease.
The main objective of this research project is to predict the heart disease of a patient using machine learning algorithms.


Author(s):  
D. Wang ◽  
M. Hollaus ◽  
N. Pfeifer

Classification of wood and leaf components of trees is an essential prerequisite for deriving vital tree attributes, such as wood mass, leaf area index (LAI) and woody-to-total area. Laser scanning has emerged as a promising solution for this task. Intensity-based approaches are widely proposed, as different components of a tree can feature discriminatory optical properties at the operating wavelengths of a sensor system. For geometry-based methods, machine learning algorithms are often used to separate wood and leaf points, given proper training samples. However, it remains unclear how the chosen machine learning classifier and the features used influence classification results. To this end, we compare four popular machine learning classifiers, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), and Gaussian Mixture Model (GMM), for separating wood and leaf points from terrestrial laser scanning (TLS) data. Two trees, an Erythrophleum fordii and a Betula pendula (silver birch), are used to test the impacts of classifier, feature set, and training samples. Our results showed that RF is the best model in terms of accuracy, and that local density related features are important. Experimental results confirmed the feasibility of machine learning algorithms for the reliable classification of wood and leaf points. It should be noted that our studies are based on isolated trees. Further tests should be performed on more tree species and on data from more complex environments.


Author(s):  
Mingyue Wu ◽  
Ran Wang ◽  
Yang Hu ◽  
Mengjiao Fan ◽  
Yufan Wang ◽  
...  

This study examined the reliability of a tennis stroke classification and assessment platform consisting of a single low-cost MEMS sensor in a wrist-worn wearable device, a smartphone, and a computer. The collected data were transmitted via Bluetooth and analyzed by machine learning algorithms. Twelve right-handed male elite tennis athletes participated in the study, and each athlete performed 150 strokes. The results from three machine learning algorithms regarding their recognition and classification of the real-time data stream were compared. Stroke recognition and classification went through pre-processing, segmentation, feature extraction, and classification with Support Vector Machine (SVM), including SVM without normalization, SVM with Min–Max normalization, and SVM with Z-score normalization, as well as K-nearest neighbor (K-NN) and Naive Bayes (NB) machine learning algorithms. During training, 10-fold cross-validation was used to avoid overfitting, and suitable parameters were found for the SVM classifiers. The best classifier was achieved when C = 1 using the RBF kernel function. The algorithms' classification of unique stroke types yielded highly reliable clusters within each stroke type, with the highest test accuracy of 99% achieved by SVM with Min–Max normalization and 98.4% achieved by SVM with Z-score normalization.
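The two normalization schemes compared above can be illustrated with the short sketch below. The function names and the four feature values are hypothetical; the abstract does not describe the study's actual feature pipeline.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant feature has no spread; map it to 0.0 to avoid dividing by zero.
    return [(v - lo) / span if span else 0.0 for v in values]

def z_score_scale(values):
    """Center values on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]

# Hypothetical raw values for one sensor feature.
feature = [2.0, 4.0, 6.0, 8.0]

min_max = min_max_scale(feature)
z_scores = z_score_scale(feature)

print("min-max:", [round(v, 4) for v in min_max])
print("z-score:", [round(v, 4) for v in z_scores])
```

Scaling matters for an SVM with an RBF kernel because the kernel depends on distances between feature vectors, so a feature with a large numeric range would otherwise dominate the others. In practice the scaling parameters are estimated on the training folds only and then applied to the held-out data.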


Background/Aim: Healthcare is an unavoidable task in human life. Cardiovascular disease is a general category for a range of conditions affecting the heart and blood vessels. Early methods for estimating cardiovascular disease helped in making decisions about the changes needed in high-risk patients, which resulted in a reduction of their risks. Methods: In the proposed research, we used a data set from Kaggle, which does not require data pre-processing techniques such as removal of noisy data, removal of missing data, filling in default values where applicable, and classification of attributes for prediction and decision making at different levels. The performance of the diagnosis model is assessed using classification accuracy, sensitivity and specificity analysis. This paper proposes a model to predict whether a person has a cardiovascular disease and to provide awareness or a diagnosis of it. This is done by comparing the accuracies obtained by applying rules to the individual results of Support Vector Machine, Random Forest, Naive Bayes classifier and logistic regression on a data set collected in one region, in order to present an accurate model for predicting cardiovascular disease. Results: The machine learning algorithms under study were able to predict cardiovascular disease in patients with accuracy between 58.71% and 77.06%. Conclusions: Logistic regression was shown to have better accuracy (77.06%) than the other machine learning algorithms.


Recycling ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 65
Author(s):  
Ali Hewiagh ◽  
Kannan Ramakrishnan ◽  
Timothy Tzen Vun Yap ◽  
Ching Seong Tan

Online fraud has pernicious impacts on different system domains, including waste management systems. Fraudsters illegally obtain rewards for recycling activities, or avoid the penalties that apply to those who are required to recycle their own waste. Although some approaches have been introduced to prevent such fraudulent activities, fraudsters continuously seek new ways to commit illegal actions. Machine learning technology has shown significant and impressive results in identifying new online fraud patterns in different system domains such as e-commerce, insurance, and banking. The purpose of this paper, therefore, is to analyze a waste management system and develop a machine learning model to detect fraud in the system. The intended system allows consumers, individuals, and organizations to track, monitor, and update their performance in their recycling activities. The data set provided by a waste management organization is used for the analysis and the model training. This data set contains transactions of users’ recycling activities and behaviors. Three machine learning algorithms (random forest, support vector machine, and multi-layer perceptron) are used in the experiments, and the best detection model is selected based on the model’s performance. Results show that each of these algorithms can be used for fraud detection in waste management with high accuracy. The random forest algorithm produces the best model, with an accuracy of 96.33%, F1-score of 95.20%, and ROC of 98.92%.

