Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis

RNA sequencing has become the standard technique for high-resolution, genome-wide monitoring of gene expression. As such, it is often the first step towards understanding the complex molecular mechanisms driving various phenotypes, from organ development to disease genesis, monitoring and progression. An advantage of RNA sequencing is its ability to capture complex transcriptomic events such as alternative splicing, which results in alternate isoform abundance. At the same time, exploiting this advantage remains algorithmically and computationally challenging, especially with the emergence of even higher-resolution technologies such as single-cell RNA sequencing. Although several algorithms have been proposed for the effective detection of differential isoform expression from RNA-Seq data, no widely accepted gold standards have been established. This problem is further compounded by the significant differences in the output of different algorithms when applied to the same data. In addition, many of the proposed algorithms remain scarce and poorly maintained. Driven by these challenges, we developed a novel integrative approach that effectively combines the most widely used algorithms for differential transcript and isoform analysis using state-of-the-art machine learning techniques. We demonstrate its usability by applying it to simulated data based on several organisms and evaluating it with several performance metrics; we conclude that our strategy outperforms the application of the individual algorithms. Finally, our approach is implemented as an R Shiny application, with the underlying data analysis pipelines also available as Docker containers.
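The disagreement between algorithms run on the same data, which the abstract cites as a motivation, can be quantified with a pairwise Jaccard index over the sets of transcripts each tool reports as differentially expressed. The sketch below is only an illustration of that idea and is not the authors' integrative method; the tool names (`tool_a`, `tool_b`, `tool_c`) and transcript identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Return |a & b| / |a | b| for two collections; 1.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical call sets: transcripts each tool flags as differentially expressed.
calls = {
    "tool_a": {"tx1", "tx2", "tx3", "tx4"},
    "tool_b": {"tx2", "tx3", "tx5"},
    "tool_c": {"tx1", "tx2", "tx3"},
}

# Compare every pair of tools; a low index means the tools largely disagree.
for (name_a, set_a), (name_b, set_b) in combinations(calls.items(), 2):
    print(f"{name_a} vs {name_b}: {jaccard(set_a, set_b):.2f}")
```

A low pairwise index on real data would be one symptom of the inconsistency that motivates combining tools rather than trusting any single one.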

Prediction of credit card defaults through data analysis and machine learning techniques

Materials Today Proceedings ◽

10.1016/j.matpr.2021.04.588 ◽

2021 ◽

Author(s):

Saurabh Arora ◽

Sushant Bindra ◽

Survesh Singh ◽

Vinay Kumar Nassa

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Credit Card ◽

Machine Learning Techniques ◽

Learning Techniques

An investigation of various machine learning techniques for mobile call data analysis for reducing call drop

Materials Today Proceedings ◽

10.1016/j.matpr.2021.11.622 ◽

2021 ◽

Author(s):

G.V. Ashok ◽

Vasanthi Kumari

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Machine Learning Techniques ◽

Learning Techniques

Prediction of Confusion Attempting Algebra Homework in an Intelligent Tutoring System through Machine Learning Techniques for Educational Sustainable Development

Sustainability ◽

10.3390/su11010105 ◽

2018 ◽

Vol 11 (1) ◽

pp. 105 ◽

Cited By ~ 10

Author(s):

Syed Abidi ◽

Mushtaq Hussain ◽

Yonglin Xu ◽

Wu Zhang

Keyword(s):

Machine Learning ◽

Sustainable Development ◽

Teaching And Learning ◽

Performance Metrics ◽

Intelligent Tutoring ◽

Intelligent Tutoring System ◽

Vital Role ◽

Machine Learning Techniques ◽

Tutoring System ◽

Learning Techniques

Incorporating substantial sustainable development issues into teaching and learning is the ultimate task of Education for Sustainable Development (ESD). The purpose of our study was to identify confused students who had failed to master the skill(s) assigned by tutors as homework in an Intelligent Tutoring System (ITS). We focused on ASSISTments, an ITS, and scrutinized its skill-builder data using machine learning techniques and methods. We used seven candidate models: Naïve Bayes (NB), Generalized Linear Model (GLM), Logistic Regression (LR), Deep Learning (DL), Decision Tree (DT), Random Forest (RF), and Gradient Boosted Trees (XGBoost). We trained, validated, and tested the learning algorithms, performed stratified cross-validation, and measured the performance of the models through various performance metrics, i.e., ROC (Receiver Operating Characteristic), Accuracy, Precision, Recall, F-Measure, Sensitivity, and Specificity. We found that RF, GLM, XGBoost, and DL were the high accuracy-achieving classifiers. However, other considerations, such as detecting unexplored features that might be related to the forecasting of outputs, can also boost the accuracy of the prediction model. Through machine learning methods, we identified the group of students that were confused when attempting the homework exercise, to help foster their knowledge and talent to play a vital role in environmental development.
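The stratified cross-validation mentioned in the abstract keeps the class ratio roughly equal in every fold, which matters when confused students are a minority. The following is a minimal standard-library sketch of stratified fold assignment, not the authors' pipeline; the function name `stratified_folds` and the example labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical labels: 1 = confused student, 0 = not confused.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(f"fold size={len(fold)}, confused={positives}")
```

Each fold then serves once as the held-out test set while the remaining folds are used for training.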

Transcriptomic responses to environmental change in fishes: Insights from RNA sequencing

FACETS ◽

10.1139/facets-2017-0015 ◽

2017 ◽

Vol 2 (2) ◽

pp. 610-641 ◽

Cited By ~ 23

Author(s):

Rebekah A. Oomen ◽

Jeffrey A. Hutchings

Keyword(s):

Climate Change ◽

Rna Sequencing ◽

Molecular Mechanisms ◽

Environmental Variability ◽

Global Climate ◽

Genetic Research ◽

Model Organisms ◽

Rna Seq ◽

Transcriptomic Responses ◽

Ecological Importance

The need to better understand how plasticity and evolution affect organismal responses to environmental variability is paramount in the face of global climate change. The potential for using RNA sequencing (RNA-seq) to study complex responses by non-model organisms to the environment is evident in a rapidly growing body of literature. This is particularly true of fishes for which research has been motivated by their ecological importance, socioeconomic value, and increased use as model species for medical and genetic research. Here, we review studies that have used RNA-seq to study transcriptomic responses to continuous abiotic variables to which fishes have likely evolved a response and that are predicted to be affected by climate change (e.g., salinity, temperature, dissolved oxygen concentration, and pH). Field and laboratory experiments demonstrate the potential for individuals to respond plastically to short- and long-term environmental stress and reveal molecular mechanisms underlying developmental and transgenerational plasticity, as well as adaptation to different environmental regimes. We discuss experimental, analytical, and conceptual issues that have arisen from this work and suggest avenues for future study.

Supermarket Sales Prediction Using Regression

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/951022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1153-1157

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Low Cost ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Customer Data ◽

Sales Data ◽

Online Marketplace ◽

Sales Prediction ◽

The Future

Sales forecasting is important for companies engaged in retailing, logistics, manufacturing, marketing and wholesaling. It allows companies to allocate resources efficiently, to estimate sales revenue and to plan strategies that are better for the company's future. In this paper, product sales from a particular store are predicted in a way that produces better performance than other machine learning algorithms. The dataset used for this project is the Big Mart Sales data of 2013. Nowadays, shopping malls and supermarkets keep track of the sales data of each individual item in order to predict future customer demand. This data contains a large amount of customer data and item attributes. Further, frequent patterns are detected by mining the data from the data warehouse. The data can then be used to predict future sales with the help of several machine learning techniques (algorithms) for companies like Big Mart. In this project, we propose a model using the XGBoost algorithm for predicting the sales of companies like Big Mart and found that it produces better performance than other existing models. This project also compares the model with other models in terms of their performance metrics. Big Mart is an online marketplace where people can buy, sell or advertise their merchandise at low cost. The goal of the paper is to make Big Mart a shopping paradise for buyers and a marketing solution for sellers. The ultimate aim is the complete satisfaction of the customers. The project “SUPERMARKET SALES PREDICTION” builds a predictive model and finds the sales of each product at a particular store. Big Mart can use this model to understand the properties of the products that play a major role in increasing sales. This can also be done on the basis of hypotheses that should be formed before looking at the data.

A Survey of Bioinformatics-Based Tools in RNA-Sequencing (RNA-Seq) Data Analysis

Translational Bioinformatics and Its Application - Translational Medicine Research ◽

10.1007/978-94-024-1045-7_10 ◽

2017 ◽

pp. 223-248 ◽

Cited By ~ 1

Author(s):

Pallavi Gaur ◽

Anoop Chaturvedi

Keyword(s):

Data Analysis ◽

Rna Sequencing ◽

Rna Seq

Hybrid approach with Deep Auto-Encoder and optimized LSTM based Deep Learning approach to detect anomaly in cloud logs

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201707 ◽

2021 ◽

pp. 1-15

Author(s):

Savaridassan Pankajashan ◽

G. Maragatham ◽

T. Kirthiga Devi

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Anomaly Detection ◽

Performance Metrics ◽

Hybrid Approach ◽

Machine Learning Techniques ◽

Support Vector ◽

Paper Machine ◽

Log Data ◽

Isolation Forest

Anomaly-based detection is coupled with recognizing the uncommon, to catch unusual activity and to find the strange action behind that activity. Anomaly-based detection has a wide scope of critical applications, from bank application security to natural sciences to medical systems to marketing apps. Anomaly-based detection adopted by various Machine Learning techniques is a type of system that relies on artificial intelligence. With the ever-expanding volume and new sorts of information, for example, sensor information from an enormous number of IoT devices and network flow data from cloud computing, it is unsurprising that there is growing enthusiasm for reaching more conclusions automatically by means of AI and ML applications. But with respect to anomaly detection, many applications of the scheme are simply the passion for detection. In this paper, Machine Learning (ML) techniques, namely SVM and Isolation Forest classifiers, are tested, and, with reference to Deep Learning (DL) techniques, the proposed DA-LSTM (Deep Auto-Encoder LSTM) model is adopted for preprocessing of log data and anomaly-based detection to obtain better detection performance. An enhanced LSTM (long short-term memory) model, with suitable parameters optimized using a genetic algorithm (GA), is used to better recognize anomalies in log data that has been filtered by a Deep Auto-Encoder (DA). The deep neural network models are used to convert unstructured log information into training-ready features that are suitable for log classification in detecting anomalies. These models are assessed using two benchmark datasets, the OpenStack logs and the CIDDS-001 intrusion detection OpenStack server dataset. The outcomes show that the DA-LSTM model performs better than other notable ML techniques. We further investigated the performance metrics of the ML and DL models through well-known indicators, specifically the F-measure, Accuracy, Recall, and Precision. The experimental results show that the Isolation Forest and Support Vector Machine classifiers reach roughly 81% and 79% accuracy, respectively, on the CIDDS-001 OpenStack server dataset, while the proposed DA-LSTM classifier reaches around 99.1% accuracy, improving on the familiar ML algorithms. Further, the DA-LSTM outcomes on the OpenStack log datasets show better anomaly detection compared with other notable machine learning models.
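The four indicators named in the abstract (Accuracy, Precision, Recall, F-measure) all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions under the assumption that "anomaly" is the positive class; the function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp/fp: true/false positives (flagged as anomaly),
    tn/fn: true/false negatives (flagged as normal).
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for a log-anomaly detector on 1000 log lines.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced log data, accuracy alone can look high even when many anomalies are missed, which is why precision, recall and the F-measure are usually reported alongside it.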

Data Analysis on Cancer Disease Using Machine Learning Techniques

Intelligent Systems Reference Library - Advanced Machine Learning Approaches in Cancer Prognosis ◽

10.1007/978-3-030-71975-3_2 ◽

2021 ◽

pp. 13-73

Author(s):

Soumen K. Pati ◽

Arijit Ghosh ◽

Ayan Banerjee ◽

Indrani Roy ◽

Preetam Ghosh ◽

...

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Machine Learning Techniques ◽

Cancer Disease ◽

Learning Techniques

Insider Threat Detection Using Supervised Machine Learning Algorithms on an Extremely Imbalanced Dataset

International Journal of Cyber Warfare and Terrorism ◽

10.4018/ijcwt.2020040101 ◽

2020 ◽

Vol 10 (2) ◽

pp. 1-26

Author(s):

Naghmeh Moradpoor Sheykhkanloo ◽

Adam Hall

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Third Party ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Insider Threat ◽

Threat Detection ◽

Imbalanced Dataset ◽

The Impact

An insider threat can take on many forms and fall under different categories. These include the malicious insider, the careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is hugely imbalanced. Therefore, this article addresses insider threat detection on an extremely imbalanced dataset, which includes employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset using this technique did not improve performance metrics, it did improve the time taken to build the model and the time taken to test the model. Additionally, the authors realised that running the chosen classifiers with parameters other than the default ones has an impact on both balanced and imbalanced scenarios, but the impact is significantly stronger when using the imbalanced dataset.
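One way to picture spread-subsample balancing is as random undersampling that caps every class at a fixed multiple of the rarest class's size. The sketch below follows that reading; it is an assumption about the general idea and not the specific implementation the authors used. The function name, the `max_spread` parameter and the example data are hypothetical.

```python
import random
from collections import defaultdict


def spread_subsample(samples, labels, max_spread=1.0, seed=0):
    """Randomly undersample so no class exceeds max_spread x the rarest class.

    max_spread=1.0 yields a uniform class distribution; values above 1.0
    allow the larger classes to stay proportionally bigger.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    if not by_class:
        return [], []

    smallest = min(len(items) for items in by_class.values())
    cap = int(smallest * max_spread)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Keep small classes whole; draw a random subset of oversized ones.
        kept = items if len(items) <= cap else rng.sample(items, cap)
        out_samples.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_samples, out_labels


# Hypothetical data: 10 insider-threat events among 1000 records.
samples = list(range(1000))
labels = ["threat"] * 10 + ["normal"] * 990
balanced_x, balanced_y = spread_subsample(samples, labels, max_spread=1.0)
print(balanced_y.count("threat"), balanced_y.count("normal"))
```

Because the balanced set is far smaller than the original, shorter model build and test times, as reported in the abstract, are an expected side effect of this kind of undersampling.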

Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System

Sensors ◽

10.3390/s19102266 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2266 ◽

Cited By ~ 1

Author(s):

Nikolaos Sideris ◽

Georgios Bardis ◽

Athanasios Voulodimos ◽

Georgios Miaoulis ◽

Djamchid Ghazanfarpour

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Random Forests ◽

Real World ◽

Performance Metrics ◽

World City ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Real World Data

The constantly increasing amount and availability of urban data derived from varying sources lead to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach that addresses these challenges with the aid of machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to the random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
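Of the metrics listed in the abstract, G-mean is the least commonly spelled out. For a binary problem it is usually defined as the geometric mean of sensitivity (recall on the positive class) and specificity (recall on the negative class). The sketch below assumes that binary definition; the paper's own evaluation may use a multi-class variant, and the example counts are hypothetical.

```python
import math


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity for a binary classifier."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: 100 suitable locations, 50 unsuitable ones.
score = g_mean(tp=90, fp=10, tn=40, fn=10)
print(f"G-mean: {score:.4f}")
```

Because the two rates are multiplied, a classifier that ignores one class entirely scores zero, which makes G-mean a useful complement to accuracy on unevenly distributed classes.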
