Mitigating prediction error of deep learning streamflow models in large data‐sparse regions with ensemble modeling and soft data

Discriminating the Single-cell Gene Regulatory Networks of Human Pancreatic Islets: A Novel Deep Learning Application

2020. DOI: 10.1101/2020.08.30.273839

Author(s): Turki Turki, Y-h. Taguchi

Keyword(s): Deep Learning, Single Cell, Gene Regulatory Networks, Regulatory Networks, Metabolic Diseases, Large Data, Data Repositories, Cell Gene Expression, Gene Regulatory, Cell Gene

Analyzing single-cell pancreatic data could play an important role in understanding various metabolic diseases and health conditions. Because such single-cell gene expression data are sparse and noisy, analyzing the various functions related to inferring gene regulatory networks from single-cell data remains difficult, which poses a barrier to a deeper understanding of cellular metabolism. Now that recent studies have enabled the reliable inference of single-cell gene regulatory networks (SCGRNs), a new challenge has arisen: discriminating between SCGRNs. By accurately discriminating between SCGRNs (e.g., distinguishing the SCGRNs of healthy pancreas from those of type 2 diabetes (T2D) pancreas), biologists would be able to annotate, organize, visualize, and identify common patterns of SCGRNs for metabolic diseases. Such annotated SCGRNs could help speed up the building of large data repositories. In this study, we aimed to contribute a novel deep learning (DL) application. First, we generated a dataset of 224 SCGRNs belonging to both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures (VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169), trained each of them on the dataset, and evaluated their predictions on a test set. The DL architectures were evaluated on an HP workstation with a single NVIDIA GeForce RTX 2080 Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of the VGG19 model in automatically classifying SCGRNs derived from single-cell pancreatic data.
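The abstract compares the seven architectures "using several performance measures" without listing them. As a minimal sketch, assuming the standard binary-classification measures (accuracy, precision, recall, F1) and a hypothetical label encoding of 1 = T2D and 0 = healthy, the test-set scores for each architecture could be computed like this:

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a two-class problem (e.g. T2D vs healthy)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Confusion-matrix counts with respect to the `positive` class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and predictions for one architecture (1 = T2D, 0 = healthy).
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Running the same function over each architecture's predictions gives a like-for-like comparison; which measures the authors actually reported is not stated in the abstract.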

An Investigation Into the Efficacy of Deep Learning Tools for Big Data Analysis in Health Care

Data Analytics in Medicine, 2020, pp. 1826-1838. DOI: 10.4018/978-1-7998-1204-3.ch091

Author(s): Rojalina Priyadarshini, Rabindra K. Barik, Chhabi Panigrahi, Harishchandra Dubey, Brojo Kishore Mishra

Keyword(s): Big Data, Deep Learning, Large Data, Big Data Analysis, Optimization Techniques, Data Sets, Learning Tools, Healthcare Applications, Proper Training, Future Prediction

This article describes how machine learning (ML) algorithms are useful for analyzing data and extracting meaningful information that can be used in various other applications. In the last few years, data have grown explosively in size and structural complexity, and conventional ML algorithms face several difficulties when dealing with such voluminous, unstructured big data. Modern ML tools are designed to handle all sorts of data complexity. Deep learning (DL) is one such modern tool, commonly used to find hidden structure and cohesion in large data sets; it is trained on parallel platforms with intelligent optimization techniques so that the data can be further analyzed and interpreted for future prediction and classification. This article focuses on the DL tools and software used over the past couple of years in various areas, especially in healthcare applications.

Deep Learning Approaches for Sentiment Analysis Challenges and Future Issues

2022, pp. 27-50. DOI: 10.4018/978-1-7998-8161-2.ch003

Author(s): Rajalaxmi Prabhu B., Seema S.

Keyword(s): Machine Learning, Deep Learning, Model Building, Large Data, Machine Learning Algorithms, Large Data Sets, Data Sets, Learning Approaches, Learning Techniques, Important Challenge

A large amount of user-generated data is available today from major platforms, blogs, websites, and review sites. These data are usually unstructured, and automatically analyzing the sentiments they express is considered an important challenge. Several machine learning algorithms have been applied to mine opinions from large data sets, and much research has gone into understanding machine learning approaches to sentiment analysis. Machine learning depends heavily on the data available for model building, so suitable feature extraction techniques must also be applied. In this chapter, several deep learning approaches, their challenges, and future issues are addressed. Deep learning techniques are considered important for predicting users' sentiments. This chapter aims to analyze deep learning techniques for predicting sentiments and to explain the importance of several approaches for mining opinions and determining sentiment polarity.
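The chapter notes that classic machine learning pipelines for sentiment analysis depend on suitable feature extraction, which deep learning models largely learn instead. For contrast, the simplest hand-crafted extractor is a bag-of-words count vector. The sketch below is a generic illustration, not a technique taken from the chapter, and its regex tokenizer is an assumption chosen for brevity.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters/apostrophes; deliberately simple.
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(corpus: List[str]) -> Dict[str, int]:
    # Give each distinct token a column index, in sorted order for reproducibility.
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text: str, vocab: Dict[str, int]) -> List[int]:
    # Count vector over the vocabulary; out-of-vocabulary tokens are ignored.
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


# Hypothetical reviews used only to build a tiny vocabulary.
vocab = build_vocabulary(["Great phone, great battery", "Terrible battery"])
print(vocab, bag_of_words("Great phone, great battery", vocab))
```

Such vectors ignore word order and context, which is one motivation for the deep learning approaches the chapter surveys.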

DeepVar: An End-to-End Deep Learning Approach for Genomic Variant Recognition in Biomedical Literature

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (01), 2020, pp. 598-605. DOI: 10.1609/aaai.v34i01.5399

Author(s): Chaoran Cheng, Fei Tan, Zhi Wei

Keyword(s): Deep Learning, Large Data, Biomedical Literature, Entity Recognition, Learning Approach, Learning Approaches, Genomic Variants, Low Resource, End To End, Genomic Variant

We consider the problem of named entity recognition (NER) in biomedical scientific literature, and more specifically the recognition of genomic variants. NER has achieved significant success in recent years on canonical tasks where large data sets are generally available. However, it remains challenging in many domain-specific areas, especially those where only a small amount of gold-standard annotation can be obtained. In addition, genomic variant entities exhibit considerable linguistic heterogeneity and differ greatly from the entities characterized in existing canonical NER tasks. State-of-the-art machine learning approaches rely heavily on arduous feature engineering to characterize these unique patterns. In this work, we present the first successful end-to-end deep learning approach that bridges the gap between generic NER algorithms and low-resource applications, through genomic variant recognition. Our proposed model achieves promising performance without any hand-crafted features or post-processing rules. Our extensive experiments and results may shed light on other, similar low-resource NER applications.
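NER systems are commonly trained on per-token BIO tags (B- begins an entity, I- continues it, O is outside) and evaluated with exact-match, entity-level F1. The abstract does not state DeepVar's tagging scheme or metric, so the sketch below assumes BIO tags with a hypothetical `VAR` label for genomic variants. It shows how token-level predictions become entity spans and how those spans are scored.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags such as "B-VAR", "I-VAR", "O" into entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith(("B-", "I-")):
                # A stray I- tag (no matching open entity) is treated as a new entity.
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    # Exact match: a predicted entity counts only if start, end and type all agree.
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical 5-token sentence with two gold variant mentions; the model finds one.
print(entity_f1(["O", "B-VAR", "I-VAR", "O", "B-VAR"], ["O", "B-VAR", "I-VAR", "O", "O"]))
```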

Wearable Sensor-Based Gait Analysis for Age and Gender Estimation

Sensors, Vol 20 (8), 2020, pp. 2424. DOI: 10.3390/s20082424. Cited by ~2

Author(s): Md Atiqur Rahman Ahad, Thanh Trung Ngo, Anindya Das Antar, Masud Ahmed, Tahera Hossain, ...

Keyword(s): Deep Learning, Age Estimation, Prediction Error, Mean Absolute Error, Absolute Error, Training Dataset, Wearable Sensor, Age And Gender, Test Dataset, And Gender

Wearable sensor-based systems and devices have expanded into different application domains, especially healthcare. Automatic age and gender estimation has several important applications, and gait has been shown to be a powerful motion cue for various applications. A gait-based age and gender estimation challenge was launched at the 12th IAPR International Conference on Biometrics (ICB) in 2019. Eighteen teams from 14 countries initially registered for the competition. The goal of the challenge was to find effective approaches to age and gender estimation from sensor-based gait data. For this purpose, we employed a large wearable sensor-based gait dataset with 745 subjects (357 females and 388 males), aged 2 to 78 years, in the training set, and 58 subjects (19 females and 39 males) in the test set. The dataset includes several walking patterns. The gait sequences were collected from three IMUZ sensors placed on a waist belt or at the top of a backpack. Ten teams submitted 67 solutions for age and gender estimation. This paper extensively analyzes the methods and results of the various approaches. Based on this analysis, we found that deep learning-based solutions outperformed conventional handcrafted methods. The best results achieved a 24.23% prediction error for gender estimation and a mean absolute error of 5.39 for age estimation, by employing an angle-embedded gait dynamic image and a temporal convolutional network.
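The two reported scores can be computed as sketched below, under stated assumptions. Gender prediction error is taken here as the percentage of misclassified test subjects, with an optional class-balanced variant (the mean of the per-class error rates) that may matter because the test set is imbalanced (19 females, 39 males); the abstract does not say which definition the challenge used. Age error is the mean absolute error, presumably in years.

```python
from typing import Sequence


def gender_prediction_error(y_true: Sequence[str], y_pred: Sequence[str], balanced: bool = False) -> float:
    """Gender error in percent; `balanced=True` averages the per-class error rates."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if not balanced:
        wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
        return 100.0 * wrong / len(y_true)
    # Class-balanced: each class contributes equally, regardless of its size.
    rates = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        rates.append(sum(1 for i in idx if y_pred[i] != cls) / len(idx))
    return 100.0 * sum(rates) / len(rates)


def age_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean absolute error between true and predicted ages.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical predictions: a model that always answers "M" on a small imbalanced set.
print(gender_prediction_error(["F", "M", "M", "M"], ["M", "M", "M", "M"]))
print(gender_prediction_error(["F", "M", "M", "M"], ["M", "M", "M", "M"], balanced=True))
print(age_mae([20, 35, 50], [25, 30, 50]))
```

The example shows why the definition matters: always predicting the majority class scores 25% plain error but 50% balanced error.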

An Electro encephalographic signal Classification in Large Data Set using Deep learning Techniques

International Journal of Emerging Trends in Engineering Research, Vol 8 (10), 2020, pp. 6658-6662. DOI: 10.30534/ijeter/2020/058102020

Keyword(s): Deep Learning, Large Data, Signal Classification, Data Set, Large Data Set, Learning Techniques

Data mining in web personalization using the blended deep learning model

Indonesian Journal of Electrical Engineering and Computer Science, Vol 20 (3), 2020, pp. 1507. DOI: 10.11591/ijeecs.v20.i3.pp1507-1512

Author(s): Qusay Abdullah Abed, Osamah Mohammed Fadhil, Wathiq Laftah Al-Yaseen

Keyword(s): Data Mining, Deep Learning, Web Application, Large Data, Learning Model, Multidimensional Data, Data Sets, Web Personalization, Deep Learning Model, The Web

In general, multidimensional data (from mobile applications, for example) contain a large amount of unnecessary information. Because of the sheer volume of data (big data produced every second), web app users find it difficult to obtain the information they need quickly and effectively. One effective solution to this problem is web personalization. In this paper, we study data mining in web personalization using a blended deep learning model and explore how this model helps to analyze and estimate huge numbers of operations. Providing personalized recommendations that improve reliability depends on the web application making use of the useful information it contains. The results of this research are important for training and testing, on large data sets, a blended deep learning model based on a back-propagation neural network. The Hadoop framework was used to perform a number of experiments in different environments, with a learning rate between -1 and +1. Several evaluation techniques were used to assess the proposed model; here, true positives are the cases that are actually positive and are classified as positive.
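The abstract builds its blended model on a back-propagation neural network but gives no architecture. The plain-Python sketch below shows back-propagation itself in a network with one hidden layer. The layer sizes, learning rate, epoch count, the uniform [-1, 1] weight initialization and the OR toy task are all illustrative assumptions, not the paper's configuration.

```python
import math
import random
from typing import Callable, List, Tuple

Sample = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(data: List[Sample], n_hidden: int = 3, lr: float = 0.5,
              epochs: int = 5000, seed: int = 0) -> Callable[[List[float]], float]:
    """Train a 1-hidden-layer sigmoid network with per-sample back-propagation."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Illustrative assumption: weights drawn uniformly from [-1, 1], biases start at 0.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x: List[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

    for _ in range(epochs):
        for x, y in data:
            h, out = forward(x)
            # Sigmoid output + cross-entropy loss: dL/d(output pre-activation) = out - y.
            d_out = out - y
            # Propagate the error back through w2 and the hidden sigmoid derivative h(1 - h).
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Gradient-descent updates for both layers.
            for j in range(n_hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out

    return lambda x: forward(x)[1]


# Toy task (an assumption for illustration): learn logical OR.
OR_DATA: List[Sample] = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
predict = train_mlp(OR_DATA)
print([round(predict(x), 3) for x, _ in OR_DATA])
```

At scale, frameworks compute the same gradients automatically; the explicit loops here only make the backward pass visible.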

Are Markets Truly Efficient? Experiments using Deep Learning for Market Movement Prediction

2018. DOI: 10.20944/preprints201805.0015.v1. Cited by ~1

Author(s): Sanjiv Das, Karthik Mokashi, Robbie Culkin

Keyword(s): Neural Networks, Deep Learning, Market Efficiency, Prediction Error, Weak Form, Movement Prediction, Index Decomposition, Information Set, Underlying Stocks, Future Direction

We examine the use of deep learning (neural networks) to predict the movement of the S&P 500 Index using the past returns of all the stocks in the index. Our analysis finds that the future direction of the S&P 500 Index can be weakly predicted from the prior movements of its underlying stocks. Decomposing the prediction error indicates that most of the lack of predictability comes from randomness and only a little from nonstationarity. We believe this is the first test of S&P 500 market efficiency that uses a very large information set, and it extends the domain of weak-form market efficiency tests.
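The abstract does not specify its test statistic. One simple, widely used way to check whether index direction is weakly predictable is to compare the hit rate of predicted up/down moves against a 50% coin-flip baseline using a binomial z-score. The sketch below assumes the network outputs a predicted next-period index return; it is not the error decomposition described in the paper.

```python
import math
from typing import Sequence, Tuple


def directional_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Hit rate of predicted up/down moves and its z-score versus a 50% baseline.

    A zero predicted return counts as a "down" call; periods with zero realized
    return are skipped because they have no direction.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("need at least one period with a non-zero realized return")
    n = len(pairs)
    hits = sum(1 for a, p in pairs if (a > 0) == (p > 0))
    # Under the null of no predictability, hits ~ Binomial(n, 0.5): mean n/2, variance n/4.
    z = (hits - 0.5 * n) / math.sqrt(0.25 * n)
    return hits / n, z


# Hypothetical realized and predicted next-day index returns.
rate, z = directional_accuracy([0.01, -0.02, 0.005, -0.01], [0.002, -0.001, -0.003, -0.004])
print(rate, z)
```

On a long out-of-sample period, a z-score above roughly 1.96 rejects the coin-flip null at the 5% level (two-sided); a hit rate only slightly above 50% that still clears this bar corresponds to weak predictability.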

Using Satellite Data on Remote Transportation of Air Pollutants for PM2.5 Prediction in Northern Taiwan

2021. DOI: 10.36227/techrxiv.13734067.v1

Author(s): George Kibirige, Ming-Chuan Yang, Chao-Lin Liu, Meng Chang Chen

Keyword(s): Neural Network, Deep Learning, Air Pollutants, Prediction Error, Local Factors, Remote Areas, Temporal Features, The Impact, Pollution Events, Northern Taiwan

We propose RTP, a composite neural network model that captures knowledge from remote transportation pollution events (RTPEs) to improve local PM2.5 prediction. To the best of our knowledge, this is the first deep learning work to incorporate knowledge of remote pollutants into PM2.5 prediction. RTP consists of two neural network components: a pre-trained base model and the STRI model. The base model captures knowledge from local factors that influence PM2.5 concentrations, and STRI captures knowledge from RTPEs by learning the spatial-temporal characteristics of satellite-based AOD data and weather features from remote areas. In addition, given the size of the STRI model, we divide it into two components to facilitate training and improve results: STRI_fe, which extracts spatial-temporal features from remote areas, and STRI_p, which predicts local PM2.5 concentrations using both remote and local features. The predictions of STRI_p show that the prediction error decreases when remote features are added to the model, demonstrating that the STRI model indeed captures knowledge from RTPEs.

To characterize the occurrence of RTPEs in northern Taiwan, we also developed an algorithm to classify PM2.5 concentrations attributable to RTPEs. We use the STRI model to predict PM2.5 at two EPA stations located at the northern tip of Taiwan and apply the classification algorithm to the results. Accuracy improves when remote features are added to the model, which demonstrates the impact of RTPEs at these stations.
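The composition described in this abstract (a pre-trained base model for local factors, STRI_fe for remote areas, and STRI_p combining both) can be sketched with toy dense layers. Everything concrete here is hypothetical: mean pooling over time stands in for STRI_fe's learned spatial-temporal extractor, the weights are arbitrary, and the base model's output is replaced by a fixed local feature vector. The sketch only shows how remote and local features are concatenated before the final prediction.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Sequence[float]], Vector]


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Layer:
    """A dense layer y = W x + b, standing in for a trained network component."""
    def apply(x: Sequence[float]) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return apply


def stri_fe(remote_seq: Sequence[Sequence[float]], layer: Layer) -> Vector:
    """Hypothetical STRI_fe: pool remote inputs over time, then project to features."""
    n_steps = len(remote_seq)
    # Mean over time steps for each remote variable (e.g. AOD, wind speed); a placeholder
    # for the learned spatial-temporal extractor described in the abstract.
    pooled = [sum(step[k] for step in remote_seq) / n_steps for k in range(len(remote_seq[0]))]
    return layer(pooled)


def stri_p(local_features: Vector, remote_features: Vector, head: Layer) -> float:
    """Hypothetical STRI_p: predict local PM2.5 from concatenated local + remote features."""
    return head(local_features + remote_features)[0]


# Toy values only: base-model features for one station, two time steps of remote (AOD, wind).
local = [0.4, 1.2]
remote_feat = stri_fe([[0.3, 2.0], [0.5, 4.0]], linear([[1.0, 0.0], [0.0, 0.5]], [0.0, 0.0]))
print(stri_p(local, remote_feat, linear([[10.0, 5.0, 20.0, 1.0]], [3.0])))
```

Comparing such a head against one that sees only the local features is the shape of the with/without-remote-features comparison the abstract reports.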

Bias invariant RNA-seq metadata annotation

2020. DOI: 10.1101/2020.11.26.399568

Author(s): Hannes Wartmann, Sven Heins, Karin Kloiber, Stefan Bonn

Keyword(s): Deep Learning, Domain Adaptation, Tissue Sample, Large Data, Biomedical Data, RNA-Seq, Adaptation Algorithm, Data Repositories, Technological Advances, Metadata Annotation

Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Here we investigate RNA-seq metadata prediction based on gene expression values. We present a deep learning-based domain adaptation algorithm for the automatic annotation of RNA-seq metadata. We show how our algorithm outperforms existing approaches, as well as traditional deep learning methods, in predicting tissue, sample source, and patient sex information across several large data repositories. By using a model architecture similar to siamese networks, the algorithm is able to learn biases from datasets with few samples. Our domain adaptation approach achieves metadata annotation accuracies up to 12.3% better than a previously published method. Lastly, we provide a list of more than 10,000 novel tissue and sex label annotations for 8,495 unique SRA samples.
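The abstract says only that the architecture is "similar to siamese networks". Siamese networks pass two inputs through weight-sharing branches and are often trained with a pairwise contrastive loss, which pulls embeddings of matching pairs together and pushes non-matching pairs apart up to a margin. The sketch below implements that standard loss as an assumption; which samples the authors pair, and which loss they optimize, are not given in the abstract.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float], same: bool, margin: float = 1.0) -> float:
    """Pairwise contrastive loss on the outputs of two weight-sharing branches.

    Matching pairs are penalized by their squared distance; non-matching pairs are
    penalized only while they are closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    return d ** 2 if same else max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings of two samples produced by the shared encoder.
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same=True))
print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same=False))
```

Because both branches share weights, minimizing this loss shapes a single embedding space in which the chosen notion of "same" is reflected by distance.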
