Mitigating prediction error of deep learning streamflow models in large data‐sparse regions with ensemble modeling and soft data

Author(s):  
Dapeng Feng ◽  
Kathryn Lawson ◽  
Chaopeng Shen
2020 ◽  
Author(s):  
Turki Turki ◽  
Y-h. Taguchi

Analyzing single-cell pancreatic data could play an important role in understanding various metabolic diseases and health conditions. Because single-cell gene expression data are sparse and noisy, analyzing functions related to the inference of gene regulatory networks from such data remains difficult, which is a barrier to a deeper understanding of cellular metabolism. Since recent studies have enabled reliable inference of single-cell gene regulatory networks (SCGRNs), the challenge of discriminating between SCGRNs has now arisen. By accurately discriminating between SCGRNs (e.g., distinguishing SCGRNs of healthy pancreas from those of type 2 diabetic (T2D) pancreas), biologists would be able to annotate, organize, visualize, and identify common patterns of SCGRNs in metabolic diseases. Such annotated SCGRNs could help speed up the building of large data repositories. In this study, we aimed to contribute a novel deep learning (DL) application. First, we generated a dataset of 224 SCGRNs from both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures (VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169), trained each of them on the dataset, and evaluated their predictions on a test set. We ran the DL architectures on an HP workstation with a single NVIDIA GeForce RTX 2080 Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of the VGG19 model in automatically classifying SCGRNs derived from single-cell pancreatic data.


2020 ◽  
pp. 1826-1838
Author(s):  
Rojalina Priyadarshini ◽  
Rabindra K. Barik ◽  
Chhabi Panigrahi ◽  
Harishchandra Dubey ◽  
Brojo Kishore Mishra

This article describes how machine learning (ML) algorithms are useful for analyzing data and extracting meaningful information that can be used in various other applications. In the last few years, data have grown explosively in dimension and structure, and conventional ML algorithms face several difficulties when dealing with such voluminous and unstructured big data. Modern ML tools are designed to handle these complexities. Deep learning (DL) is one such tool: it is commonly used to find hidden structure and cohesion in large data sets by training on parallel platforms with intelligent optimization techniques, so that the data can be further analyzed and interpreted for prediction and classification. This article focuses on the DL tools and software used over the past couple of years in various areas, especially in healthcare applications.


2022 ◽  
pp. 27-50
Author(s):  
Rajalaxmi Prabhu B. ◽  
Seema S.

A large amount of user-generated data is now available from major platforms, blogs, websites, and review sites. These data are usually unstructured, and automatically analyzing the sentiments they express is considered an important challenge. Several machine learning algorithms have been applied to mining opinions from large data sets, and much research has been undertaken on machine learning approaches to sentiment analysis. Machine learning depends heavily on the data used for model building, so suitable feature extraction techniques must also be applied. This chapter addresses several deep learning approaches, along with their challenges and open issues. Deep learning techniques are considered important for predicting users' sentiments. This chapter aims to analyze deep learning techniques for sentiment prediction and to explain the importance of several approaches for mining opinions and determining sentiment polarity.


2020 ◽  
Vol 34 (01) ◽  
pp. 598-605
Author(s):  
Chaoran Cheng ◽  
Fei Tan ◽  
Zhi Wei

We consider the problem of named entity recognition (NER) in biomedical scientific literature, and more specifically the recognition of genomic variants. NER has achieved significant success on canonical tasks in recent years, where large data sets are generally available. However, it remains a challenging problem in many domain-specific areas, especially those where only small gold-standard annotated sets can be obtained. In addition, genomic variant entities exhibit diverse linguistic heterogeneity and differ greatly from the entities characterized in existing canonical NER tasks. State-of-the-art machine learning approaches rely heavily on arduous feature engineering to characterize these unique patterns. In this work, we present the first successful end-to-end deep learning approach that bridges the gap between generic NER algorithms and low-resource applications, using genomic variant recognition as the case study. Our proposed model achieves promising performance without any hand-crafted features or post-processing rules. Our extensive experiments and results may shed light on other similar low-resource NER applications.


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2424 ◽  
Author(s):  
Md Atiqur Rahman Ahad ◽  
Thanh Trung Ngo ◽  
Anindya Das Antar ◽  
Masud Ahmed ◽  
Tahera Hossain ◽  
...  

Wearable sensor-based systems and devices have expanded into different application domains, especially healthcare. Automatic age and gender estimation has several important applications, and gait has been shown to be a strong motion cue for many of them. A gait-based age and gender estimation challenge was launched at the 12th IAPR International Conference on Biometrics (ICB), 2019. In this competition, 18 teams from 14 countries initially registered. The goal of the challenge was to find effective approaches to estimating age and gender from sensor-based gait data. For this purpose, we employed a large wearable sensor-based gait dataset with 745 subjects (357 females and 388 males), aged 2 to 78 years, in the training set, and 58 subjects (19 females and 39 males) in the test set. The dataset contains several walking patterns. The gait sequences were collected from three IMUZ sensors placed on a waist belt or at the top of a backpack. Ten teams submitted 67 solutions for age and gender estimation. This paper extensively analyzes the methods used and the results achieved by the various approaches. Based on this analysis, we found that deep learning-based solutions led the competition compared with conventional handcrafted methods. The best result, a prediction error of 24.23% for gender estimation and a mean absolute error of 5.39 for age estimation, was achieved by employing an angle-embedded gait dynamic image and a temporal convolution network.
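The two headline numbers above are a gender prediction error (a percentage) and an age mean absolute error. The abstract does not spell out the challenge's exact scoring protocol, so the sketch below assumes the simplest reading: gender error is the percentage of test subjects whose predicted gender is wrong, and age MAE is the average absolute difference between true and predicted ages, taken per subject. The function names are hypothetical.

```python
def gender_error_rate(y_true, y_pred):
    """Percentage of subjects whose predicted gender differs from the label (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mistakes = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * mistakes / len(y_true)


def age_mae(ages_true, ages_pred):
    """Mean absolute error between true and predicted ages, averaged over subjects."""
    if len(ages_true) != len(ages_pred) or not ages_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(ages_true, ages_pred)) / len(ages_true)


if __name__ == "__main__":
    # Toy labels, not data from the challenge.
    print(gender_error_rate(["F", "M", "M", "F"], ["F", "M", "F", "F"]))  # 25.0
    print(age_mae([10, 30, 50], [12, 27, 50]))  # (2 + 3 + 0) / 3, about 1.67
```

Under these assumptions a lower value is better for both metrics, which is how the 24.23% and 5.39 figures should be read.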


Author(s):  
Qusay Abdullah Abed ◽  
Osamah Mohammed Fadhil ◽  
Wathiq Laftah Al-Yaseen

In general, multidimensional data (from mobile applications, for example) contain a large amount of unnecessary information. Because of the sheer volume of data (big data produced every second), web app users find it difficult to obtain the information they need quickly and effectively. One effective solution to this problem is web personalization. In this paper, we study data mining for web personalization using a blended deep learning model and explore how this model helps to analyze and estimate huge numbers of operations. Providing personalized recommendations that improve reliability depends on the web application making use of the useful information it contains. The results of this research are important for training and testing on large data sets with a blended deep learning model based on a back-propagation neural network. The Hadoop framework was used to perform a number of experiments in different environments, with a learning rate between -1 and +1. Several techniques were used to evaluate the model's parameters, and true positive cases (positive cases correctly classified as positive) were used to evaluate the proposed model.


Author(s):  
Sanjiv Das ◽  
Karthik Mokashi ◽  
Robbie Culkin

We examine the use of deep learning (neural networks) to predict the movement of the S&P 500 Index using the past returns of all the stocks in the index. Our analysis finds that the future direction of the S&P 500 Index can be weakly predicted from the prior movements of the underlying stocks in the index. Decomposition of the prediction error indicates that most of the lack of predictability comes from randomness and only a little from nonstationarity. We believe this is the first test of S&P 500 market efficiency that uses a very large information set, and it extends the domain of weak-form market efficiency tests.
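The setup above pairs the past returns of every index constituent with the future direction of the index. A minimal sketch of how such a supervised dataset could be assembled is shown below, assuming one-day lags and a binary up/down label; the abstract does not state the lag length, return frequency, or label definition, and `make_direction_dataset` is a hypothetical helper rather than the authors' code.

```python
from typing import List, Tuple


def make_direction_dataset(
    stock_returns: List[List[float]],  # stock_returns[t][i]: return of stock i on day t
    index_returns: List[float],        # index_returns[t]: index return on day t
) -> Tuple[List[List[float]], List[int]]:
    """Pair each day's cross-section of stock returns with the next day's index direction.

    Assumption: label is 1 if the index return on day t + 1 is positive, else 0.
    """
    if len(stock_returns) != len(index_returns):
        raise ValueError("need exactly one index return per day of stock returns")
    features, labels = [], []
    for t in range(len(index_returns) - 1):  # the last day has no "next day" label
        features.append(list(stock_returns[t]))
        labels.append(1 if index_returns[t + 1] > 0 else 0)
    return features, labels


if __name__ == "__main__":
    # Toy two-stock, three-day example (not real market data).
    X, y = make_direction_dataset(
        [[0.01, -0.02], [0.00, 0.03], [-0.01, 0.01]],
        [0.005, -0.004, 0.002],
    )
    print(X)  # [[0.01, -0.02], [0.0, 0.03]]
    print(y)  # [0, 1]
```

Any classifier, including a neural network, can then be trained on `X` and `y`; the key design point is that features for day t only use information available at t, so the label never leaks into the inputs.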


2021 ◽  
Author(s):  
George Kibirige ◽  
Ming-Chuan Yang ◽  
Chao-Lin Liu ◽  
Meng Chang Chen

We propose RTP, a composite neural network model that captures knowledge from remote transportation pollution events (RTPEs) to improve local PM2.5 prediction. To the best of our knowledge, this is the first deep learning work to include knowledge from remote pollutants for PM2.5 prediction. RTP consists of two neural network components: a pre-trained base model and the STRI model. The base model captures knowledge from the local factors that influence PM2.5 concentrations, and STRI captures knowledge from RTPEs by learning the spatial-temporal characteristics of satellite-based aerosol optical depth (AOD) data and weather features from remote areas. In addition, given the size of the STRI model, to facilitate training and improve results we divide the full STRI model into two components: STRI_fe, which extracts spatial-temporal features from remote areas, and STRI_p, which predicts local PM2.5 concentrations using both remote and local features. The prediction results from STRI_p show that the prediction error is reduced when remote features are added to the model, demonstrating that the STRI model indeed captures knowledge from RTPEs.

To characterize the occurrence of RTPEs in northern Taiwan, we also developed an algorithm to classify PM2.5 concentrations attributable to RTPEs. We use the STRI model to predict PM2.5 at two EPA stations located at the northern tip of Taiwan and apply the classification algorithm to the results. Accuracy improves when remote features are added to the model, which demonstrates the impact of RTPEs at these stations.
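The split described above (a frozen local base model, a remote feature extractor STRI_fe, and a predictor STRI_p that combines local and remote information) can be illustrated with a toy composition. This is a minimal sketch under loud assumptions: the real components are neural networks whose layers are not described in the abstract, so here the base model is a fixed linear function, STRI_fe simply summarizes a remote AOD grid by its per-time-step spatial mean plus the latest change, and STRI_p is a linear head. All function names and weights are hypothetical.

```python
import numpy as np


def base_model(local_features: np.ndarray) -> float:
    """Stand-in for the frozen, pre-trained local model (hypothetical fixed weights)."""
    frozen_weights = np.full(local_features.shape[0], 0.5)
    return float(local_features @ frozen_weights)


def stri_fe(remote_aod: np.ndarray) -> np.ndarray:
    """Hypothetical STRI_fe: summarise a (time, height, width) remote AOD grid."""
    spatial_mean = remote_aod.mean(axis=(1, 2))  # one value per time step
    trend = spatial_mean[-1] - spatial_mean[-2]  # most recent temporal change
    return np.append(spatial_mean, trend)


def stri_p(local_features, remote_aod, head_weights, head_bias=0.0) -> float:
    """Hypothetical STRI_p: linear head over [base-model output, remote features]."""
    combined = np.concatenate([[base_model(local_features)], stri_fe(remote_aod)])
    return float(combined @ head_weights + head_bias)


if __name__ == "__main__":
    local = np.array([10.0, 20.0])  # toy local inputs; base model gives 15.0
    aod = np.stack([np.full((4, 4), v) for v in (0.1, 0.2, 0.4)])  # 3 time steps
    local_only = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # ignores remote features
    with_remote = np.array([1.0, 0.0, 0.0, 0.0, 10.0])  # also uses the remote trend
    print(stri_p(local, aod, local_only))   # 15.0
    print(stri_p(local, aod, with_remote))  # about 17.0
```

Zeroing the remote weights recovers a local-only prediction, which mirrors the abstract's comparison of prediction error with and without remote features; keeping the base model frozen means only the remote path and the head would need training.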


2020 ◽  
Author(s):  
Hannes Wartmann ◽  
Sven Heins ◽  
Karin Kloiber ◽  
Stefan Bonn

Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Here we investigate RNA-seq metadata prediction based on gene expression values. We present a deep learning-based domain adaptation algorithm for the automatic annotation of RNA-seq metadata. We show that our algorithm outperforms existing approaches as well as traditional deep learning methods in predicting tissue, sample source, and patient sex information across several large data repositories. By using a model architecture similar to siamese networks, the algorithm is able to learn biases from datasets with few samples. Our domain adaptation approach achieves metadata annotation accuracies up to 12.3% better than a previously published method. Lastly, we provide a list of more than 10,000 novel tissue and sex label annotations for 8,495 unique SRA samples.
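The defining feature of a siamese-style architecture, as mentioned above, is that two inputs pass through the same encoder with shared weights and are compared in the embedding space. The sketch below shows only that idea, assuming a single tanh layer over gene expression vectors and a standard contrastive loss; the abstract does not describe the actual layers, loss, or how sample pairs are formed for domain adaptation, so `SharedEncoder`, `pair_distance`, and `contrastive_loss` are hypothetical.

```python
import numpy as np


class SharedEncoder:
    """Encoder whose single weight matrix is reused by both siamese branches."""

    def __init__(self, n_genes: int, n_latent: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(n_genes, n_latent))

    def encode(self, expression: np.ndarray) -> np.ndarray:
        return np.tanh(expression @ self.weights)


def pair_distance(encoder: SharedEncoder, x_a: np.ndarray, x_b: np.ndarray) -> float:
    """Euclidean distance between the embeddings of the two branches."""
    return float(np.linalg.norm(encoder.encode(x_a) - encoder.encode(x_b)))


def contrastive_loss(distance: float, same_label: bool, margin: float = 1.0) -> float:
    """Assumed loss: pull same-label pairs together, push others beyond `margin`."""
    if same_label:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


if __name__ == "__main__":
    encoder = SharedEncoder(n_genes=5, n_latent=3)
    x = np.arange(5.0)  # toy expression profile
    print(pair_distance(encoder, x, x))  # 0.0, identical inputs share one encoder
    print(contrastive_loss(0.5, same_label=True))    # 0.25
    print(contrastive_loss(2.0, same_label=False))   # 0.0
```

Because both branches share `weights`, a gradient step on either branch updates the same parameters, which is what lets such models learn from relatively few samples per dataset.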
