Batch Effect Removal via Batch-Free Encoding

2018 ◽  
Author(s):  
Uri Shaham

Abstract: Biological measurements often contain systematic errors, also known as “batch effects”, which may invalidate downstream analysis when not handled correctly. The problem of removing batch effects is of major importance in the biological community. Despite recent advances in this direction via deep learning techniques, most current methods may not fully preserve the true biological patterns the data contains. In this work we propose a deep learning approach for batch effect removal. The crux of our approach is learning a batch-free encoding of the data, representing its intrinsic biological properties but not batch effects. In addition, we encode the systematic factors through a decoding mechanism and require accurate reconstruction of the data. Altogether, this allows us to fully preserve the true biological patterns represented in the data. Experimental results are reported on data obtained from two high-throughput technologies, mass cytometry and single-cell RNA-seq. Beyond good performance on training data, we also observe that our system performs well on test data obtained from new patients that were not available at training time. Our method is easy to use, and publicly available code can be found at https://github.com/ushaham/BatchEffectRemoval2018.
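A minimal sketch of the batch-free encoding idea described above, assuming a simple Keras autoencoder: the encoder maps expression data to a code intended to carry only biological signal, while the decoder receives that code together with a one-hot batch label so batch-specific variation can still be reconstructed. Layer sizes, and the omission of the paper's additional constraint that actually forces the code to be batch-free, are assumptions of this sketch, not the authors' implementation.

```python
# Sketch only: the reconstruction path of a batch-free encoding autoencoder.
# The extra penalty that removes batch information from the code (e.g. an
# adversarial or distribution-matching term) is not shown here.
import tensorflow as tf
from tensorflow.keras import layers, Model

n_genes, n_batches, code_dim = 1000, 2, 25      # assumed dimensions

x_in = layers.Input(shape=(n_genes,), name="expression")
b_in = layers.Input(shape=(n_batches,), name="batch_onehot")

# Encoder: expression -> (ideally) batch-free code
h = layers.Dense(256, activation="relu")(x_in)
code = layers.Dense(code_dim, name="batch_free_code")(h)

# Decoder: (code, batch label) -> reconstruction of the original measurement
d = layers.Concatenate()([code, b_in])
d = layers.Dense(256, activation="relu")(d)
x_rec = layers.Dense(n_genes, name="reconstruction")(d)

model = Model([x_in, b_in], x_rec)
model.compile(optimizer="adam", loss="mse")
```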

Author(s):  
Vu Tuan Hai ◽  
Dang Thanh Vu ◽  
Huynh Ho Thi Mong Trinh ◽  
Pham The Bao

Recent advances in deep learning models have shown promising potential in object removal, the task of replacing undesired objects with appropriate pixel values using known context. Object removal with deep learning is commonly framed as image-to-image (Img2Img) translation or inpainting. Instead of dealing with a large context, this paper targets a specific application of object removal: erasing the traces of braces from an image of teeth with braces (the braces2teeth problem). We solve the problem with three methods corresponding to different datasets. First, we use the CycleGAN model to handle the case where paired training data is not available. In the second case, we create pseudo-paired data to train the Pix2Pix model. In the last case, we combine GraphCut with a generative inpainting model to build an interactive tool that lets the user refine unsatisfactory results. To the best of our knowledge, this study is one of the first attempts to address the braces2teeth problem using deep learning techniques, and it can be applied in various fields, from health care to entertainment.
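As a rough illustration of the unpaired-data setting handled by CycleGAN in the first method, the sketch below (assumed helper names, not the authors' code) shows the cycle-consistency objective: a braces-to-teeth generator G and a teeth-to-braces generator F should each undo the other.

```python
# Illustrative cycle-consistency loss for unpaired braces<->teeth translation.
# G and F are assumed to be callables (e.g. Keras models) mapping image batches.
import tensorflow as tf

def cycle_consistency_loss(G, F, braces_images, teeth_images, lam=10.0):
    """L1 cycle loss: F(G(x)) should reproduce x, and G(F(y)) should reproduce y."""
    forward_cycle = tf.reduce_mean(tf.abs(F(G(braces_images)) - braces_images))
    backward_cycle = tf.reduce_mean(tf.abs(G(F(teeth_images)) - teeth_images))
    return lam * (forward_cycle + backward_cycle)
```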


2018 ◽  
Vol 7 (3.27) ◽  
pp. 258 ◽  
Author(s):  
Yecheng Yao ◽  
Jungho Yi ◽  
Shengjun Zhai ◽  
Yuwen Lin ◽  
Taekseung Kim ◽  
...  

The decentralization of cryptocurrencies has greatly reduced the level of central control over them, impacting international relations and trade. Further, wide fluctuations in cryptocurrency prices indicate an urgent need for an accurate way to forecast them. This paper proposes a novel method to predict cryptocurrency price by considering various factors such as market cap, volume, circulating supply, and maximum supply, based on deep learning techniques such as the recurrent neural network (RNN) and long short-term memory (LSTM), which are effective models for learning from sequential data, with the LSTM better at capturing longer-term associations. The proposed approach is implemented in Python and validated on benchmark datasets. The results verify the applicability of the proposed approach for the accurate prediction of cryptocurrency price.
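A hedged sketch of the kind of LSTM regressor the abstract describes: a sliding window of daily features (price, market cap, volume, circulating supply, maximum supply) predicts the next day's price. The window length, layer sizes, and toy arrays are assumptions, not the paper's configuration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

window, n_features = 30, 5                       # 30-day lookback, 5 input features
model = models.Sequential([
    layers.Input(shape=(window, n_features)),
    layers.LSTM(64),                             # captures longer-term associations
    layers.Dense(32, activation="relu"),
    layers.Dense(1),                             # next-day price
])
model.compile(optimizer="adam", loss="mse")

# Toy arrays just to show the expected shapes.
X = np.random.rand(256, window, n_features).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```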


2020 ◽  
Author(s):  
Ghazi Abdalla ◽  
Fatih Özyurt

Abstract: In the modern era, Internet usage has become a basic necessity in people's lives. Nowadays, people can shop online and check other customers' views about the products they purchase. Social networking services enable users to post opinions on public platforms. Analyzing people's opinions helps corporations improve product quality and provide better customer service. However, analyzing this content manually is a daunting task; we therefore implemented sentiment analysis to automate the process. The entire process includes data collection, pre-processing, word embedding, sentiment detection and classification using deep learning techniques. Twitter was chosen as the data source, and tweets were collected automatically using Tweepy. In this paper, three deep learning techniques were implemented: CNN, Bi-LSTM and CNN-Bi-LSTM. Each model was trained on three datasets consisting of 50K, 100K and 200K tweets. The experimental results revealed that, as the amount of training data increased, the performance of the models improved, especially that of the Bi-LSTM model. When trained on the 200K dataset, it achieved about 3% higher accuracy than on the 100K dataset and about 7% higher accuracy than on the 50K dataset. Finally, the Bi-LSTM model scored the highest on all metrics, achieving an accuracy of 95.35%.
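A minimal sketch of a Bi-LSTM tweet classifier of the kind reported as the best performer; the vocabulary size, sequence length, and layer widths are assumptions rather than the paper's settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size, max_len, embed_dim = 20000, 50, 128  # assumed preprocessing parameters
model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embed_dim),     # learned word embeddings
    layers.Bidirectional(layers.LSTM(64)),       # reads the tweet in both directions
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),       # positive / negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```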


Newspaper articles offer insights into current events and can belong to one of many categories such as sports, politics, science and technology. Text classification is a pressing need, as large volumes of uncategorized data are a problem everywhere. Through this study, we compare several algorithms, along with data preprocessing approaches, for classifying newspaper articles into their respective categories. Convolutional Neural Networks (CNN), a deep learning approach, are currently a strong competitor to other classification algorithms such as SVM, Naive Bayes and KNN. We therefore implement a CNN to classify our newspaper articles, develop an understanding of all the algorithms implemented, and compare their results. We also compare the training time, prediction time and accuracy of all the algorithms.
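For the classical baselines mentioned (SVM, Naive Bayes, KNN), a comparison might look like the scikit-learn sketch below; the toy articles and pipeline choices are illustrative, not the study's data or code, and the CNN would replace the TF-IDF features with learned embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy examples only; a real corpus of labelled newspaper articles is assumed.
articles = ["The team won the final match",
            "Parliament passed the new bill",
            "The striker scored twice in the derby",
            "The minister announced a trade agreement"]
labels = ["sports", "politics", "sports", "politics"]

for name, clf in [("SVM", LinearSVC()),
                  ("Naive Bayes", MultinomialNB()),
                  ("KNN", KNeighborsClassifier(n_neighbors=1))]:
    pipe = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    pipe.fit(articles, labels)
    print(name, pipe.predict(["The team scored in the match"]))
```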


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 157
Author(s):  
Saidrasul Usmankhujaev ◽  
Bunyodbek Ibrokhimov ◽  
Shokhrukh Baydadaev ◽  
Jangwoo Kwon

Deep neural networks (DNNs) have proven to be efficient in computer vision and data classification, with an increasing number of successful applications. Time series classification (TSC) has been one of the challenging problems in data mining over the last decade, and significant research has been conducted on various solutions, including algorithm-based approaches as well as machine and deep learning approaches. This paper focuses on combining two well-known deep learning techniques, namely the Inception module and the fully convolutional network (FCN). The proposed method proved to be more efficient than the previous state-of-the-art InceptionTime method. We tested our model on the univariate TSC benchmark (the UCR/UEA archive), which includes 85 time-series datasets, and showed that our network outperforms InceptionTime in terms of training time and overall accuracy on the UCR archive.
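The sketch below illustrates, under assumed filter sizes and depths, how a 1D Inception-style block can be followed by the global-average-pooling head typical of a fully convolutional network; it is not the authors' exact InceptionFCN architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def inception_block_1d(x, n_filters=32):
    """Parallel 1D convolutions with different kernel sizes, concatenated."""
    branches = [layers.Conv1D(n_filters, k, padding="same", activation="relu")(x)
                for k in (10, 20, 40)]
    pooled = layers.MaxPooling1D(3, strides=1, padding="same")(x)
    branches.append(layers.Conv1D(n_filters, 1, padding="same",
                                  activation="relu")(pooled))
    return layers.Concatenate()(branches)

series_len, n_classes = 128, 5                   # example dimensions
inp = layers.Input(shape=(series_len, 1))
x = inception_block_1d(inp)
x = layers.Conv1D(128, 8, padding="same", activation="relu")(x)   # FCN-style conv
x = layers.GlobalAveragePooling1D()(x)
out = layers.Dense(n_classes, activation="softmax")(x)
model = Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```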


2021 ◽  
Author(s):  
Hanbyeol Kim ◽  
Joongho Lee ◽  
Keunsoo Kang ◽  
Seokhyun Yoon

Abstract: Cell type identification is a key step in the downstream analysis of single-cell RNA-seq experiments. Indispensable information for this is gene expression, which is used to cluster cells, train the model and set rejection thresholds. The problem is that expression values are subject to batch effects arising from different platforms and preprocessing. We present MarkerCount, which uses the number of markers expressed, regardless of their expression level, to initially identify cell types and then reassigns cell types on a per-cluster basis. MarkerCount works in both reference-based and marker-based modes, where the latter utilizes only existing lists of markers, while the former requires a pre-annotated dataset to train the model. Its performance was evaluated and compared with existing cell type identifiers, both marker-based and reference-based, that can be customized with publicly available datasets and marker databases. The results show that MarkerCount provides stable performance compared with other reference-based and marker-based cell type identifiers.
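A hedged sketch of the core scoring idea as described in the abstract: each cell type is scored by how many of its markers are detected at all, ignoring expression magnitude. The function and variable names are illustrative and are not the MarkerCount package's API.

```python
import numpy as np
import pandas as pd

def assign_by_marker_count(expr: pd.DataFrame, markers: dict) -> pd.Series:
    """expr: cells x genes matrix; markers: {cell_type: [marker genes]}."""
    detected = (expr > 0).astype(int)            # expression level is ignored
    scores = pd.DataFrame({
        ctype: detected.reindex(columns=genes, fill_value=0).sum(axis=1)
        for ctype, genes in markers.items()
    })
    # Initial per-cell call; MarkerCount then reassigns types per cluster.
    return scores.idxmax(axis=1)

expr = pd.DataFrame(np.random.poisson(0.5, (5, 4)),
                    columns=["CD3D", "CD8A", "MS4A1", "CD79A"])
markers = {"T cell": ["CD3D", "CD8A"], "B cell": ["MS4A1", "CD79A"]}
print(assign_by_marker_count(expr, markers))
```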


2021 ◽  
Vol 12 ◽  
Author(s):  
Bin Zou ◽  
Tongda Zhang ◽  
Ruilong Zhou ◽  
Xiaosen Jiang ◽  
Huanming Yang ◽  
...  

It is well recognized that batch effects in single-cell RNA sequencing (scRNA-seq) data remain a big challenge when integrating different datasets. Here, we propose deepMNN, a novel deep learning-based method to correct batch effects in scRNA-seq data. We first search for mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network is constructed by stacking two residual blocks and applied to remove batch effects. The loss function of deepMNN is defined as the sum of a batch loss and a weighted regularization loss. The batch loss computes the distance between cells in MNN pairs in the PCA subspace, while the regularization loss keeps the output of the network similar to its input. The experimental results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods MMD-ResNet and scGen. The results demonstrated that deepMNN achieved better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allows scRNA-seq datasets with multiple batches to be integrated in one step. Furthermore, deepMNN ran much faster than the other methods on large-scale datasets. These characteristics make deepMNN a potential new choice for large-scale single-cell gene expression data analysis.
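The loss described above can be written compactly; the sketch below (assumed tensor shapes, not the deepMNN source) computes the batch loss over MNN pairs in the PCA subspace plus the weighted regularization term that keeps the corrected output close to the input.

```python
import tensorflow as tf

def deepmnn_style_loss(corrected, original, pca_matrix, mnn_i, mnn_j, weight=1.0):
    """corrected/original: cells x genes; pca_matrix: genes x PCs;
    mnn_i, mnn_j: integer indices of cells forming mutual-nearest-neighbor pairs."""
    proj = tf.matmul(corrected, pca_matrix)                     # PCA subspace
    batch_loss = tf.reduce_mean(tf.reduce_sum(
        tf.square(tf.gather(proj, mnn_i) - tf.gather(proj, mnn_j)), axis=1))
    reg_loss = tf.reduce_mean(tf.square(corrected - original))  # stay close to input
    return batch_loss + weight * reg_loss
```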


The need for offline handwritten character recognition is intense, yet the task is difficult, as writing varies from person to person and also depends on various other factors connected to the attitude and mood of the writer. Nevertheless, it can be achieved by converting the handwritten document into digital form. The field has advanced with the introduction of convolutional neural networks and has become more productive with pre-trained models, which can decrease training time and increase recognition accuracy. Research on handwritten character recognition for Indian languages is limited compared with languages such as English, Latin and Chinese, mainly because India is a multilingual country. Recognition of Telugu and Hindi characters is more difficult because the scripts of these languages are mostly cursive and carry more diacritics, so research in this line needs to lean towards improving recognition accuracy. Some work has already been done and has achieved up to eighty percent accuracy in offline handwritten character recognition of Telugu and Hindi. The proposed work focuses on increasing accuracy, in less time, in the recognition of these selected languages, and is able to reach the expected values.
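As a rough illustration of the pre-trained-model route mentioned above, the sketch below freezes an ImageNet-pretrained backbone (MobileNetV2 is chosen here only as an example) and trains a new head on character images; the number of classes and the input size are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_classes = 60                                    # assumed number of character classes
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
base.trainable = False                            # reuse pre-trained features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```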


Author(s):  
Priti P. Rege ◽  
Shaheera Akhter

Text separation in document image analysis is an important preprocessing step before executing an optical character recognition (OCR) task; it is necessary for improving the accuracy of an OCR system. Traditionally, separating text from a document has relied on feature extraction processes that require handcrafted features. Deep learning-based methods, however, are excellent feature extractors that learn features from the training data automatically. Deep learning gives state-of-the-art results on various computer vision tasks, including image classification, segmentation, image captioning, object detection, and recognition. This chapter compares various traditional as well as deep learning techniques and uses a semantic segmentation method for separating text from Devanagari document images using U-Net and ResU-Net models. These models are further fine-tuned with transfer learning to obtain more precise results. The final results show that deep learning methods give more accurate results than conventional image processing methods for Devanagari text extraction.
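A much shallower U-Net-style sketch than the chapter's models, shown only to illustrate the encoder-decoder-with-skip-connection idea behind the text/background mask; the input size and channel counts are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(256, 256, 1))           # grayscale document patch
c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
p1 = layers.MaxPooling2D()(c1)
c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
u1 = layers.UpSampling2D()(c2)
m1 = layers.Concatenate()([u1, c1])               # U-Net skip connection
c3 = layers.Conv2D(16, 3, padding="same", activation="relu")(m1)
out = layers.Conv2D(1, 1, activation="sigmoid")(c3)   # per-pixel text mask

unet = Model(inp, out)
unet.compile(optimizer="adam", loss="binary_crossentropy")
```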


2020 ◽  
Author(s):  
Kohulan Rajan ◽  
Achim Zielesny ◽  
Christoph Steinbeck

The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of DECIMER (Deep lEarning for Chemical ImagE Recognition), a deep learning method based on existing show-and-tell deep neural networks that makes very few assumptions about the structure of the underlying problem. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are clearly superior to SMILES, and we have preliminary indications that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve >90% accuracy with about 60 to 100 million training structures, so that training could be completed within several months on a single GPU. This work is based entirely on open-source software and open data and is available to the general public for any purpose.
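To make the show-and-tell idea concrete, the sketch below (assumed dimensions and vocabulary, not the DECIMER code) pairs a small CNN image encoder with a recurrent decoder that emits the molecule as a token sequence, e.g. DeepSMILES or SELFIES tokens.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, max_tokens = 100, 60                  # assumed token vocabulary / length

img_in = layers.Input(shape=(224, 224, 1))        # structure diagram
x = layers.Conv2D(32, 3, strides=2, activation="relu")(img_in)
x = layers.Conv2D(64, 3, strides=2, activation="relu")(x)
img_feat = layers.Dense(256)(layers.GlobalAveragePooling2D()(x))

tok_in = layers.Input(shape=(max_tokens,))        # previous tokens (teacher forcing)
emb = layers.Embedding(vocab_size, 256)(tok_in)
dec = layers.GRU(256, return_sequences=True)(emb, initial_state=img_feat)
out = layers.Dense(vocab_size, activation="softmax")(dec)   # next-token distribution

model = Model([img_in, tok_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```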

