Bias invariant RNA-seq metadata annotation

2020 ◽  
Author(s):  
Hannes Wartmann ◽  
Sven Heins ◽  
Karin Kloiber ◽  
Stefan Bonn

Abstract Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of these data is often precluded by experimental bias and a lack of annotation depth and consistency. Here we investigate RNA-seq metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-seq metadata. We show how our algorithm outperforms existing approaches as well as traditional deep learning methods for the prediction of tissue, sample source, and patient sex information across several large data repositories. By using a model architecture similar to Siamese networks, the algorithm is able to learn biases from datasets with few samples. Our domain adaptation approach achieves metadata annotation accuracies up to 12.3% better than a previously published method. Lastly, we provide a list of more than 10,000 novel tissue and sex label annotations for 8,495 unique SRA samples.
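To make "metadata prediction based on gene expression values" concrete, here is a minimal sketch of the baseline task: a plain classifier mapping an expression profile to a tissue label. This is an illustration only, not the authors' code; the data are random stand-ins for real RNA-seq profiles, and all dimensions are assumptions.

```python
# Minimal sketch (not the authors' code): predicting a tissue label
# directly from gene expression values with a plain linear classifier.
# The data here are random stand-ins for real RNA-seq profiles.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_genes, n_tissues = 600, 2000, 5

X = rng.lognormal(mean=1.0, sigma=1.0, size=(n_samples, n_genes))  # expression-like values
y = rng.integers(0, n_tissues, size=n_samples)                     # tissue labels

X = np.log1p(X)  # log-transform, a common normalization for RNA-seq data

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")  # chance level on random labels
```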

GigaScience ◽  
2021 ◽  
Vol 10 (9) ◽  
Author(s):  
Hannes Wartmann ◽  
Sven Heins ◽  
Karin Kloiber ◽  
Stefan Bonn

Abstract Background: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs. Findings: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data than existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. Conclusion: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation, making them more searchable.
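The Siamese-style idea can be sketched as follows: two expression profiles from different datasets pass through one shared encoder, and a contrastive loss pulls same-tissue pairs together while pushing different-tissue pairs apart, so dataset-specific bias is squeezed out of the embedding. This is a hedged illustration under assumed dimensions, not the published architecture.

```python
# Hedged sketch of a Siamese-style domain adaptation setup (illustrative,
# not the published model): a shared encoder plus a contrastive pair loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    def __init__(self, n_genes=2000, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_genes, 256), nn.ReLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same_label, margin=1.0):
    # same_label: 1 if the pair shares a tissue label, else 0
    d = F.pairwise_distance(z1, z2)
    return (same_label * d.pow(2) +
            (1 - same_label) * F.relu(margin - d).pow(2)).mean()

encoder = SharedEncoder()
x_a = torch.randn(32, 2000)               # batch from dataset A
x_b = torch.randn(32, 2000)               # batch from dataset B
same = torch.randint(0, 2, (32,)).float() # pair labels (toy)
loss = contrastive_loss(encoder(x_a), encoder(x_b), same)
loss.backward()
```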


2020 ◽  
Author(s):  
Turki Turki ◽  
Y-h. Taguchi

Abstract Analyzing single-cell pancreatic data plays an important role in understanding various metabolic diseases and health conditions. Because of the sparsity and noise present in such single-cell gene expression data, inferring gene regulatory networks from them remains difficult, posing a barrier to a deeper understanding of cellular metabolism. Since recent studies have led to the reliable inference of single-cell gene regulatory networks (SCGRNs), the challenge of discriminating between SCGRNs has now arisen. By accurately discriminating between SCGRNs (e.g., distinguishing SCGRNs of a healthy pancreas from those of a T2D pancreas), biologists would be able to annotate, organize, visualize, and identify common patterns of SCGRNs for metabolic diseases. Such annotated SCGRNs could play an important role in speeding up the process of building large data repositories. In this study, we aimed to contribute to the development of a novel deep learning (DL) application. First, we generated a dataset consisting of 224 SCGRNs belonging to both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures (VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169), trained each of them on the dataset, and evaluated their predictions on a test set. We evaluated the DL architectures on an HP workstation with a single NVIDIA GeForce RTX 2080 Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of the VGG19 DL model in the automatic classification of SCGRNs derived from single-cell pancreatic data.
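A rough sketch of the transfer-learning setup the study describes: a VGG19 backbone pretrained on ImageNet with a new binary head for healthy-vs-T2D SCGRN images. The directory name, image size, and training hyperparameters below are illustrative assumptions, not the authors' configuration.

```python
# Sketch: fine-tuning a pretrained VGG19 to classify SCGRN images (0 = healthy,
# 1 = T2D). Paths and hyperparameters are hypothetical.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG19(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # healthy vs. T2D
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# train_ds would be a tf.data.Dataset of SCGRN images and 0/1 labels, e.g.:
# train_ds = tf.keras.utils.image_dataset_from_directory(
#     "scgrn_images/", image_size=(224, 224), label_mode="binary")
# model.fit(train_ds, epochs=10)
```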


2016 ◽  
Author(s):  
Fritz Lekschas ◽  
Nils Gehlenborg

Abstract The ever-increasing number of biomedical data sets provides tremendous opportunities for reuse, but current data repositories offer limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating data sets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find data sets of interest. We developed SATORI, an integrative search and visual exploration interface for biomedical data repositories. The design is informed by a requirements analysis conducted through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse, and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform.
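A toy illustration of why ontological annotations improve findability over plain text search: a query for a broad term should also match data sets annotated with more specific child terms. This is not SATORI's implementation; the ontology fragment and data set names are made up.

```python
# Toy semantic query over ontology-annotated data sets (hypothetical data):
# a search for "leukocyte" also matches data sets annotated with subtypes.
subclass_of = {            # tiny hand-made is-a hierarchy
    "T cell": "leukocyte",
    "B cell": "leukocyte",
    "leukocyte": "cell",
}

def matches(term, query):
    # walk up the is-a hierarchy from the annotation toward the root
    while term is not None:
        if term == query:
            return True
        term = subclass_of.get(term)
    return False

datasets = {"ds1": "T cell", "ds2": "hepatocyte", "ds3": "B cell"}
hits = [d for d, t in datasets.items() if matches(t, "leukocyte")]
print(hits)  # ['ds1', 'ds3'] -- a text-only search for "leukocyte" finds neither
```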


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Malte Seemann ◽  
Lennart Bargsten ◽  
Alexander Schlaefer

Abstract Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of the artery lumen in computed tomography angiography (CTA) data. However, to perform well, neural networks have to be trained on large amounts of high-quality annotated data. In the realm of medical imaging, annotations are not only scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in the Dice coefficient and 20% in the Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep-learning-based segmentation of the artery lumen in CTA images.
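The two reported metrics are standard for segmentation evaluation; the sketch below computes both on toy binary masks. Mask shapes and the perturbation are assumptions for illustration, not the authors' evaluation code.

```python
# Sketch of the two metrics above on toy binary segmentation masks.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred, truth):
    # overlap-based score in [0, 1]; 1 means perfect agreement
    inter = np.logical_and(pred, truth).sum()
    return 2 * inter / (pred.sum() + truth.sum())

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=(64, 64)).astype(bool)
pred = truth.copy()
pred[:4] = ~pred[:4]  # flip a few rows to simulate segmentation error

p, t = np.argwhere(pred), np.argwhere(truth)  # coordinates of foreground pixels
h = max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])
print(f"Dice: {dice(pred, truth):.3f}, Hausdorff: {h:.1f}")
```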


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Clara Borrelli ◽  
Paolo Bestagini ◽  
Fabio Antonacci ◽  
Augusto Sarti ◽  
Stefano Tubaro

Abstract Several methods for synthetic audio speech generation have been developed in the literature over the years. With the great technological advances brought by deep learning, many novel synthetic speech techniques achieving remarkably realistic results have recently been proposed. As these methods generate convincing fake human voices, they can be used maliciously to harm today's society (e.g., people impersonation, fake news spreading, opinion formation). For this reason, the ability to detect whether a speech recording is synthetic or pristine is becoming an urgent necessity. In this work, we develop a synthetic speech detector. It takes an audio recording as input, extracts a series of hand-crafted features motivated by the speech-processing literature, and classifies them in either a closed-set or an open-set setting. The proposed detector is validated on a publicly available dataset covering 17 synthetic speech generation algorithms, ranging from old-fashioned vocoders to modern deep learning solutions. Results show that the proposed method outperforms recently proposed detectors in the forensics literature.
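A hedged sketch of a detector in the same spirit: hand-crafted spectral features feeding a classical closed-set classifier. MFCCs via librosa are an assumption here; the paper's exact feature set may differ, and the waveforms below are synthetic stand-ins rather than real recordings.

```python
# Sketch: hand-crafted audio features + closed-set classifier (illustrative).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def features(y, sr=16000):
    # 20 MFCCs summarized over time by their means and standard deviations
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# stand-in waveforms; real use would load recordings, e.g. librosa.load(path)
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
real = np.sin(2 * np.pi * 150 * t).astype(np.float32)           # "pristine" stand-in
fake = np.sign(np.sin(2 * np.pi * 150 * t)).astype(np.float32)  # "synthetic" stand-in

X = np.stack([features(real), features(fake)])
y = [0, 1]  # 0 = pristine, 1 = synthetic
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X))
```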


Author(s):  
Ansh Kapil ◽  
Armin Meier ◽  
Keith Steele ◽  
Marlon Rebelatto ◽  
Katharina Nekolla ◽  
...  

Author(s):  
Natalie Gentner ◽  
Andreas Kyek ◽  
Yao Yang ◽  
Mattia Carletti ◽  
Gian Antonio Susto

2017 ◽  
Vol 12 (7) ◽  
pp. 851-855 ◽  
Author(s):  
Louis Passfield ◽  
James G. Hopker

This paper explores the notion that the availability and analysis of large data sets have the capacity to improve practice and change the nature of science in the sport and exercise setting. The increasing use of data and information technology in sport is giving rise to this change. Websites hold large data repositories, and the development of wearable technology, mobile phone applications, and related instruments for monitoring physical activity, training, and competition provides large data sets of extensive and detailed measurements. Innovative approaches conceived to more fully exploit these large data sets could provide a basis for more objective evaluation of coaching strategies and new approaches to how science is conducted. An emerging discipline, sports analytics, could help overcome some of the challenges involved in obtaining knowledge and wisdom from these large data sets. Examples of where large data sets have been analyzed, to evaluate the career development of elite cyclists and to characterize and optimize the training load of well-trained runners, are discussed. Careful verification of large data sets is time-consuming and imperative before useful conclusions can be drawn. Consequently, it is recommended that prospective studies be preferred over retrospective analyses of data. It is concluded that rigorous analysis of large data sets could enhance our knowledge in the sport and exercise sciences, inform competitive strategies, and allow innovative new research and findings.


Author(s):  
Varsha R ◽  
Meghna Manoj Nair ◽  
Siddharth M. Nair ◽  
Amit Kumar Tyagi

The Internet of Things (smart things) is used in many sectors and applications owing to recent technological advances. One such application is the transportation system, which people rely on to move from one place to another. The smart devices embedded in vehicles help passengers resolve their queries, and future vehicles will be fully automated to an advanced stage, i.e., driverless cars. These autonomous cars will help people save time and increase productivity in their respective businesses. Today and in the near future, privacy preservation and trust will be major concerns between users and autonomous vehicles, and this paper aims to provide clarity on both. Many attempts in the previous decade have produced efficient mechanisms, but they all work only with driver-operated vehicles; these mechanisms are not valid or useful for future vehicles. In this paper, we use deep learning techniques to build trust through recommender systems, and blockchain technology for privacy preservation. We also maintain a certain level of trust by maintaining the highest level of privacy among users living in a particular environment. In this research, we developed a framework that offers maximally trusted, reliable communication to users over the road network. With this, we also preserve the privacy of users during travel, i.e., without revealing the identity of users to Trusted Third Parties or even Location-Based Services while reaching a destination. Thus, a Deep Learning-based Blockchain Solution (DLBS) is illustrated for providing an efficient recommendation system.
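A loose sketch of the two ingredients the paper combines: a recommender-style trust score and an append-only hash chain standing in for the blockchain layer. Everything here is hypothetical scaffolding for illustration; the paper's actual framework is not specified at this level of detail.

```python
# Illustrative only: peer-rating trust score + hash-chained, pseudonymous
# trust ledger. All names and structures here are hypothetical.
import hashlib
import json
import time

def trust_score(ratings):
    # naive collaborative score: average of peer ratings in [0, 1]
    return sum(ratings) / len(ratings) if ratings else 0.0

class HashChain:
    def __init__(self):
        self.blocks = [{"prev": "0" * 64, "data": "genesis"}]

    def append(self, data):
        # each block commits to the hash of the previous one (tamper evidence)
        prev = hashlib.sha256(
            json.dumps(self.blocks[-1], sort_keys=True).encode()).hexdigest()
        self.blocks.append({"prev": prev, "data": data, "ts": time.time()})

chain = HashChain()
score = trust_score([0.9, 0.8, 1.0])
# record a pseudonymous trust update -- no raw user identity is stored
chain.append({"vehicle": "pseudonym-42", "trust": round(score, 2)})
print(chain.blocks[-1])
```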

