PanSVR: Pan-Genome Augmented Short Read Realignment for Sensitive Detection of Structural Variations

The comprehensive discovery of structural variations (SVs) is fundamental to many genomics studies, and high-throughput sequencing has become a common approach to this task. However, owing to limited read length, it remains non-trivial for state-of-the-art tools to align short reads accurately and produce high-quality SV callsets. The pan-genome provides a novel and promising framework for short read-based SV calling: by comprehensively integrating known variants, it reduces the incompleteness and bias of a single reference, which helps overcome the bottlenecks of short read alignment and provides new evidence for SV detection. However, developing effective computational approaches that take full advantage of pan-genomes is still an open problem. Herein, we propose Pan-genome augmented Structure Variation calling tool with read Re-alignment (PanSVR), a novel pan-genome-based SV calling approach. PanSVR uses several tailored methods to precisely re-align SV-spanning reads against a well-organized pan-genome reference containing many known SVs. PanSVR greatly improves the quality of short read alignments and produces clear and homogeneous SV signatures, which facilitate SV calling. Benchmark results on real sequencing data suggest that PanSVR achieves substantially higher SV calling sensitivity than state-of-the-art SV callers, especially for SVs in repeat-rich regions and/or novel insertions, which are difficult for existing tools.

Quality Assessment of Domesticated Animal Genome Assemblies

Bioinformatics and Biology Insights ◽

10.4137/bbi.s29333 ◽

2015 ◽

Vol 9S4 ◽

pp. BBI.S29333 ◽

Cited By ~ 3

Author(s):

Stefan E. Seemann ◽

Christian Anthon ◽

Oana Palasca ◽

Jan Gorodkin

Keyword(s):

High Throughput Sequencing ◽

Genomic Sequence ◽

RNA Seq ◽

Sequencing Data ◽

Assembly Quality ◽

High Quality ◽

RNAseq Data ◽

Genome Assemblies ◽

Animal Genomes

The era of high-throughput sequencing has made it relatively simple to sequence genomes and transcriptomes of individuals from many species. To analyze the resulting sequencing data, high-quality reference genome assemblies are required. However, this is still a major challenge, and many domesticated animal genomes still need to be sequenced more deeply in order to produce high-quality assemblies. In the meantime, ironically, the amount of RNA-seq and other next-generation data produced frequently far exceeds that of the genomic sequence. Furthermore, basic comparative analysis is often affected by the lack of genomic sequence. Herein, we quantify the quality of the genome assemblies of 20 domesticated animals and related species by assessing a range of measurable parameters, and we show that there is a positive correlation between the fraction of mappable reads from RNAseq data and genome assembly quality. We rank the genomes by their assembly quality and discuss the implications for genotype analyses.
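To make the reported relationship concrete, the following is a minimal sketch of how a mapped-read fraction and a rank correlation against an assembly quality score could be computed. It is not the authors' pipeline: the function names, the use of Spearman rank correlation, and the idea of a single numeric quality score per assembly are all assumptions made for illustration.

```python
from statistics import mean


def mapped_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of RNA-seq reads that align to the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Given one mapped fraction and one quality score per assembly, `spearman(fractions, scores)` returns a value in [-1, 1]; a positive value would be consistent with the trend described above. A rank-based measure is used here only because it does not assume a linear relationship between the two quantities.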

UBAR

ACM Transactions on Embedded Computing Systems ◽

10.1145/3441644 ◽

2021 ◽

Vol 20 (3) ◽

pp. 1-25

Author(s):

Elham Shamsa ◽

Alma Pröbstl ◽

Nima TaheriNejad ◽

Anil Kanduri ◽

Samarjit Chakraborty ◽

...

Keyword(s):

Resource Management ◽

Quality Of Experience ◽

State Of The Art ◽

State Of Charge ◽

User Preference ◽

Management Approach ◽

High Quality ◽

Trade Off ◽

Run Time

Smartphone users require high Battery Cycle Life (BCL) and high Quality of Experience (QoE) during their usage. These two objectives can be conflicting based on the user preference at run-time. Finding the best trade-off between QoE and BCL requires an intelligent resource management approach that considers and learns user preference at run-time. Current approaches focus on one of these two objectives and neglect the other, limiting their efficiency in meeting users’ needs. In this article, we present UBAR, User- and Battery-aware Resource management, which considers dynamic workload, user preference, and user plug-in/out pattern at run-time to provide a suitable trade-off between BCL and QoE. UBAR personalizes this trade-off by learning the user’s habits and using that to satisfy QoE, while considering battery temperature and State of Charge (SOC) pattern to maximize BCL. The evaluation results show that UBAR achieves 10% to 40% improvement compared to the existing state-of-the-art approaches.

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of specific bioinformatics processes largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Thermal Laser Separation – A Novel Dicing Technology Fulfilling the Demands of Volume Manufacturing of 4H-SiC Devices

Materials Science Forum ◽

10.4028/www.scientific.net/msf.821-823.528 ◽

2015 ◽

Vol 821-823 ◽

pp. 528-532 ◽

Cited By ~ 5

Author(s):

Dirk Lewke ◽

Karl Otto Dohnke ◽

Hans Ulrich Zühlke ◽

Mercedes Cerezuela Barret ◽

Martin Schellenberger ◽

...

Keyword(s):

Experimental Data ◽

Process Control ◽

Tool Wear ◽

Large Scale ◽

State Of The Art ◽

Scale Production ◽

High Quality ◽

Large Scale Production ◽

Laser Separation

One challenge for volume manufacturing of 4H-SiC devices is the state-of-the-art wafer dicing technology, mechanical blade dicing, which suffers from high tool wear and low feed rates. In this paper we discuss Thermal Laser Separation (TLS) as a novel dicing technology for large scale production of SiC devices. We compare the latest TLS experimental data resulting from fully processed 4H-SiC wafers with results obtained by mechanical dicing technology. In particular, typical product-relevant features such as process control monitoring (PCM) structures and backside metallization, the quality of diced SiC devices, and productivity are considered. We show that, with feed rates up to two orders of magnitude higher than the state of the art, no tool wear and high quality of diced chips, TLS has very promising potential to fulfill the demands of volume manufacturing of 4H-SiC devices.

Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

International Journal of Genomics ◽

10.1155/2014/434575 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8

Author(s):

Momchilo Vuyisich ◽

Ayesha Arefin ◽

Karen Davenport ◽

Shihai Feng ◽

Cheryl Gleasner ◽

...

Keyword(s):

Genomic DNA ◽

De Novo ◽

GC Content ◽

Library Preparation ◽

Sequencing Data ◽

Bacterial Genomes ◽

DNA Amount ◽

High Quality ◽

Preparation Methods

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.

Rapid Mycobacterium tuberculosis spoligotyping from uncorrected long reads using Galru

10.1101/2020.05.31.126490 ◽

2020 ◽

Author(s):

Andrew J. Page ◽

Nabil-Fareed Alikhan ◽

Michael Strinden ◽

Thanh Le Viet ◽

Timofey Skvortsov

Keyword(s):

Mycobacterium Tuberculosis ◽

State Of The Art ◽

Sequence Data ◽

Human Pathogen ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Long Reads ◽

Long Read

Spoligotyping of Mycobacterium tuberculosis provides a subspecies classification of this major human pathogen. Spoligotypes can be predicted from short read genome sequencing data; however, no methods exist for long read sequence data such as from Nanopore or PacBio. We present a novel software package, Galru, which can rapidly detect the spoligotype of a Mycobacterium tuberculosis sample from as little as a single uncorrected long read. It allows for near real-time spoligotyping from long read data as it is being sequenced, giving rapid sample typing. We compare it to the existing state-of-the-art software and find that its results are identical to those obtained from short read sequencing data. Galru is freely available from https://github.com/quadram-institute-bioscience/galru under the GPLv3 open source licence.
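For readers unfamiliar with how a spoligotype is usually written down, here is a minimal sketch of the conventional encoding: the presence or absence of 43 spacers is recorded as a 43-character binary string, which is commonly abbreviated to a 15-digit octal code (14 triplets plus the final spacer). This illustrates the notation only. It is not Galru's detection method or its output format, and the function name `spoligotype_to_octal` is hypothetical.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-character presence/absence string to a 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    # Spacers 1-42 are read as 14 groups of three binary digits.
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    digits = [str(int(triplet, 2)) for triplet in triplets]
    # Spacer 43 is appended unchanged as the final digit (0 or 1).
    digits.append(binary[42])
    return "".join(digits)
```

For example, a pattern in which only the last nine spacers are present, `"0" * 34 + "1" * 9`, encodes to `000000000003771`.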

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016722 ◽

2019 ◽

Vol 33 ◽

pp. 6722-6729 ◽

Cited By ~ 4

Author(s):

Ziming Li ◽

Julia Kiseleva ◽

Maarten De Rijke

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Imitation Learning ◽

Local Optimum ◽

Inverse Reinforcement Learning ◽

High Quality ◽

Overall Performance

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.

A benchmarking of human mitochondrial DNA haplogroup classifiers from whole-genome and whole-exome sequence data

10.1101/2021.02.11.430775 ◽

2021 ◽

Author(s):

Víctor García-Olivares ◽

Adrián Muñoz-Barrera ◽

José Miguel Lorenzo-Salazar ◽

Carlos Zaragoza-Trello ◽

Luis A. Rubio-Rodríguez ◽

...

Keyword(s):

High Throughput Sequencing ◽

De Novo ◽

Sequence Data ◽

Qualitative Assessment ◽

Whole Genome ◽

Third Generation ◽

Sequencing Data ◽

Short Read ◽

Bioinformatic Tools ◽

Whole Exome

The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. In addition, because of its relevance, we also assess the best performing tool on third-generation long noisy read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (based on either a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers to select the most suitable tool to conduct mtDNA analyses from HTS data.

Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00063 ◽

2017 ◽

Vol 5 ◽

pp. 309-324 ◽

Cited By ~ 11

Author(s):

Nikola Mrkšić ◽

Ivan Vulić ◽

Diarmuid Ó Séaghdha ◽

Ira Leviant ◽

Roi Reichart ◽

...

Keyword(s):

State Of The Art ◽

Vector Spaces ◽

Lexical Resources ◽

High Quality ◽

Performance Improvements ◽

State Tracking ◽

Cross Lingual ◽

Semantic Transfer ◽

Multiple Languages

We present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. Attract-Repel facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialized cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-art results on semantic similarity datasets in six languages. We next show that Attract-Repel-specialized vectors boost performance in the downstream task of dialogue state tracking (DST) across multiple languages. Finally, we show that cross-lingual vector spaces produced by our algorithm facilitate the training of multilingual DST models, which brings further performance improvements.

Online Social Exergames for Seniors

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch080 ◽

2020 ◽

pp. 1599-1631

Author(s):

Stathis Th. Konstantinidis ◽

Ellen Brox ◽

Per Egil Kummervold ◽

Josef Hallberg ◽

Gunn Evertsen ◽

...

Keyword(s):

Quality Of Life ◽

Physical Activity ◽

Physical Training ◽

State Of The Art ◽

Social Contact ◽

High Quality ◽

Current State ◽

The Future ◽

The Impact

The population is getting older, and the resources for care will be even more limited in the future than they are now. Society therefore aims for seniors to manage on their own for as long as possible while maintaining a high quality of life. Physical activity is important to stay fit, and social contact is important for quality of life. The aim of this chapter is to provide a state-of-the-art review of online social exergames for seniors, with glimpses of senior users' opinions and the games' limitations. The importance of motivational techniques is emphasized, as well as the impact that exergames have on seniors. The chapter contributes to the book's objectives by focusing on the current state and practice in health games for physical training and rehabilitation and the use of gamification, exploring future opportunities and uses of gamification in eHealth, and discussing the respective challenges and limitations.
