A direct comparison of genome alignment and transcriptome pseudoalignment

2018 ◽  
Author(s):  
Lynn Yi ◽  
Lauren Liu ◽  
Páll Melsted ◽  
Lior Pachter

Abstract. Motivation: Genome alignment of reads is the first step of most genome analysis workflows. In the case of RNA-Seq, transcriptome pseudoalignment of reads is a fast alternative to genome alignment, but the different “coordinate systems” of the genome and transcriptome have made it difficult to perform direct comparisons between the approaches. Results: We have developed tools for converting genome alignments to transcriptome pseudoalignments, and conversely, for projecting transcriptome pseudoalignments to genome alignments. Using these tools, we performed a direct comparison of genome alignment with transcriptome pseudoalignment. We find that both approaches produce similar quantifications. This means that for many applications genome alignment and transcriptome pseudoalignment are interchangeable. Availability and Implementation: bam2tcc is C++14 software for converting alignments in SAM/BAM format to transcript compatibility counts (TCCs) and is available at https://github.com/pachterlab/bam2tcc. kallisto genomebam is a user option of kallisto that outputs a sorted BAM file in genome coordinates as part of transcriptome pseudoalignment. The feature has been released with kallisto v0.44.0, and is available at https://pachterlab.github.io/kallisto/. Supplementary Material: N/A. Contact: Lior Pachter ([email protected])
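As a rough illustration of what a transcript compatibility count is, the sketch below groups reads by the set of transcripts each read is compatible with and counts the reads in each group. It is not bam2tcc's implementation: the function name, the input layout (a mapping from read name to a set of transcript names) and the toy data are all assumptions made for this example.

```python
from collections import Counter


def transcript_compatibility_counts(read_to_transcripts):
    """Count reads per equivalence class (set of compatible transcripts).

    read_to_transcripts: hypothetical mapping of read name -> set of
    transcript names the read is compatible with.
    """
    counts = Counter()
    for transcripts in read_to_transcripts.values():
        if transcripts:  # reads compatible with no transcript are skipped
            counts[frozenset(transcripts)] += 1
    return counts


# Toy input, invented for illustration only.
reads = {
    "read1": {"tx1", "tx2"},
    "read2": {"tx2", "tx1"},
    "read3": {"tx3"},
    "read4": set(),
}
tcc = transcript_compatibility_counts(reads)
```

Using a `frozenset` as the key makes the order in which transcripts are listed irrelevant, so `read1` and `read2` fall into the same class.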

2021 ◽  
Author(s):  
Michał Stolarczyk ◽  
Bingjie Xue ◽  
Nathan C. Sheffield

Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: First, we derive unique identifiers for each resource; second, we record parent-child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data. Availability: https://refgenie.databio.org
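One way to picture a recursive identifier is as a digest built from lower-level digests, so that two resources can be compared at more than one level. The sketch below is only an assumption-laden illustration of that idea and not the scheme used by the authors: the function names, the choice of SHA-256, and the split into a "sequences" level and a "coordinates" level (names plus lengths) are all invented for this example.

```python
import hashlib


def _digest(text):
    """Hex SHA-256 digest of a string (hash choice is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def collection_identifiers(sequences):
    """Derive layered identifiers for a hypothetical sequence collection.

    sequences: mapping of sequence name -> sequence string.
    """
    # Level built only from sequence content, ignoring names.
    seq_level = _digest(",".join(sorted(_digest(s) for s in sequences.values())))
    # Level built only from names and lengths (the coordinate system).
    coord_level = _digest(
        ",".join(f"{name}:{len(seq)}" for name, seq in sorted(sequences.items()))
    )
    # Top-level identifier is a digest of the lower-level digests.
    return {
        "sequences": seq_level,
        "coordinates": coord_level,
        "top": _digest(seq_level + "|" + coord_level),
    }


# Toy collections, invented for illustration only.
a = collection_identifiers({"chr1": "ACGT", "chr2": "GGCC"})
b = collection_identifiers({"chr1": "ACGA", "chr2": "GGCC"})  # same names/lengths
c = collection_identifiers({"1": "ACGT", "2": "GGCC"})        # same sequences
```

In this toy setup, `a` and `b` differ at the top level but share a coordinate-level digest, while `a` and `c` share a sequence-level digest but not a coordinate-level one.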


2020 ◽  
Author(s):  
Vu VH Pham ◽  
Xiaomei Li ◽  
Buu Truong ◽  
Thin Nguyen ◽  
Lin Liu ◽  
...  

Abstract. Motivation: Predicting cell locations is important since with the understanding of cell locations, we may estimate the function of cells and their integration with the spatial environment. Thus, the DREAM Challenge on Single Cell Transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single cell transcriptomic data. Results: We have developed over 50 pipelines by combining different ways of pre-processing the RNA-seq data, selecting the genes, predicting the cell locations, and validating predicted cell locations, resulting in the winning methods for two out of three sub-challenges in the competition. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and the Shiny web-application to facilitate the research on single cell spatial reconstruction. All the data and the example use cases are available in the Supplementary material. Availability: The scripts of the package are available at https://github.com/thanhbuu04/SCTCwhatateam and the Shiny application is available at https://github.com/pvvhoang/[email protected] Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


2006 ◽  
Vol 273 (1605) ◽  
pp. 3133-3133
Author(s):  
Steffen Kiel ◽  
James L. Goedert

Correction for ‘Deep-sea food bonanzas: early Cenozoic whale-fall communities resemble wood-fall rather than seep communities’ by Steffen Kiel and James L. Goedert (Proc. R. Soc. B 273, 2625–2631. (doi: 10.1098/rspb.2006.3620)). On page 2626, seven lines before the end of section 2, the complete list of sites and species is available online, but is not published as electronic supplementary material to this paper.


2016 ◽  
Author(s):  
Stephen G. Gaffney ◽  
Jeffrey P. Townsend

Abstract. Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Contact: [email protected] Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


2019 ◽  
Author(s):  
Gamze Gürsoy ◽  
Charlotte M. Brannon ◽  
Fabio C.P. Navarro ◽  
Mark Gerstein

Abstract: Functional genomics data is becoming clinically actionable, raising privacy concerns. However, quantifying the privacy leakage by genotyping is difficult due to the heterogeneous nature of sequencing techniques. Thus, we present FANCY, a tool that rapidly estimates the number of leaking variants from raw RNA-Seq, ATAC-Seq and ChIP-Seq reads, without explicit genotyping. FANCY employs supervised regression using overall sequencing statistics as features and provides an estimate of the overall privacy risk before data release. FANCY can predict the cumulative number of leaking SNVs with a 0.95 average R² for all independent test sets. Because accurate prediction matters even when the number of leaked variants is low, we developed a special version of the model that makes more accurate predictions when only a few variants leak. A Python and MATLAB implementation of FANCY, as well as custom scripts to generate the features, can be found at https://github.com/gersteinlab/FANCY. We also provide Jupyter notebooks so that users can optimize the parameters in the regression model based on their own data. An easy-to-use webserver that takes inputs and displays results can be found at fancy.gersteinlab.org.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 54 ◽  
Author(s):  
Anil S. Thanki ◽  
Shabhonam Caim ◽  
Manuel Corpas ◽  
Robert P. Davey

Summary: Compositional GC/AT content of DNA sequences is a useful feature in genome analysis. GC/AT content provides useful information about evolution, structure and function of genomes, giving clues about their biological function and organisation. We have developed DNAContentViewer, a BioJS component for visualisation of compositional GC/AT content in raw sequences. DNAContentViewer has been integrated in the BioJS project as part of the BioJS registry of components. DNAContentViewer requires a simple configuration and installation. Its design allows potential interactions with other components via predefined events. Availability: http://github.com/biojs/biojs; doi: 10.5281/zenodo.7722.
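To make the quantity being visualised concrete, the sketch below computes the GC fraction of a raw sequence over fixed-size windows. It is an independent illustration written in Python, not code from DNAContentViewer or BioJS; the function name, the window and step defaults, and the example sequence are assumptions.

```python
def gc_content_windows(sequence, window=4, step=4):
    """Return the GC fraction of each full window along a DNA sequence."""
    seq = sequence.upper()
    fractions = []
    # Only full-length windows are scored; a trailing partial window is dropped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        fractions.append(gc / window)
    return fractions


# Toy sequence, invented for illustration: windows are GGCC, ATAT, GCAT.
profile = gc_content_windows("GGCCATATGCAT", window=4, step=4)
# profile == [1.0, 0.0, 0.5]
```

The AT fraction of each window is simply one minus the GC fraction when the sequence contains only A, C, G and T.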


2019 ◽  
Vol 56 (1) ◽  
pp. 30-46 ◽  
Author(s):  
Ashley M. Abrook ◽  
Ian P. Matthews ◽  
Alice M. Milner ◽  
Ian Candy ◽  
Adrian P. Palmer ◽  
...  

The Last Glacial–Interglacial Transition (LGIT) is a period of climatic complexity where millennial-scale climatic reorganization led to changes in ecosystems. Alongside millennial-scale changes, centennial-scale climatic events have been observed within records from Greenland and continental Europe. The effects of these abrupt events on landscapes and environments are difficult to discern at present. This, in part, relates to low temporal resolutions attained by many studies and the sensitivity of palaeoenvironmental proxies to abrupt change. We present a high-resolution palynological and charcoal study of Quoyloo Meadow, Orkney and use the Principal Curve statistical method to assist in revealing biostratigraphic change. The LGIT vegetation succession on Orkney is presented as open grassland and Empetrum heath during the Windermere Interstadial and early Holocene, and open grassland with Artemisia during the Loch Lomond Stadial. However, a further three phases of ecological change, characterized by expansions of open ground flora, are dated to 14.05–13.63, 10.94–10.8 and 10.2 cal ka BP. The timing of these changes is constrained by cryptotephra of known age. The paper concludes by comparing Quoyloo Meadow with Crudale Meadow, Orkney, and suggests that both Windermere Interstadial records are incomplete and that fire is an important landscape control during the early Holocene. Supplementary material: All raw data associated with this publication: raw pollen counts, charcoal data, Principal Curve and Rate of Change outputs and the age-model output are available at https://doi.org/10.6084/m9.figshare.c.4725269 Thematic collection: This article is part of the ‘Early Career Research’ available at: https://www.lyellcollection.org/cc/SJG-early-career-research


2021 ◽  
pp. jgs2021-037
Author(s):  
Michael J. Benton ◽  
Andrey G. Sennikov

The naming of the Permian by Roderick Murchison in 1841 is well known. This is partly because he ‘completed’ the stratigraphic column at system level, but also because of the exotic aspects of his extended fieldwork in remote parts of Russia and Murchison's reputed character. Here, we explore several debated and controversial aspects of this act, benefiting from access to documents and reports notably from Russian sources. Murchison or Sedgwick could have provided a name for the unnamed lower New Red Sandstone in 1835 based on British successions or those in Germany, so perhaps the Imperial aim of naming time from British geology was not the urgent task some have assumed. Murchison has been painted as arrogant and Imperialistic, which was doubtless true, but at the time many saw him as a great leader, even an attractive individual. Others suggest he succeeded because he stood on the shoulders of local geologists; however, his abilities at brilliant and rapid geological synthesis are undoubted. Two unexpected consequences of his work are that this arch conservative is revered in Russia as a hero of geological endeavours, and, for all his bombast, his ‘Permian’ was not widely accepted until 100 years after its naming. Supplementary material: https://doi.org/10.6084/m9.figshare.c.5412079


2017 ◽  
Author(s):  
Christopher Wilks ◽  
Phani Gaddipati ◽  
Abhinav Nellore ◽  
Ben Langmead

Abstract: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70,000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can also rank and score junctions according to tissue specificity or other criteria. Further, Snaptron can rank and score samples according to the relative frequency of different splicing patterns. We outline biological questions that can be explored with Snaptron queries, including a study of novel exons in annotated genes, of exonization of repetitive element loci, and of a recently discovered alternative transcription start site for the ALK gene. Web app and documentation are at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron under the MIT license.


2020 ◽  
Author(s):  
Ruben Chazarra-Gil ◽  
Stijn van Dongen ◽  
Vladimir Yu Kiselev ◽  
Martin Hemberg

Abstract: As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.

