treedata.table: a wrapper for data.table that enables fast manipulation of large phylogenetic trees matched to data

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12450
Author(s):  
Cristian Román Palacios ◽  
April Wright ◽  
Josef Uyeda

The number of terminals in phylogenetic trees has significantly increased over the last decade. This trend reflects recent advances in next-generation sequencing, accessibility of public data repositories, and the increased use of phylogenies in many fields. Despite R being central to the analysis of phylogenetic data, manipulation of phylogenetic comparative datasets remains slow, complex, and poorly reproducible. Here, we describe the first R package extending the functionality and syntax of data.table to explicitly deal with phylogenetic comparative datasets. treedata.table significantly increases speed and reproducibility during the data manipulation steps involved in the phylogenetic comparative workflow in R. The latest release of treedata.table is currently available through CRAN (https://cran.r-project.org/web/packages/treedata.table/). Additional documentation can be accessed through rOpenSci (https://ropensci.github.io/treedata.table/).

2018 ◽  
Vol 2 (5) ◽  
pp. 295-300
Author(s):  
Joan E. Adamo ◽  
Robert V. Bienvenu ◽  
F. Owen Fields ◽  
Soma Ghosh ◽  
Christina M. Jones ◽  
...  

Building on the recent advances in next-generation sequencing, the integration of genomics, proteomics, metabolomics, and other approaches holds tremendous promise for precision medicine. The approval and adoption of these rapidly advancing technologies and methods present several regulatory science considerations that need to be addressed. To better understand and address these regulatory science issues, a Clinical and Translational Science Award Working Group convened the Regulatory Science to Advance Precision Medicine Forum. The Forum identified an initial set of regulatory science gaps. The final set of key findings and recommendations provided here addresses issues related to the lack of standardization of complex tests, preclinical issues, establishing clinical validity and utility, pharmacogenomics considerations, and knowledge gaps.


2021 ◽  
Author(s):  
Marcia Gumiel ◽  
Oscar M Rollano-Penaloza ◽  
Carmelo Peralta-Rivero ◽  
Leslie Tejeda ◽  
Valeria D. Palma Encinas ◽  
...  

We report the complete chloroplast sequences of two varieties of Theobroma cacao collected in the Bolivian Amazonia using next-generation sequencing. Comparisons between these two chloroplast genomes and the Belizean reference plastid genome identified 19 and 22 nucleotide variants. The phylogenetic analysis reported three main T. cacao clades belonging to the Forastero, Criollo and Trinitario groups. The Bolivian Native Cacao varieties were located inside the Trinitario group, forming their own unique branch. The Bolivian Native Cacao branch reveals a possible new subpopulation different from the well-characterized T. cacao subpopulations. The phylogenetic trees showed that the relationships among the T. cacao varieties were consistent with their geographical locations, placing the Cacao Center of Origin in Western Amazon. The data presented here will contribute to the usage of ultrabarcoding to distinguish different T. cacao varieties and to identify native cacaos from introduced cacaos, thus helping in the conservation of local native varieties of T. cacao.


Viruses ◽  
2019 ◽  
Vol 11 (8) ◽  
pp. 701 ◽  
Author(s):  
Kumar ◽  
Chaudhary ◽  
Lu ◽  
Duff ◽  
Heffel ◽  
...  

Viruses belonging to the genus Bocaparvovirus (BoV) are a genetically diverse group of DNA viruses known to cause respiratory, enteric, and neurological diseases in animals, including humans. An intestinal sample from an alpaca (Vicugna pacos) herd with reoccurring diarrhea and respiratory disease was submitted for next-generation sequencing, revealing the presence of a BoV strain. The alpaca BoV strain (AlBoV) had a 58.58% whole genome nucleotide percent identity to a camel BoV from Dubai, belonging to a tentative ungulate BoV 8 species (UBoV8). Recombination events were lacking with other UBoV strains. The AlBoV genome was comprised of the NS1, NP1, and VP1 proteins. The NS1 protein had the highest amino acid percent identity range (57.89–67.85%) to the members of UBoV8, which was below the 85% cut-off set by the International Committee on Taxonomy of Viruses. The low NS1 amino acid identity suggests that AlBoV is a tentative new species. The whole genome, NS1, NP1, and VP1 phylogenetic trees illustrated distinct branching of AlBoV, sharing a common ancestor with UBoV8. Walker loop and Phospholipase A2 (PLA2) motifs that are vital for virus infectivity were identified in NS1 and VP1 proteins, respectively. Our study reports a novel BoV strain in an alpaca intestinal sample and highlights the need for additional BoV research.
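
The species-demarcation reasoning in this abstract (NS1 amino acid identity compared against an 85% cut-off) can be illustrated with a minimal Python sketch. The sequences below are made-up toy strings, not AlBoV or UBoV8 data, and the function name and the choice to skip gapped columns are assumptions; published identity values depend on the alignment tool and on how gaps are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; this is
    one common convention, assumed here for illustration only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# 85% NS1 amino acid identity cut-off quoted in the abstract.
SPECIES_CUTOFF = 85.0

# Hypothetical toy alignment, not real NS1 sequences.
query_ns1 = "MKT-LLVAGS"
reference_ns1 = "MRTALLIAGT"

identity = percent_identity(query_ns1, reference_ns1)
below_cutoff = identity < SPECIES_CUTOFF
print(f"identity = {identity:.2f}%, below cut-off: {below_cutoff}")
```

An identity below the cut-off against every member of the closest species is the kind of evidence the abstract uses to propose a tentative new species.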


Tumor Biology ◽  
2017 ◽  
Vol 39 (5) ◽  
pp. 101042831769837 ◽  
Author(s):  
Padmanaban S Suresh ◽  
Thejaswini Venkatesh ◽  
Rie Tsutsumi ◽  
Abhishek Shetty

Contemporary molecular biology research tools have enriched numerous areas of biomedical research that address challenging diseases, including endocrine cancers (pituitary, thyroid, parathyroid, adrenal, testicular, ovarian, and neuroendocrine cancers). These tools have placed several intriguing clues before the scientific community. Endocrine cancers pose a major challenge in health care and research despite considerable attempts by researchers to understand their etiology. Microarray analyses have provided gene signatures from many cells, tissues, and organs that can differentiate healthy states from diseased ones, and even show patterns that correlate with stages of a disease. Microarray data can also elucidate the responses of endocrine tumors to therapeutic treatments. The rapid progress in next-generation sequencing methods has overcome many of the initial challenges of these technologies, and their advantages over microarray techniques have enabled them to emerge as valuable aids for clinical research applications (prognosis, identification of drug targets, etc.). A comprehensive review describing the recent advances in next-generation sequencing methods and their application in the evaluation of endocrine and endocrine-related cancers is lacking. The main purpose of this review is to illustrate the concepts that collectively constitute our current view of the possibilities offered by next-generation sequencing technological platforms, challenges to relevant applications, and perspectives on the future of clinical genetic testing of patients with endocrine tumors. We focus on recent discoveries in the use of next-generation sequencing methods for clinical diagnosis of endocrine tumors in patients and conclude with a discussion on persisting challenges and future objectives.


2018 ◽  
Vol 3 ◽  
pp. 36 ◽  
Author(s):  
Márton Münz ◽  
Shazia Mahamdallie ◽  
Shawn Yost ◽  
Andrew Rimmer ◽  
Emma Poyastro-Pearson ◽  
...  

Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView
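
The region-flagging idea described above can be sketched in a few lines of Python. This is not CoverView's code, configuration format, or output format: the function names, threshold values, and toy profiles are all hypothetical and only illustrate how per-region metrics might be compared against user-specified quality requirements.

```python
from statistics import mean, median


def summarise_region(depths, base_qualities):
    """Summarise per-base depth and base quality for one region."""
    return {
        "min_depth": min(depths),
        "median_depth": median(depths),
        "mean_base_quality": mean(base_qualities),
    }


def flag_regions(regions, min_depth=30, min_mean_base_quality=20.0):
    """Return names of regions that miss either (hypothetical) threshold."""
    flagged = []
    for name, (depths, base_qualities) in regions.items():
        summary = summarise_region(depths, base_qualities)
        if (summary["min_depth"] < min_depth
                or summary["mean_base_quality"] < min_mean_base_quality):
            flagged.append(name)
    return flagged


# Toy per-base profiles: region name -> (depths, base quality per position).
regions = {
    "GENE1_exon1": ([55, 60, 58, 61], [35.0, 36.0, 34.5, 35.5]),
    "GENE1_exon2": ([40, 22, 18, 41], [33.0, 31.0, 30.0, 34.0]),  # depth dips below 30
    "GENE2_exon1": ([80, 82, 79, 85], [18.0, 19.0, 17.5, 20.0]),  # mean base quality below 20
}

flagged = flag_regions(regions)
print(flagged)
```

In a real pipeline the depth and quality profiles would come from mapped reads rather than hard-coded lists, and the thresholds would be read from the user's configuration.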


2019 ◽  
Vol 10 (7) ◽  
pp. 376-395 ◽  
Author(s):  
Yulia A Nasykhova ◽  
Yury A Barbitoff ◽  
Elena A Serebryakova ◽  
Dmitry S Katserov ◽  
Andrey S Glotov

Author(s):  
Alba Gutiérrez-Sacristán ◽  
Carlos De Niz ◽  
Cartik Kothari ◽  
Sek Won Kong ◽  
Kenneth D Mandl ◽  
...  

Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient’s individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine’s main objective (ensuring the optimum diagnosis, treatment and prognosis for each individual), investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review conform to six specific inclusion parameters: (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) accessible through a website or collaboration with investigators and (vi) make access information available in English. Using these criteria, we identified 30 datasets, reviewed them and provided results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
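
The six inclusion parameters listed in this abstract amount to a simple filter, sketched below in Python. The `Dataset` class, its field names, and the example cohorts are hypothetical and are not taken from the catalog or its repository; only the thresholds (more than 500 subjects, at least 100 phenotypic variables) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # All field names are hypothetical; they mirror the six criteria in the text.
    name: str
    n_subjects: int
    has_genotypes: bool
    has_phenotypes: bool
    has_wgs_or_wes: bool
    n_phenotypic_variables: int
    accessible: bool               # via a website or collaboration with investigators
    access_info_in_english: bool


def meets_inclusion_criteria(d: Dataset) -> bool:
    """Apply criteria (i) to (vi) as stated in the abstract."""
    return (
        d.n_subjects > 500                          # (i) more than 500 human subjects
        and d.has_genotypes and d.has_phenotypes    # (ii) both data types, same subjects
        and d.has_wgs_or_wes                        # (iii) WGS or WES data
        and d.n_phenotypic_variables >= 100         # (iv) at least 100 variables
        and d.accessible                            # (v) website or collaboration
        and d.access_info_in_english                # (vi) access information in English
    )


# Made-up candidates for illustration only.
candidates = [
    Dataset("cohort_a", 1200, True, True, True, 250, True, True),
    Dataset("cohort_b", 500, True, True, True, 250, True, True),    # fails (i): not more than 500
    Dataset("cohort_c", 3000, True, True, False, 400, True, True),  # fails (iii)
]

included = [d.name for d in candidates if meets_inclusion_criteria(d)]
print(included)
```

Note the asymmetry between criterion (i), which is a strict inequality, and criterion (iv), which is inclusive.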


2018 ◽  
Author(s):  
Diogo Pratas ◽  
Armando J. Pinho ◽  
Raquel M. Silva ◽  
João M. O. S. Rodrigues ◽  
Morteza Hosseini ◽  
...  

The general approaches to detect and quantify metagenomic sample composition are based on the alignment of the reads, according to an existing database containing reference microbial sequences. However, without proper parameterization, these methods are not suitable for ancient DNA. Quantifying somewhat dissimilar sequences by alignment methods is problematic, due to the need for fine-tuned thresholds, considering relaxed edit distances and the consequent increase of computational cost. Additionally, the choice of the thresholds poses the problem of how to quantify similarity without producing overestimated measures. We propose FALCON-meta, a compression-based method to infer metagenomic composition of next-generation sequencing samples. This unsupervised alignment-free method runs efficiently on FASTQ samples. FALCON-meta quickly learns how to give importance to the models that cooperate to predict similarity, incorporating parallelism and flexibility for multiple hardware characteristics. It shows substantial identification capabilities in ancient DNA without overestimation. In one of the examples, we found and authenticated an ancient Pseudomonas bacterium in a mammoth mitogenome. FALCON-meta can be accessed at https://github.com/pratas/falcon.

