Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results

2017 ◽  
Author(s):  
Gil Alterovitz ◽  
Dennis Dean ◽  
Carole Goble ◽  
Michael R. Crusoe ◽  
Stian Soiland-Reyes ◽  
...  

Abstract A personalized approach based on a patient’s or pathogen’s unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to FAIR guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet lab procedures to computational methods. The BioCompute framework (https://osf.io/zm97b/) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed, and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed-upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCO) offer that standard, and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the “Open-Stand.org principles for collaborative open standards development”. By communicating high-throughput sequencing studies using a BCO, regulatory agencies (e.g., FDA), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next generation sequencing workflow exchange, reporting, and regulatory reviews.


2014 ◽  
Vol 64 (Pt_2) ◽  
pp. 316-324 ◽  
Author(s):  
Jongsik Chun ◽  
Fred A. Rainey

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.



2019 ◽  
Author(s):  
Sarah Unruh

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Phylogenetic trees show us how organisms are related and provide frameworks for studying and testing evolutionary hypotheses. To better understand the evolution of orchids and their mycorrhizal fungi, I used high-throughput sequencing data and bioinformatic analyses to build phylogenetic hypotheses. In Chapter 2, I used transcriptome sequences both to build a phylogeny of the slipper orchid genera and to confirm the placement of a polyploidy event at the base of the orchid family. Polyploidy is hypothesized to be a strong driver of evolution and a source of unique traits, so confirming this event leads us closer to explaining extant orchid diversity. The list of orthologous genes generated from this study will provide a less expensive and more powerful method for researchers examining the evolutionary relationships in Orchidaceae. In Chapter 3, I generated genomic sequence data for 32 fungal isolates that were collected from orchids across North America. I inferred the first multi-locus nuclear phylogenetic tree for these fungal clades. The phylogenetic structure of these fungi will improve the taxonomy of these clades by providing evidence for new species and for revising problematic species designations. A robust taxonomy is necessary for studying the role of fungi in the orchid mycorrhizal symbiosis. In Chapter 4, I summarize my work and outline the future directions of my lab at Illinois College, including addressing the remaining aims of my Community Sequencing Proposal with the Joint Genome Institute by analyzing the 15 fungal reference genomes I generated during my PhD. Together these chapters are the start of a life-long research project into the evolution and function of the orchid/fungal symbiosis.



2013 ◽  
Vol 368 (1626) ◽  
pp. 20120504 ◽  
Author(s):  
Gkikas Magiorkinis ◽  
Robert Belshaw ◽  
Aris Katzourakis

Almost 8% of the human genome comprises endogenous retroviruses (ERVs). While they have been shown to cause specific pathologies in animals, such as cancer, their association with disease in humans remains controversial. The limited evidence is partly due to the physical and bioethical restrictions surrounding the study of transposons in humans, coupled with the major experimental and bioinformatics challenges surrounding the association of ERVs with disease in general. Two biotechnological landmarks of the past decade provide us with unprecedented research artillery: (i) the ultra-fine sequencing of the human genome and (ii) the emergence of high-throughput sequencing technologies. Here, we critically assemble research about potential pathologies of ERVs in humans. We argue that the time is right to revisit the long-standing questions of human ERV pathogenesis within a robust and carefully structured framework that makes full use of genomic sequence data. We also pose two thought-provoking research questions on potential pathophysiological roles of ERVs with respect to immune escape and regulation.



2020 ◽  
Author(s):  
Tamra Lysaght ◽  
Angela Ballantyne ◽  
Vicki Xafis ◽  
Serene Ong ◽  
Gerald Owen Schaefer ◽  
...  

Abstract Background We aimed to examine the ethical concerns Singaporeans have about sharing health-data for precision medicine (PM) and identify suggestions for governance strategies. Just as Asian genomes are under-represented in PM, the views of Asian populations about the risks and benefits of data sharing are under-represented in prior attitudinal research. Methods We conducted seven focus groups with 62 participants in Singapore from May to July 2019. They were conducted in three languages (English, Mandarin and Malay) and analysed with qualitative content and thematic analysis. Results Four key themes emerged: nuanced understandings of data security and data sensitivity; trade-offs between data protection and research benefits; trust (and distrust) in the public and private sectors; and governance and control options. Participants were aware of the inherent risks associated with data sharing for research. Participants expressed conditional support for data sharing, including genomic sequence data and information contained within electronic medical records. This support included sharing data with researchers from universities and healthcare institutions, both in Singapore and overseas. Support was conditional on the perceived social value of the research and appropriate de-identification and data security processes. Participants suggested that a data sharing oversight body would help strengthen public trust and comfort in data research for PM in Singapore. Conclusion Maintenance of public trust in data security systems and governance regimes can enhance participation in PM and data sharing for research. Contrary to themes in much prior research, participants demonstrated a sophisticated understanding of the inherent risks of data sharing, analysed trade-offs between risks and potential benefits of PM, and often adopted an international perspective.



1999 ◽  
Vol 9 (1) ◽  
pp. 53-61 ◽  
Author(s):  
Wonhee Jang ◽  
Axin Hua ◽  
Sandra V. Spilson ◽  
Webb Miller ◽  
Bruce A. Roe ◽  
...  

The mnd2 mutation on mouse chromosome 6 produces a progressive neuromuscular disorder. To determine the gene content of the 400-kb mnd2 nonrecombinant region, we sequenced 108 kb of mouse genomic DNA and 92 kb of human genomic sequence from the corresponding region of chromosome 2p13.3. Three genes with the indicated sizes and intergenic distances were identified: D6Mm5e (⩾81 kb)–787 bp–DOK (2 kb)–845 bp–LOR2 (⩾6 kb). D6Mm5e is expressed in many tissues at very low abundance and the predicted 526-residue protein contains no known functional domains. DOK encodes the p62dok rasGAP binding protein involved in signal transduction. LOR2 encodes a novel lysyl oxidase-related protein of 757 amino acid residues. We describe a simple search protocol for identification of conserved internal exons in genomic sequence. Evolutionary conservation proved to be a useful criterion for distinguishing between authentic exons and artifactual products obtained by exon amplification, RT–PCR, and 5′ RACE. Conserved noncoding sequence elements longer than 80 bp with ⩾75% nucleotide sequence identity comprise ∼1% of the genomic sequence in this region. Comparative analysis of this human and mouse genomic DNA sequence was an efficient method for gene identification and is independent of developmental stage or quantitative level of gene expression. [The sequence data described in this paper have been submitted to the GenBank data library under the following accession numbers: AC003061, mouse BAC clone 245c12; AC003065, human BAC clone h173(E10); AF053368, mouse Lor2 cDNA; AF084363, 108-kb contig from mouse BAC 245c12; AF084364, mouse D6Mm5e cDNA.]
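
The conservation threshold quoted in this abstract (elements longer than 80 bp with at least 75% nucleotide identity) can be illustrated with a minimal sketch. The abstract does not say how the elements were located, so the sliding-window scan below is an assumption made for illustration only, as are the function names `percent_identity` and `conserved_windows` and the requirement that both sequences are already aligned to equal length.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned positions where both sequences carry the same base.

    Assumes `a` and `b` are already aligned to equal length; gap characters
    ("-") never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def conserved_windows(a: str, b: str, window: int = 81,
                      min_identity: float = 0.75) -> list[int]:
    """Start offsets of fixed-size windows that meet the identity threshold.

    Hypothetical scan: window=81 mirrors "longer than 80 bp" and
    min_identity=0.75 mirrors the 75% cutoff quoted in the abstract.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [
        start
        for start in range(len(a) - window + 1)
        if percent_identity(a[start:start + window],
                            b[start:start + window]) >= min_identity
    ]
```

Overlapping hits would normally be merged into maximal elements before estimating what share of a region they cover; that step is left out here to keep the sketch short.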



2019 ◽  
Vol 5 (Supplement_1) ◽  
Author(s):  
P Anh ◽  
P Tam ◽  
N Tue ◽  
M Rabaa ◽  
G Thwaites ◽  
...  

Abstract An increasing number of zoonotic viruses have been detected in animals, especially in poultry species. Understanding the diversity of zoonotic infections and the local behavior helps to characterize the pathogen diversity in humans and animals and to predict the risk of pathogen spill-over from animals to humans. Vietnam is considered, along with other countries in Southeast Asia, a hotspot for zoonotic viruses. In Vietnam, domestic animals are typically farmed in close proximity to humans, which may increase the risk of transmission of zoonotic pathogens. Our previous studies found some zoonotic viruses (e.g. rotavirus group A, hepatitis E virus) in domestic pigs. However, the risk of pathogen transmission from domestic animals to humans has not been determined. Detailed genomic sequence data may help to track the origin and evolution of zoonotic pathogens. To understand the origins and emergence of zoonotic infections in people who have regular contact with animals, we will investigate the viral diversity in farmers and the domestic animals on their farms, using high-throughput sequencing. Viral RNA was extracted from pooled fecal samples of 30 farmers and 50 pigs, and used as input for SureSelect target enrichment and Illumina MiSeq sequencing.



2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Abraham Gihawi ◽  
Ghanasyam Rallapalli ◽  
Rachel Hurst ◽  
Colin S. Cooper ◽  
Richard M. Leggett ◽  
...  

Abstract Background Human tissue is increasingly being whole genome sequenced as we transition into an era of genomic medicine. With this arises the potential to detect sequences originating from microorganisms, including pathogens amid the plethora of human sequencing reads. In cancer research, the tumorigenic ability of pathogens is being recognized, for example, Helicobacter pylori and human papillomavirus in the cases of gastric non-cardia and cervical carcinomas, respectively. As of yet, no benchmark has been carried out on the performance of computational approaches for bacterial and viral detection within host-dominated sequence data. Results We present the results of benchmarking over 70 distinct combinations of tools and parameters on 100 simulated cancer datasets spiked with realistic proportions of bacteria. mOTUs2 and Kraken are the highest performing individual tools achieving median genus-level F1 scores of 0.90 and 0.91, respectively. mOTUs2 demonstrates a high performance in estimating bacterial proportions. Employing Kraken on unassembled sequencing reads produces a good but variable performance depending on post-classification filtering parameters. These approaches are investigated on a selection of cervical and gastric cancer whole genome sequences where Alphapapillomavirus and Helicobacter are detected in addition to a variety of other interesting genera. Conclusions We provide the top-performing pipelines from this benchmark in a unifying tool called SEPATH, which is amenable to high throughput sequencing studies across a range of high-performance computing clusters. SEPATH provides a benchmarked and convenient approach to detect pathogens in tissue sequence data helping to determine the relationship between metagenomics and disease.
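
The genus-level F1 scores reported in this abstract combine precision and recall over the set of genera a tool reports. A minimal sketch of that metric follows, assuming each simulated dataset is reduced to a set of expected genera and a set of predicted genera; the function name `genus_f1` and the example genera are illustrative and are not taken from SEPATH or the benchmark itself.

```python
from statistics import median


def genus_f1(expected: set[str], predicted: set[str]) -> float:
    """F1 score (harmonic mean of precision and recall) over genus names."""
    true_pos = len(expected & predicted)
    false_pos = len(predicted - expected)
    false_neg = len(expected - predicted)
    if true_pos == 0:
        # No correct genus reported: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-dataset results: (genera spiked in, genera a tool reported).
datasets = [
    ({"Helicobacter", "Escherichia"}, {"Helicobacter", "Escherichia"}),
    ({"Helicobacter", "Escherichia", "Streptococcus"},
     {"Helicobacter", "Escherichia", "Bacillus"}),
    ({"Helicobacter"}, {"Bacillus"}),
]
scores = [genus_f1(truth, calls) for truth, calls in datasets]
median_f1 = median(scores)
```

Summarising with the median across datasets, as the abstract does, keeps a few badly classified samples from dominating the comparison between tools.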



2018 ◽  
Vol 50 (4) ◽  
pp. 237-243 ◽  
Author(s):  
Anna Marie Williams ◽  
Yong Liu ◽  
Kevin R. Regner ◽  
Fabrice Jotterand ◽  
Pengyuan Liu ◽  
...  

Big data are a major driver in the development of precision medicine. Efficient analysis methods are needed to transform big data into clinically-actionable knowledge. To accomplish this, many researchers are turning toward machine learning (ML), an approach of artificial intelligence (AI) that utilizes modern algorithms to give computers the ability to learn. Much of the effort to advance ML for precision medicine has been focused on the development and implementation of algorithms and the generation of ever larger quantities of genomic sequence data and electronic health records. However, relevance and accuracy of the data are as important as quantity of data in the advancement of ML for precision medicine. For common diseases, physiological genomic readouts in disease-applicable tissues may be an effective surrogate to measure the effect of genetic and environmental factors and their interactions that underlie disease development and progression. Disease-applicable tissue may be difficult to obtain, but there are important exceptions such as kidney needle biopsy specimens. As AI continues to advance, new analytical approaches, including those that go beyond data correlation, need to be developed and ethical issues of AI need to be addressed. Physiological genomic readouts in disease-relevant tissues, combined with advanced AI, can be a powerful approach for precision medicine for common diseases.



2020 ◽  
Author(s):  
Renesh Bedre ◽  
Carlos Avila ◽  
Kranthi Mandadi

Abstract Motivation Use of high-throughput sequencing (HTS) has become indispensable in life science research. Raw HTS data contain several sequencing artifacts, and as a first step it is imperative to remove the artifacts for reliable downstream bioinformatics analysis. Although there are multiple stand-alone tools available that can perform the various quality control steps separately, an integrated tool that allows one-step, automated quality control analysis of HTS datasets will significantly improve the handling of large numbers of samples in parallel. Results Here, we developed HTSeqQC, a stand-alone, flexible, and easy-to-use software for one-step quality control analysis of raw HTS data. HTSeqQC can evaluate HTS data quality and perform filtering and trimming analysis in a single run. We evaluated the performance of HTSeqQC for conducting batch analysis of HTS datasets with 322 sample datasets with an average of ∼1M (paired end) sequence reads per sample. HTSeqQC accomplished the QC analysis in ∼3 hours in distributed mode and ∼31 hours in shared mode, thus underscoring its utility and robust performance. Availability and implementation HTSeqQC software, Docker image and Nextflow template are available for download at https://github.com/reneshbedre/HTSeqQC and graphical user interface (GUI) is available at CyVerse Discovery Environment (DE) (https://cyverse.org/). Documentation available at https://reneshbedre.github.io/blog/htseqqc.html and https://cyverse-htseqqc-cyverse-tutorial.readthedocs-hosted.com/en/latest/ (for CyVerse). Contact Kranthi Mandadi ([email protected]) Supplementary information Supplementary information provided in Supplementary File 1.


