AprGPD: the apricot genomic and phenotypic database

Plant Methods ◽  
2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Chen Chen ◽  
Huimin Liu ◽  
Ningning Gou ◽  
Mengzhen Huang ◽  
Wanyu Xu ◽  
...  

Abstract Background Apricot is cultivated worldwide because of its high nutritive content and strong adaptability. Its flesh is delicious and has a unique and pleasant aroma. Apricot kernels are also consumed as nuts. The genome of apricot has been sequenced, and transcriptome, resequencing, and phenotype data are increasingly being generated. However, as new information emerges, these data need to be integrated and disseminated. Results To better manage the continuous addition of new data and increase convenience, we constructed the apricot genomic and phenotypic database (AprGPD, http://apricotgpd.com). At present, AprGPD contains three reference genomes, 1692 germplasms, 306 genome resequencing datasets, and 90 RNA sequencing datasets. A set of user-friendly query, analysis, and visualization tools has been implemented in AprGPD. We have also performed a detailed analysis of 59 transcription factor families for the three apricot genomes. Conclusion Six modules are displayed in AprGPD: species, germplasm, genome, variation, product, and tools. The data integrated by AprGPD will be helpful for the molecular breeding of apricot.

2016 ◽  
Author(s):  
Matteo Chiara ◽  
Giovanni Chillemi ◽  
Mattia D'Antonio ◽  
Paolo D'Onorio De Meo ◽  
Tiziano Flati ◽  
...  

The reduction in sequencing costs associated with next-generation sequencing (NGS) technologies has led to a rapid upsurge in the amount of genome resequencing data, paving the way for the advent of personalized genomics and precision medicine. Accurate genotyping is crucial for effective analysis of these data, and in particular for the correct identification of candidate causal mutations in diagnostic screenings. The body of genome resequencing data will likely see exponential growth in the next few years, underlining the need for publicly available, accurate and time-effective bioinformatics systems for data analysis. Ideally, such systems should be easy to use and constantly updated as new genomes and software tools are released. Here we present IVaCS, a fully automated, highly accurate system with a web-based graphical interface for genotyping and variant annotation. IVaCS offers state-of-the-art tools for variant calling and annotation along with expert-made pipelines for the analysis of whole-genome sequencing (WGS), whole-exome sequencing (WES) and targeted resequencing (TGS) data, performing all steps from quality trimming to variant annotation. The system is specifically designed to assist users with little or no bioinformatics skills, and all the pipelines are available through a user-friendly web interface. The final output is provided in the form of a dynamic web page where variants can be selected on the basis of user-defined hard filters. A comprehensive report containing detailed information and statistics concerning the execution of each step of the pipelines is also generated. Extensive tests on publicly available genome resequencing data (Illumina platinum genome NA12878) show that our system achieves slightly better sensitivity and higher specificity than the commercial Illumina VCAT 2.0 software.
IVaCS is implemented with a modular architecture, and each module (quality trimming, read mapping, variant calling, variant annotation) can be used independently. IVaCS can handle all the major commercial kits for exome sequencing, such as Illumina, Agilent or Nimblegen, along with a comprehensive collection of reference genomes (all the Illumina genomes, including human, mouse and cow, among others) with corresponding genomic annotations. Finally, the software leverages an ensemble of publicly available resources (e.g., dbSNP, OMIM, COSMIC and ClinVar, among others) for the functional annotation of human variants. Advanced users needing more control over the individual steps may also request the command-line version of the software, which is more flexible and easier to customize. IVaCS has a very active and growing community. The system is under constant development, and new reference genomes, databases and bioinformatics tools are added to IVaCS on a regular basis. IVaCS is available at: https://bioinformatics.cineca.it/ivacs
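
The abstract above mentions selecting variants with user-defined hard filters. As a minimal sketch of what such a filter generally does, and not a description of how IVaCS implements it, the Python below keeps only calls whose quality and read depth meet fixed thresholds. The `Variant` record, the function names, the threshold values and the example calls are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not an IVaCS data type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # Phred-scaled call quality
    depth: int   # read depth at the site


def passes_hard_filters(v, min_qual=30.0, min_depth=10):
    """A hard filter is a fixed cutoff: the call passes only if it meets every threshold."""
    return v.qual >= min_qual and v.depth >= min_depth


def apply_hard_filters(variants, min_qual=30.0, min_depth=10):
    """Return the variants that satisfy all thresholds, preserving input order."""
    return [v for v in variants if passes_hard_filters(v, min_qual, min_depth)]


# Made-up example calls; the thresholds above are illustrative defaults only.
calls = [
    Variant("chr1", 12345, "A", "G", qual=50.0, depth=35),  # passes both cutoffs
    Variant("chr1", 67890, "C", "T", qual=12.0, depth=40),  # fails the quality cutoff
    Variant("chr2", 13579, "G", "A", qual=45.0, depth=4),   # fails the depth cutoff
]
kept = apply_hard_filters(calls)
```

Exposing the thresholds as parameters mirrors the idea of user-defined filters: tightening `min_qual` or `min_depth` trades sensitivity for specificity without changing the calls themselves.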


Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1338
Author(s):  
Morgan E. Meissner ◽  
Emily J. Julik ◽  
Jonathan P. Badalamenti ◽  
William G. Arndt ◽  
Lauren J. Mills ◽  
...  

Human immunodeficiency virus type 2 (HIV-2) accumulates fewer mutations during replication than HIV type 1 (HIV-1). Advanced studies of HIV-2 mutagenesis, however, have historically been confounded by high background error rates in traditional next-generation sequencing techniques. In this study, we describe the adaptation of the previously described maximum-depth sequencing (MDS) technique to studies of both HIV-1 and HIV-2 for the ultra-accurate characterization of viral mutagenesis. We also present the development of a user-friendly Galaxy workflow for the bioinformatic analysis of sequencing data generated using the MDS technique, designed to improve replicability and accessibility for molecular virologists. This adapted MDS technique and analysis pipeline were validated by comparisons with previously published analyses of the frequency and spectra of mutations in HIV-1 and HIV-2 and are readily expandable to studies of viral mutation across the genomes of both viruses. Using this novel sequencing pipeline, we observed that the background error rate was reduced 100-fold relative to standard Illumina error rates, and 10-fold relative to traditional unique molecular identifier (UMI)-based sequencing. This technical advancement will allow for the exploration of novel and previously unrecognized sources of viral mutagenesis in both HIV-1 and HIV-2, which will expand our understanding of retroviral diversity and evolution.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nae-Chyun Chen ◽  
Brad Solomon ◽  
Taher Mun ◽  
Sheila Iyer ◽  
Ben Langmead

Abstract Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.


2018 ◽  
Vol 35 (15) ◽  
pp. 2654-2656 ◽  
Author(s):  
Guoli Ji ◽  
Wenbin Ye ◽  
Yaru Su ◽  
Moliang Chen ◽  
Guangzao Huang ◽  
...  

Abstract Summary Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 28 (14) ◽  
pp. 2319-2329 ◽  
Author(s):  
Kohei Hamanaka ◽  
Atsushi Takata ◽  
Yuri Uchiyama ◽  
Satoko Miyatake ◽  
Noriko Miyake ◽  
...  

Abstract Disorders of sex development (DSDs) are defined as congenital conditions in which chromosomal, gonadal or anatomical sex is atypical. In many DSD cases, genetic causes remain to be elucidated. Here, we performed a case–control exome sequencing study comparing gene-based burdens of rare damaging variants between 26 DSD cases and 2625 controls. We found exome-wide significant enrichment of rare heterozygous truncating variants in the MYRF gene encoding myelin regulatory factor, a transcription factor essential for oligodendrocyte development. All three variants occurred de novo. We identified an additional 46,XY DSD case with a de novo damaging missense variant in an independent cohort. The clinical symptoms included hypoplasia of Müllerian derivatives and ovaries in 46,XX DSD patients, defective development of Sertoli and Leydig cells in 46,XY DSD patients and congenital diaphragmatic hernia in one 46,XY DSD patient. As all of these cells and tissues are, or partly consist of, coelomic epithelium (CE)-derived cells (CEDC), and CEDC develop from CE via proliferation and migration, MYRF might be related to these processes. Consistent with this hypothesis, single-cell RNA sequencing of foetal gonads revealed high expression of MYRF in CE and CEDC. Reanalysis of public chromatin immunoprecipitation sequencing data for rat Myrf showed that genes regulating proliferation and migration were enriched among putative target genes of Myrf. These results suggest that MYRF is a novel causative gene of 46,XY and 46,XX DSD and that MYRF is a transcription factor regulating CE and/or CEDC proliferation and migration, which is essential for the development of multiple organs.


2021 ◽  
Author(s):  
Martin Floor ◽  
Kengjie Li ◽  
Miquel Estévez-Gay ◽  
Luis Agulló ◽  
Pau Marc Muñoz ◽  
...  

Here we introduce SBMOpenMM, a Python library to build Structure-Based Models (SBMs) that uses the OpenMM framework to create and run SBM simulations. The code is flexible, user-friendly, and profits from the high customizability and GPU performance provided by the OpenMM platform. We demonstrate its use in the evaluation of the two-step folding process of the FoxP1 transcription factor protein. Our results indicate that the newly developed SBM can be successfully applied to elucidating the underlying mechanisms of biomolecular processes.


2017 ◽  
Author(s):  
Zhemin Zhou ◽  
Nina Luhmann ◽  
Nabil-Fareed Alikhan ◽  
Christopher Quince ◽  
Mark Achtman

Abstract Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying these reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g. detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤ 0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.


2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.

