HiC-Hiker: a probabilistic model to determine contig orientation in chromosome-length scaffolds with Hi-C

2020 ◽  
Vol 36 (13) ◽  
pp. 3966-3974
Author(s):  
Ryo Nakabayashi ◽  
Shinichi Morishita

Abstract. Motivation: De novo assembly of reference-quality genomes used to require enormously laborious tasks. In particular, it is extremely time-consuming to build genome markers for ordering assembled contigs along chromosomes; thus, they are only available for well-established model organisms. To resolve this issue, recent studies demonstrated that Hi-C could be a powerful and cost-effective means to output chromosome-length scaffolds for non-model species with no genome marker resources, because the Hi-C contact frequency between a pair of two loci can be a good estimator of their genomic distance, even if there is a large gap between them. Indeed, state-of-the-art methods such as 3D-DNA are now widely used for locating contigs in chromosomes. However, it remains challenging to reduce errors in contig orientation because shorter contigs have fewer contacts with their neighboring contigs. These orientation errors lower the accuracy of gene prediction, read alignment, and synteny block estimation in comparative genomics. Results: To reduce these contig orientation errors, we propose a new algorithm, named HiC-Hiker, which has a firm grounding in probabilistic theory, rigorously models Hi-C contacts across contigs, and effectively infers the most probable orientations via the Viterbi algorithm. We compared HiC-Hiker and 3D-DNA using human and worm genome contigs generated from short reads, evaluated their performances, and observed a remarkable reduction in the contig orientation error rate from 4.3% (3D-DNA) to 1.7% (HiC-Hiker). Our algorithm can consider long-range information between distal contigs and precisely estimates Hi-C read contact probabilities among contigs, which may also be useful for determining the ordering of contigs. Availability and implementation: HiC-Hiker is freely available at: https://github.com/ryought/hic_hiker.
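
The idea of inferring the most probable orientations with the Viterbi algorithm can be illustrated with a minimal sketch. This is not the HiC-Hiker implementation: it assumes the contig order is already fixed, that each contig is either forward (0) or reversed (1), and that only adjacent contigs interact, whereas the abstract states that the real model also uses long-range information. The function name `viterbi_orientations` and the input table `pair_logp` are hypothetical, and the log-likelihood values are made up for illustration.

```python
from typing import List, Sequence


def viterbi_orientations(
    pair_logp: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Return the most probable orientation (0 = forward, 1 = reversed) per contig.

    pair_logp[i][a][b] is an assumed log-likelihood of the Hi-C contacts
    observed between contig i in orientation a and contig i+1 in orientation b.
    Simplification: only adjacent contigs are scored.
    """
    best = [0.0, 0.0]  # uniform prior over the first contig's orientation
    back: List[List[int]] = []  # back[i][b] = best orientation of contig i given b

    for table in pair_logp:
        new_best = []
        pointers = []
        for b in (0, 1):
            candidates = [best[a] + table[a][b] for a in (0, 1)]
            a_star = 0 if candidates[0] >= candidates[1] else 1
            new_best.append(candidates[a_star])
            pointers.append(a_star)
        best = new_best
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Three contigs, two adjacent pairs, with illustrative log-likelihoods.
example = [
    [[-1.0, -5.0], [-4.0, -2.0]],
    [[-3.0, -1.0], [-6.0, -2.0]],
]
print(viterbi_orientations(example))
```

With n contigs, dynamic programming evaluates 4(n - 1) transitions instead of enumerating all 2^n orientation assignments, which is why a chain-structured model of this kind stays tractable for long scaffolds.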

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3702 ◽  
Author(s):  
Santiago Montero-Mendieta ◽  
Manfred Grabherr ◽  
Henrik Lantz ◽  
Ignacio De la Riva ◽  
Jennifer A. Leonard ◽  
...  

Whole genome sequencing (WGS) is a very valuable resource to understand the evolutionary history of poorly known species. However, in organisms with large genomes, as most amphibians, WGS is still excessively challenging and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome and the transcriptome must be assembled de novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.


GigaScience ◽  
2020 ◽  
Vol 9 (5) ◽  
Author(s):  
Graham J Etherington ◽  
Darren Heavens ◽  
David Baker ◽  
Ashleigh Lister ◽  
Rose McNelly ◽  
...  

Abstract. Background: Whilst much sequencing effort has focused on key mammalian model organisms such as mouse and human, little is known about the relationship between genome sequencing techniques for non-model mammals and genome assembly quality. This is especially relevant to non-model mammals, where the samples to be sequenced are often degraded and of low quality. A key aspect when planning a genome project is the choice of sequencing data to generate. This decision is driven by several factors, including the biological questions being asked, the quality of DNA available, and the availability of funds. Cutting-edge sequencing technologies now make it possible to achieve highly contiguous, chromosome-level genome assemblies, but rely on high-quality high molecular weight DNA. However, funding is often insufficient for many independent research groups to use these techniques. Here we use a range of different genomic technologies generated from a roadkill European polecat (Mustela putorius) to assess various assembly techniques on this low-quality sample. We evaluated different approaches for de novo assemblies and discuss their value in relation to biological analyses. Results: Generally, assemblies containing more data types achieved better scores in our ranking system. However, when accounting for misassemblies, this was not always the case for Bionano and low-coverage 10x Genomics (for scaffolding only). We also find that the extra cost associated with combining multiple data types is not necessarily associated with better genome assemblies. Conclusions: The high degree of variability between each de novo assembly method (assessed from the 7 key metrics) highlights the importance of carefully devising the sequencing strategy to be able to carry out the desired analysis. Adding more data to genome assemblies does not always result in better assemblies, so it is important to understand the nuances of genomic data integration explained here, in order to obtain cost-effective value for money when sequencing genomes.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8757 ◽  
Author(s):  
Artem Nedoluzhko ◽  
Fedor Sharko ◽  
Md. Golam Rbbani ◽  
Anton Teslyuk ◽  
Ioannis Konstantinidis ◽  
...  

Circular RNAs (circRNAs) are long noncoding RNAs that play a significant role in various biological processes, including embryonic development and stress responses. These regulatory molecules can modulate microRNA activity and are involved in different molecular pathways as indirect regulators of gene expression. Thousands of circRNAs have been described in diverse taxa due to the recent advances in high throughput sequencing technologies, which led to a huge variety of total RNA sequencing being publicly available. A number of circRNA de novo and host gene prediction tools are available to date, but their ability to accurately predict circRNA host genes is limited in the case of low-quality genome assemblies or annotations. Here, we present CircParser, a simple and fast Unix/Linux pipeline that uses the outputs from the most common circular RNAs in silico prediction tools (CIRI, CIRI2, CircExplorer2, find_circ, and circFinder) to annotate circular RNAs, assigning presumptive host genes from local or public databases such as National Center for Biotechnology Information (NCBI). Also, this pipeline can discriminate circular RNAs based on their structural components (exonic, intronic, exon-intronic or intergenic) using a genome annotation file.


2018 ◽  
Author(s):  
Karine A. Martinez-Viaud ◽  
Cindy Taylor Lawley ◽  
Milmer Martinez Vergara ◽  
Gil Ben-Zvi ◽  
Tammy Biniashvili ◽  
...  

Abstract. High-quality genomes are essential to resolve challenges in breeding, comparative biology, medicine and conservation planning. New library preparation techniques along with better assembly algorithms result in continued improvements in assemblies for non-model organisms, moving them toward reference quality genomes. We report on the latest genome assembly of the Atlantic bottlenose dolphin leveraging Illumina sequencing data coupled with a combination of several library preparation techniques. These include Linked-Reads (Chromium, 10x Genomics), mate pairs, long insert paired ends and standard paired ends. Data were assembled with the commercial DeNovoMAGIC™ assembly software resulting in two assemblies, a traditional “haploid” assembly (Tur_tru_Illumina_hap_v1) that is a mosaic of the two parental haplotypes and a phased assembly (Tur_tru_Illumina_phased_v1) where each scaffold has sequence from a single homologous chromosome. We show that Tur_tru_Illumina_hap_v1 is more complete and accurate compared to the current best reference based on the amount and composition of sequence, the consistency of the mate pair alignments to the assembled scaffolds, and on the analysis of conserved single-copy mammalian orthologs. The phased de novo assembly Tur_tru_Illumina_phased_v1 is the first publicly available for this species and provides the community with novel and accurate ways to explore the heterozygous nature of the dolphin genome.


2017 ◽  
Author(s):  
Santiago Montero-Mendieta ◽  
Manfred Grabherr ◽  
Henrik Lantz ◽  
Ignacio De la Riva ◽  
Jennifer A Leonard ◽  
...  

Whole genome sequencing is opening the door to novel insights into the population structure and evolutionary history of poorly known species. In organisms with large genomes, which includes most amphibians, whole-genome sequencing is excessively challenging and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome to facilitate assembly and the transcriptome sequence must be assembled de-novo. We used RNA-seq to obtain the transcriptome profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a workflow to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our workflow guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/PRACTICAL-GUIDE-TO-BUILD-DE-NOVO-TRANSCRIPTOME-ASSEMBLIES-FOR-NON-MODEL-ORGANISMS/wiki


2015 ◽  
Author(s):  
Hiroaki Sakai ◽  
Ken Naito ◽  
Eri Ogiso-Tanaka ◽  
Yu Takahashi ◽  
Kohtaro Iseki ◽  
...  

Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of the genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced 100 times longer contigs with a 100 times smaller amount of gaps compared to the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, where thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needs to be assisted by SGS data, can achieve a near-complete assembly of a eukaryotic genome.


2011 ◽  
Vol 39 (3) ◽  
pp. 193-209 ◽  
Author(s):  
H. Surendranath ◽  
M. Dunbar

Abstract. Over the last few decades, finite element analysis has become an integral part of the overall tire design process. Engineers need to perform a number of different simulations to evaluate new designs and study the effect of proposed design changes. However, tires pose formidable simulation challenges due to the presence of highly nonlinear rubber compounds, embedded reinforcements, complex tread geometries, rolling contact, and large deformations. Accurate simulation requires careful consideration of these factors, resulting in extensive turnaround times that often prolong the design cycle. Therefore, it is critical to explore means of reducing the turnaround time while producing reliable results. Compute clusters have recently become a cost-effective means of performing high-performance computing (HPC). Distributed memory parallel solvers designed to take advantage of compute clusters have become increasingly popular. In this paper, we examine the use of HPC for various tire simulations and demonstrate how it can significantly reduce simulation turnaround time. Abaqus/Standard is used for routine tire simulations like footprint and steady state rolling. Abaqus/Explicit is used for transient rolling and hydroplaning simulations. The run times and scaling data corresponding to models of various sizes and complexity are presented.


Author(s):  
Tochukwu Moses ◽  
David Heesom ◽  
David Oloke ◽  
Martin Crouch

The UK Construction Industry through its Government Construction Strategy has recently been mandated to implement Level 2 Building Information Modelling (BIM) on public sector projects. This move, along with other initiatives is key to driving a requirement for 25% cost reduction (establishing the most cost-effective means) on. Other key deliverables within the strategy include reduction in overall project time, early contractor involvement, improved sustainability and enhanced product quality. Collaboration and integrated project delivery are central to the Level 2 implementation strategy, yet the key protocols or standards relative to cost within BIM processes are not well defined. As offsite construction becomes more prolific within the UK construction sector, this construction approach coupled with BIM, particularly the 5D automated quantification process, and early contractor involvement provides significant opportunities for the sector to meet government targets. Early contractor involvement is supported by both the industry and successive Governments as a credible means to avoid and manage project risks, encourage innovation and value add, make cost and project time predictable, and improve outcomes. The contractor is seen as an expert in construction, and it could be counterintuitive to exclude such valuable expertise from the pre-construction phase, especially with the BIM intent of 'build it twice', once virtually and once physically. In particular when offsite construction is used, the contractor's construction expertise should be leveraged for the virtual build in BIM-designed projects to ensure a fully streamlined process. Building in a layer of automated costing through 5D BIM will bring about a more robust method of quantification and can help to deliver the 25% reduction in overall cost of a project. Using a literature review and a case study, this paper will look into the benefits of Early Contractor Involvement (ECI) and the impact of 5D BIM on the offsite construction process.


2019 ◽  
Vol 26 (28) ◽  
pp. 5340-5362 ◽  
Author(s):  
Xin Chen ◽  
Giuseppe Gumina ◽  
Kristopher G. Virga

As a long-term degenerative disorder of the central nervous system that mostly affects older people, Parkinson’s disease is a growing health threat to our ever-aging population. Despite remarkable advances in our understanding of this disease, all therapeutics currently available only act to improve symptoms but cannot stop the disease progression. Therefore, it is essential that more effective drug discovery methods and approaches are developed, validated, and used for the discovery of disease-modifying treatments for Parkinson’s disease. Drug repurposing, also known as drug repositioning, or the process of finding new uses for existing or abandoned pharmaceuticals, has been recognized as a cost-effective and time-efficient way to develop new drugs, being equally promising as de novo drug discovery in the field of neurodegeneration and, more specifically for Parkinson’s disease. The availability of several established libraries of clinical drugs and fast evolvement in disease biology, genomics and bioinformatics has stimulated the momentums of both in silico and activity-based drug repurposing. With the successful clinical introduction of several repurposed drugs for Parkinson’s disease, drug repurposing has now become a robust alternative approach to the discovery and development of novel drugs for this disease. In this review, recent advances in drug repurposing for Parkinson’s disease will be discussed.

