A novel high-accuracy genome assembly method utilizing a high-throughput workflow

2020 ◽  
Author(s):  
Qingdong Zeng ◽  
Wenjin Cao ◽  
Liping Xing ◽  
Guowei Qin ◽  
Jianhui Wu ◽  
...  

Abstract
Across domains of biological research using genome sequence data, high-quality reference genome sequences are essential for characterizing genetic variation and understanding the genetic basis of phenotypes. However, the construction of genome assemblies for many species is often hampered by the complexity of genome organization, especially repetitive and complex sequences, which leads to mis-assemblies and missing regions. Here, we describe a high-throughput, gold-standard genome assembly workflow that uses a large-scale bacterial artificial chromosome (BAC) library with a refined two-step pooling strategy and the Lamp assembler algorithm. This strategy minimizes the laborious processes of physical map construction and clone-by-clone sequencing, enabling inexpensive sequencing of several thousand BAC clones. By applying this strategy to a minimum tiling path BAC clone library for the short arm of chromosome 2D (2DS) of bread wheat, a species with a highly complex and repetitive genome, we correctly assembled 98% of BAC sequences, covering 92.7% of the 2DS chromosome. We also identified 48 large mis-assemblies in the reference wheat genome assembly (IWGSC RefSeq v1.0), corrected them, and filled 92.2% of the gaps in RefSeq v1.0. Our 2DS assembly represents a new benchmark for assembling complex genomes with both high accuracy and efficiency.
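The abstract does not spell out the design of its two-step pooling strategy. As a generic illustration only (not the authors' protocol), the sketch below shows the classic row/column pooling idea that such schemes build on: clones on a hypothetical 96-well plate are pooled by row and by column, and a marker found in exactly one row pool and one column pool points to a single clone. The plate layout and clone names are hypothetical.

```python
from itertools import product

ROWS = "ABCDEFGH"      # hypothetical 96-well plate rows
COLS = range(1, 13)    # hypothetical plate columns 1-12

def positive_pools(plate, carriers):
    """Row and column pools that would test positive for a marker carried by `carriers`."""
    rows = {r for (r, c), clone in plate.items() if clone in carriers}
    cols = {c for (r, c), clone in plate.items() if clone in carriers}
    return rows, cols

def deconvolve(plate, rows, cols):
    """Candidate clones at every intersection of a positive row and a positive column pool."""
    return sorted(plate[(r, c)] for r, c in product(rows, cols) if (r, c) in plate)

plate = {(r, c): f"BAC_{r}{c:02d}" for r in ROWS for c in COLS}

# One carrier clone: its row and column pools identify it unambiguously.
print(deconvolve(plate, *positive_pools(plate, {"BAC_C07"})))
# Two carriers: four candidates remain, so a further screening step is needed.
print(deconvolve(plate, *positive_pools(plate, {"BAC_B03", "BAC_F10"})))
```

The second call returns `BAC_B03`, `BAC_B10`, `BAC_F03` and `BAC_F10`. This ambiguity when several clones share a marker is one reason a second round of pooling or screening can be useful.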

2019 ◽  
Vol 25 (31) ◽  
pp. 3350-3357 ◽  
Author(s):  
Pooja Tripathi ◽  
Jyotsna Singh ◽  
Jonathan A. Lal ◽  
Vijay Tripathi

Background: With the advent of high-throughput next-generation sequencing (NGS), drug discovery research has increasingly been directed towards the oncology and infectious disease therapeutic areas, with extensive use in biopharmaceutical development and vaccine production. Method: In this review, we outline the basic background of NGS technologies and their potential applications in drug design. We also provide a brief introduction to various next-generation sequencing techniques. Discussion: High-throughput methods perform large-scale unbiased sequencing (LUS), which comprises massively parallel sequencing (MPS) or NGS technologies. These related terms describe a DNA sequencing technology that has revolutionized genomic research. Using NGS, an entire human genome can be sequenced within a single day. Conclusion: Analysis of NGS data reveals important clues in the quest for treatments for various life-threatening diseases and for other scientific problems related to human welfare.


Genetics ◽  
2001 ◽  
Vol 157 (4) ◽  
pp. 1749-1757 ◽  
Author(s):  
Zhukuan Cheng ◽  
Gernot G Presting ◽  
C Robin Buell ◽  
Rod A Wing ◽  
Jiming Jiang

Abstract
Large-scale physical mapping has been a major challenge for plant geneticists because techniques that are widely affordable and applicable to different species have been lacking. Here we present a physical map of rice chromosome 10 developed by fluorescence in situ hybridization (FISH) mapping of bacterial artificial chromosome (BAC) clones on meiotic pachytene chromosomes. This physical map is fully integrated with a genetic linkage map of rice chromosome 10 because each BAC clone is anchored by a genetically mapped restriction fragment length polymorphism (RFLP) marker. Pachytene chromosome-based FISH mapping shows superior resolving power compared with somatic metaphase chromosome-based methods: the telomere-centromere orientation of DNA clones separated by 40 kb can be resolved on early pachytene chromosomes. Genetic recombination is generally evenly distributed along rice chromosome 10. However, the highly heterochromatic short arm shows a lower recombination frequency than the largely euchromatic long arm. Recombination is suppressed in the centromeric region, but the affected region is far smaller than those reported in wheat and barley. Our FISH mapping effort also revealed the precise genetic position of the centromere on chromosome 10.
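Comparing recombination between regions amounts to relating the genetic positions (cM) of the anchored RFLP markers to their FISH-based physical positions. A minimal sketch of that comparison follows. It assumes each marker has a genetic position in cM and a physical position expressed as percent of chromosome length from one telomere; the marker names and values are hypothetical and are not data from the study.

```python
def local_recombination_rates(markers):
    """Return (left, right, cM per 1% of chromosome length) for adjacent anchored markers.

    markers: iterable of (name, genetic_cM, physical_pct), where physical_pct is the
    marker's FISH position as percent of chromosome length (an assumed convention).
    """
    ordered = sorted(markers, key=lambda m: m[2])  # order along the chromosome
    rates = []
    for (left, cm_l, pos_l), (right, cm_r, pos_r) in zip(ordered, ordered[1:]):
        span = pos_r - pos_l
        if span == 0:
            continue  # FISH signals not resolved; no rate can be computed
        rates.append((left, right, abs(cm_r - cm_l) / span))
    return rates

# Hypothetical anchored markers: (name, cM, % of chromosome length)
markers = [("M1", 0.0, 5), ("M2", 5.0, 25), ("M3", 30.0, 45), ("M4", 60.0, 75)]
for left, right, rate in local_recombination_rates(markers):
    print(f"{left}-{right}: {rate:.2f} cM per 1% of length")
```

Intervals with a markedly lower cM-per-length value than their neighbours, such as M1-M2 here, are the kind of signal that would indicate reduced recombination, for example on a heterochromatic arm or around a centromere.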


2019 ◽  
Author(s):  
Ron Hübler ◽  
Felix M. Key ◽  
Christina Warinner ◽  
Kirsten I. Bos ◽  
Johannes Krause ◽  
...  

Abstract
High-throughput DNA sequencing enables large-scale metagenomic analyses of complex biological systems. Such analyses are not restricted to present-day environmental or clinical samples but can also be fruitfully applied to molecular data from archaeological remains (ancient DNA), and a focus on ancient bacteria can provide valuable information on the long-term evolutionary relationship between hosts and their pathogens. Here we present HOPS (Heuristic Operations for Pathogen Screening), an automated bacterial screening pipeline for ancient DNA sequence data that provides straightforward and reproducible information on species identification and authenticity. HOPS provides a versatile and fast pipeline for high-throughput screening of bacterial DNA from archaeological material to identify candidates for subsequent genomic-level analyses.
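A widely used authenticity signal for ancient DNA is an excess of C-to-T substitutions at the 5' ends of reads, caused by post-mortem cytosine deamination. The abstract does not describe how HOPS scores authenticity, so the following is only a generic sketch of that damage signal and not HOPS code. It assumes gap-free, forward-strand read/reference pairs of equal length; the example reads are hypothetical.

```python
def c_to_t_profile(alignments, positions=5):
    """Fraction of reference C's read as T at each of the first `positions` 5' bases.

    alignments: iterable of (read, reference) strings, assumed gap-free and forward-strand.
    """
    ct = [0] * positions       # C->T mismatches per read position
    c_total = [0] * positions  # reference C's seen per read position
    for read, ref in alignments:
        for i in range(min(positions, len(read), len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    ct[i] += 1
    return [ct[i] / c_total[i] if c_total[i] else 0.0 for i in range(positions)]

# Hypothetical aligned reads (read, reference)
alignments = [("TAGGA", "CAGGA"), ("CTGCA", "CTGCA"), ("TTCCA", "CTCCA")]
print(c_to_t_profile(alignments))
```

In this toy set, two of the three reference C's at the first read position are read as T, while interior C's are not. Authentic ancient reads tend to show this kind of decay in C-to-T frequency away from the read end.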


2020 ◽  
Vol 36 (12) ◽  
pp. 3841-3848
Author(s):  
Michael Gruenstaeudl

Abstract
Motivation: The submission of annotated sequence data to public sequence databases constitutes a central pillar of biological research. The surge of novel DNA sequences awaiting database submission due to the application of next-generation sequencing has increased the need for software tools that facilitate bulk submissions. This need has not yet been met by a concurrent development of tools that automate the preparatory work preceding such submissions.
Results: The author introduces annonex2embl, a Python package that automates the preparation of complete sequence flatfiles for large-scale sequence submissions to the European Nucleotide Archive. The tool converts DNA sequence alignments that are co-supplied with sequence annotations and metadata into submission-ready flatfiles. Among other features, the software automatically accounts for length differences among the input sequences while maintaining correct annotations, automatically interlaces metadata with each record, and has a design suitable for easy integration into bioinformatic workflows. As proof of its utility, annonex2embl is employed in preparing a dataset of more than 1,500 fungal DNA sequences for database submission.
Availability and implementation: annonex2embl is freely available via the Python Package Index at http://pypi.python.org/pypi/annonex2embl.
Supplementary information: Supplementary data are available at Bioinformatics online.
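Keeping annotations correct when sequences of different lengths are extracted from an alignment comes down to coordinate bookkeeping. The following is a minimal sketch of that bookkeeping. It assumes features are given as 1-based inclusive alignment columns and gaps are written as `-`. The function names are hypothetical and are not part of annonex2embl's API.

```python
def column_to_position(aligned_seq, gap="-"):
    """For each alignment column, the 1-based position in the ungapped sequence (None at gaps)."""
    mapping, pos = [], 0
    for ch in aligned_seq:
        if ch == gap:
            mapping.append(None)
        else:
            pos += 1
            mapping.append(pos)
    return mapping

def remap_feature(aligned_seq, start, end):
    """Convert a feature spanning alignment columns start..end (1-based, inclusive)
    to ungapped coordinates of this sequence; None if the sequence is all gaps there."""
    positions = [p for p in column_to_position(aligned_seq)[start - 1:end] if p is not None]
    return (positions[0], positions[-1]) if positions else None

# Hypothetical alignment row; a feature annotated on columns 3-7 of the alignment.
row = "AC--GTA-C"
print(remap_feature(row, 3, 7))   # (3, 5)
print(row.replace("-", ""))       # ACGTAC
```

Because each sequence is remapped independently, the same alignment-level feature can end at different coordinates in records of different lengths. This is the behaviour the abstract describes as maintaining correct annotations.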


2011 ◽  
Vol 41 (No. 4) ◽  
pp. 153-159
Author(s):  
R.K. Varshney ◽  
U. Hähnel ◽  
T. Thiel ◽  
N. Stein ◽  
L. Altschmied ◽  
...  

Due to the availability of sequence data from large-scale EST (expressed sequence tag) projects, it has become feasible to develop microsatellite or simple sequence repeat (SSR) markers from genes. A set of 111 090 barley ESTs (corresponding to 55.9 Mb of sequence) was used to identify microsatellites with a Perl 5 script called MISA. In total, 9 564 microsatellites were identified in 8 766 ESTs (SSR-ESTs). Cluster analysis revealed 2 823 non-redundant SSR-ESTs in this set. From these, 754 primer pairs were designed and analysed in a set of seven genotypes, including the parents of three mapping populations. Finally, 185 microsatellite (EST-SSR) loci were placed onto the barley genetic map. These markers show a uniform distribution across all linkage groups, ranging from 21 markers (on 7H) to 35 markers (on 3H). The polymorphism information content (PIC) of the developed markers ranged from 0.24 to 0.78, with an average of 0.48. To assign these markers to BAC clones, a PCR-based strategy was established to screen the “Morex” BAC library. Using this strategy, BAC addresses were obtained for a total of 127 mapped EST-SSRs, and in some cases at least two markers were located on a single BAC. This observation is indicative of an uneven distribution of genes and may lead to the identification of gene-rich regions in the barley genome.
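MISA-style microsatellite detection reports a perfect repeat when a motif of a given length occurs in at least a minimum number of tandem copies. The following Python sketch illustrates that idea only. It is not MISA, which is a Perl script, and the thresholds in `MIN_REPEATS` are assumed values of the kind commonly used, not settings reported in the abstract. Compound SSRs and interruptions are not handled.

```python
import re

# Minimum number of tandem copies per motif length (assumed, MISA-like values).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_composite(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT' or 'AA')."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))

def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_composite(motif):
                continue  # already reported under its shorter repeat unit
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical EST fragment with an (AG)8 repeat and a (C)11 run.
est = "TT" + "AG" * 8 + "TTT" + "C" * 11 + "G"
print(find_ssrs(est))
```

Each motif length gets its own regular expression with a back-reference, so the minimum copy number can differ between mono-, di- and longer repeats.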


2021 ◽  
Author(s):  
Nicolas J. Rawlence ◽  
Alexander T. Salis ◽  
Hamish G. Spencer ◽  
Jonathan M. Waters ◽  
Lachie Scarsbrook ◽  
...  

Abstract
Aim: Understanding how wild populations respond to climatic shifts is a fundamental goal of biological research in a fast-changing world. The Southern Ocean represents a fascinating system for assessing large-scale climate-driven biological change, as it contains extremely isolated island groups within a predominantly westerly, circumpolar wind and current system. The blue-eyed shags (Leucocarbo spp.) represent a paradoxical Southern Ocean seabird radiation: a circumpolar distribution implies strong dispersal capacity, yet their speciose nature suggests local adaptation and isolation. Here we use genetic tools in an attempt to resolve this paradox.
Location: Southern Ocean.
Taxa: 17 species and subspecies of blue-eyed shags (Leucocarbo spp.) across the geographical distribution of the genus.
Methods: We use mitochondrial and nuclear sequence data to conduct the first global genetic analysis of this group, using a temporal phylogenetic framework to test for rapid speciation.
Results: Our analysis reveals remarkably shallow evolutionary histories among island-endemic lineages, consistent with a recent high-latitude circumpolar radiation. This rapid sub-Antarctic expansion contrasts with significantly deeper lineages detected in more temperate regions such as South America and New Zealand, which may have acted as glacial refugia. The dynamic history of high-latitude expansions is further supported by ancestral demographic and biogeographic reconstructions.
Main conclusions: The circumpolar distribution of blue-eyed shags and their highly dynamic evolutionary history potentially make Leucocarbo a strong sentinel of past and ongoing Southern Ocean ecosystem change, given their sensitivity to climatic impacts.


1999 ◽  
Vol 9 (8) ◽  
pp. 763-774 ◽  
Author(s):  
Yicheng Cao ◽  
Hyung Lyun Kang ◽  
Xuequn Xu ◽  
Mei Wang ◽  
So Hee Dho ◽  
...  

We have constructed a complete-coverage BAC contig map that spans a 12-Mb genomic segment in the human chromosome 16p13.1–p11.2 region. The map consists of 68 previously mapped STSs and 289 BAC clones, 51 of which, corresponding to a total of 7.721 Mb of genomic DNA, have been sequenced; it provides a high-resolution physical map of the region. Contigs were initially built mainly from the analysis of STS content and restriction fingerprint patterns of the clones. To close the gaps, probes derived from BAC clone ends were used to screen deeper BAC libraries. Clone end sequence data obtained from chromosome 16-specific BACs, as well as from public databases, were used to identify BACs that overlap fully sequenced BACs by sequence matching. This approach allowed precise alignment of clone overlaps in addition to restriction fingerprint comparison. A freehand contig drawing software tool was developed and used to manage the map data graphically and generate a real-scale physical map. The map presented here is ∼3.5× deep and provides a minimal tiling path that covers the region with an array of contiguous, overlapping BACs.
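Once clone overlaps have been resolved into coordinates along the region, for example from fingerprint comparison or end-sequence matching, choosing a minimal tiling path reduces to a greedy interval cover. The following sketch works under that assumption. The clone names and coordinates (in kb) are hypothetical, and this is not the authors' contig drawing tool.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy minimum set of overlapping clones spanning [region_start, region_end].

    clones: iterable of (name, start, end) with inclusive coordinates (e.g. kb).
    Raises ValueError if no clone overlaps the covered segment (a gap in the contig).
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sweep clones by start coordinate
    path, covered, i = [], region_start - 1, 0
    while covered < region_end:
        # The next clone must start at the region start or overlap what is already covered.
        limit = covered if path else region_start
        best = None
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]  # keep the candidate reaching furthest
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap after position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical BAC contig (name, start kb, end kb)
clones = [("A", 1, 150), ("B", 100, 260), ("C", 120, 200), ("D", 240, 400), ("E", 390, 500)]
print(minimal_tiling_path(clones, 1, 500))  # ['A', 'B', 'D', 'E']
```

Clone C is skipped because B, which also overlaps A, reaches further. Always extending as far as possible is what keeps the number of selected clones minimal.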


2019 ◽  
Author(s):  
Michael Gruenstaeudl

Abstract
Motivation: The submission of annotated sequence data to public sequence databases constitutes a central pillar of biological research. The surge of novel DNA sequences awaiting database submission due to the application of next-generation sequencing has increased the need for software tools that facilitate bulk submissions. This need has not yet been met by a concurrent development of tools that automate the preparatory work preceding such submissions.
Results: I introduce annonex2embl, a Python package that automates the preparation of complete sequence flatfiles for large-scale sequence submissions to the European Nucleotide Archive. The tool converts DNA sequence alignments that are co-supplied with sequence annotations and metadata into submission-ready flatfiles. Among other features, the software automatically accounts for length differences among the input sequences while maintaining correct annotations, automatically interlaces metadata with each record, and has a design suitable for easy integration into bioinformatic workflows. As proof of its utility, annonex2embl is employed in preparing a dataset of more than 1,500 fungal DNA sequences for database submission.


2012 ◽  
Vol 13 (1) ◽  
pp. 40-53 ◽  
Author(s):  
Sarah K. Highlander

Abstract
Analysis of microbial communities using high-throughput sequencing methods began in the mid-2000s, permitting the production of thousands to tens of thousands of sequence reads per sample and megabases of data per sequencing run. This then-unprecedented depth of sequencing allowed, for the first time, the discovery of the ‘rare biosphere’ in environmental samples. The technology was quickly applied to studies of several human subjects. Perhaps these early studies served as a reminder that, although the microbes that inhabit mammals are known to outnumber host cells by an order of magnitude or more, most of them are unknown members of our second genome, or microbiome (a term coined by Joshua Lederberg), because of our inability to culture them. High-throughput methods for microbial 16S ribosomal RNA gene and whole genome shotgun (WGS) sequencing have now begun to reveal the composition and identity of archaeal, bacterial and viral communities at many sites in and on the human body. Surveys of the microbiota of food production animals have been published in the past few years, and future studies should benefit from protocols and tools developed in large-scale human microbiome studies. Nevertheless, production animal-related resources, such as improved host genome assemblies and larger, more diverse collections of host-specific microbial reference genome sequences, will be needed to permit meaningful and robust analysis of 16S rDNA and WGS sequence data.
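Rarefaction shows why sequencing depth matters for detecting the rare biosphere. It gives the expected number of taxa observed when n reads are drawn from a sample. The following sketch uses Hurlbert's rarefaction formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total read count and N_i the reads of taxon i. The review does not prescribe this method, and the read counts are hypothetical.

```python
from math import comb

def expected_richness(counts, depth):
    """Hurlbert rarefaction: expected number of taxa seen in `depth` reads drawn
    without replacement from a sample with per-taxon read `counts`."""
    total = sum(counts)
    if not 0 < depth <= total:
        raise ValueError("depth must be between 1 and the total number of reads")
    # comb(total - n, depth) is 0 when depth > total - n, i.e. the taxon is certain to appear.
    return sum(1 - comb(total - n, depth) / comb(total, depth) for n in counts if n > 0)

# Hypothetical 16S read counts: one dominant taxon and several rare ones (10,000 reads).
counts = [9000, 800, 150, 30, 10, 5, 3, 1, 1]
for depth in (100, 1000, 10000):
    print(depth, round(expected_richness(counts, depth), 2))
```

At shallow depth only the abundant taxa are expected to appear. Taxa represented by a single read are seen reliably only as depth approaches the full sample.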


1999 ◽  
Vol 9 (2) ◽  
pp. 150-157 ◽  
Author(s):  
Douglas Vollrath ◽  
Virna L. Jaramillo-Babb

Human chromosomal region 1q24 encodes two cloned disease genes and lies within large genetic inclusion intervals for several disease genes that have yet to be identified. We have constructed a single bacterial artificial chromosome (BAC) clone contig that spans over 2 Mb of 1q24 and consists of 78 clones connected by 100 STSs. The average density of mapped STSs is one of the highest described for a multimegabase region of the human genome. The contig was efficiently constructed by generating STSs from clone ends, followed by library walking. Distance information was added by determining the insert sizes of all clones, and expressed sequence tags (ESTs) and genes were incorporated to create a partial transcript map of the region, providing candidate genes for local disease loci. The gene order and content of the region provide insight into ancient duplication events that have occurred on proximal 1q. The stage is now set for further elucidation of this interesting region through large-scale sequencing.[The sequence data described in this paper have been submitted to GenBank under accession nos. G42259–G42312 and G42330–G42335.]
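In STS-content mapping, clones that share at least one STS are linked. Connected groups of clones form contigs, and gaps remain wherever no shared STS bridges two groups; new STSs from clone ends and library walking aim to close these gaps. The following is a minimal union-find sketch. It assumes the STS content of each clone is already known, and the clone and STS names are hypothetical. It only groups clones into contigs and does not order them.

```python
def sts_contigs(sts_content):
    """Group clones into contigs: clones sharing at least one STS are linked (union-find).

    sts_content: dict mapping clone name -> set of STS names detected in that clone.
    Returns a sorted list of contigs, each a sorted list of clone names.
    """
    parent = {clone: clone for clone in sts_content}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_clone_with = {}  # STS -> first clone seen containing it
    for clone, markers in sts_content.items():
        for sts in markers:
            if sts in first_clone_with:
                parent[find(clone)] = find(first_clone_with[sts])  # merge the two groups
            else:
                first_clone_with[sts] = clone

    groups = {}
    for clone in sts_content:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical STS content for six BAC clones
content = {"b1": {"s1", "s2"}, "b2": {"s2", "s3"}, "b3": {"s3"},
           "b4": {"s7"}, "b5": {"s7", "s8"}, "b6": set()}
print(sts_contigs(content))  # [['b1', 'b2', 'b3'], ['b4', 'b5'], ['b6']]
```

Clone b6 carries no STS and stays a singleton. In practice, it is the kind of clone that new STSs generated from clone ends would be used to connect.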

