Faculty Opinions recommendation of MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome.

Author(s):  
Charles Baer
2017 ◽  
Vol 28 (2) ◽  
pp. 266-274 ◽  
Author(s):  
John R. Tyson ◽  
Nigel J. O'Neil ◽  
Miten Jain ◽  
Hugh E. Olsen ◽  
Philip Hieter ◽  
...  

Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 500
Author(s):  
Juan A. Subirana ◽  
Xavier Messeguer

Repetitive genome regions have been difficult to sequence, mainly because of the comparatively small size of the fragments used in assembly. Satellites or tandem repeats are very abundant in nematodes and offer an excellent playground to evaluate different assembly methods. Here, we compare the structure of satellites found in three different assemblies of the Caenorhabditis elegans genome: the original sequence obtained by Sanger sequencing, an assembly based on PacBio technology, and an assembly using Nanopore sequencing reads. In general, satellites were found in equivalent genomic regions, but the new long-read methods (PacBio and Nanopore) tended to result in longer assembled satellites. Important differences exist between the assemblies resulting from the two long-read technologies, such as the sizes of long satellites. Our results also suggest that the lengths of some annotated genes with internal repeats which were assembled using Sanger sequencing are likely to be incorrect.
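The assembly problem described above can be shown with a toy model. A satellite is a run of back-to-back copies of a repeat unit, and a read can only pin down the length of that run if it covers the whole array plus some unique flanking sequence on both sides. The sketch below assumes exact repeats and ignores real-world mismatches and indels. The function names `longest_tandem_run` and `read_resolves_array` are hypothetical illustrations, not the method used in the study.

```python
def longest_tandem_run(seq: str, unit: str) -> tuple[int, int]:
    """Return (start, end) of the longest run of exact, back-to-back copies of `unit`.

    Toy model only: real satellites carry mismatches and indels, which this ignores.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = (0, 0)
    i = 0
    while i <= len(seq) - k:
        if seq[i:i + k] == unit:
            j = i
            # Extend the run one full repeat unit at a time.
            while seq[j:j + k] == unit:
                j += k
            if j - i > best[1] - best[0]:
                best = (i, j)
            i = j  # continue scanning after this run
        else:
            i += 1
    return best


def read_resolves_array(read_start: int, read_end: int,
                        array_start: int, array_end: int, flank: int = 1) -> bool:
    """True if a read covers the whole array plus `flank` unique bases on each side.

    A read that starts or ends inside the array cannot tell how many copies it holds.
    """
    return read_start <= array_start - flank and read_end >= array_end + flank


genome = "ACGT" + "GATTACA" * 5 + "TTGC"
start, end = longest_tandem_run(genome, "GATTACA")
print(start, end)                                       # 4 39
print(read_resolves_array(0, len(genome), start, end))  # True: long read spans both flanks
print(read_resolves_array(10, 30, start, end))          # False: short fragment sits inside the array
```

In this model, any fragment shorter than the array is uninformative about its length, which is why long-read assemblies can report longer satellites than a fragment-based assembly.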


2021 ◽  
pp. gr.275325.121
Author(s):  
Rodrigo P. Baptista ◽  
Yiran Li ◽  
Adam Sateriale ◽  
Karen L. Brooks ◽  
Alan Tracey ◽  
...  

Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community has only had access to a good, but incomplete, Cryptosporidium parvum IOWA reference genome sequence. Incomplete reference sequences hamper annotation, experimental design and interpretation. We have generated a new C. parvum IOWA genome assembly supported by PacBio and Oxford Nanopore long-read technologies, and a new comparative and consistent genome annotation for three closely related species: C. parvum, Cryptosporidium hominis and Cryptosporidium tyzzeri. We made 1,926 C. parvum annotation updates based on experimental evidence. They include new transporters, ncRNAs, introns and altered gene structures. The new assembly and annotation revealed a complete Dnmt2 methylase ortholog. Comparative annotation between C. parvum, C. hominis and C. tyzzeri revealed that most "missing" orthologs are found, suggesting that the biological differences between the species must result from gene copy number variation, differences in gene regulation and single nucleotide variants (SNVs). Using the new assembly and annotation as reference, we identified 190 genes evolving under positive selection, including many not detected previously. The new C. parvum IOWA reference genome assembly is larger, gap-free and lacks ambiguous bases. This chromosomal assembly recovers all 16 chromosome ends, 13 of which are contiguously assembled. The three remaining chromosome ends are provisionally placed. These ends represent duplications of entire chromosome ends, including subtelomeric regions, revealing a new level of genome plasticity that will both inform and impact future research.


Plants ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 270 ◽  
Author(s):  
Yun Gyeong Lee ◽  
Sang Chul Choi ◽  
Yuna Kang ◽  
Kyeong Min Kim ◽  
Chon-Sik Kang ◽  
...  

Whole genome sequencing (WGS) has become a crucial tool for understanding genome structure and genetic variation. MinION sequencing from Oxford Nanopore Technologies (ONT) is an excellent approach for performing WGS, and it has several advantages over other Next-Generation Sequencing (NGS) platforms: it is relatively inexpensive, portable, has simple library preparation, can be monitored in real time, and has no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence is available in the Phytozome database, so sorghum can serve as a good reference. However, plant species have larger and more complex genomes than animals or microorganisms, which makes complete genome sequencing difficult for plants. MinION sequencing, which produces long reads, can be an excellent tool for overcoming the weak assembly of short reads generated by NGS, by minimizing gaps and covering the repetitive sequences found in plant genomes. Here, we sequenced the genome of S. bicolor cv. BTx623 using the MinION platform and obtained 895,678 reads totaling 17.9 gigabases (Gb) of long-read sequence data (ca. 25× coverage of the reference). We performed de novo assembly with two different tools and mapped the assembled contigs against the sorghum reference genome: Canu generated a total of 6124 contigs (covering 45.9%), and Minimap and Miniasm followed by Racon generated a total of 2661 contigs (covering 50%). Our results provide an optimized long-read sequencing analysis workflow for plant species using the MinION platform, and a guide to determining the total sequencing scale needed for optimal coverage across various genome sizes.
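The "ca. 25× coverage" figure follows from simple arithmetic: nominal coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes "Gb" means gigabases and ignores read quality and mapping rate; the helper names `fold_coverage` and `bases_needed` are hypothetical. The same relation, run in reverse, gives the sequencing scale needed to reach a target coverage for other genome sizes.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Nominal coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def bases_needed(genome_size: float, target_coverage: float) -> float:
    """Total bases to sequence to reach a target nominal coverage."""
    return genome_size * target_coverage


SORGHUM_GENOME_BP = 730e6   # "about 730 Mb"
TOTAL_YIELD_BP = 17.9e9     # 17.9 Gb, read as gigabases (assumption)
N_READS = 895_678

coverage = fold_coverage(TOTAL_YIELD_BP, SORGHUM_GENOME_BP)
mean_read_len = TOTAL_YIELD_BP / N_READS

print(f"coverage: {coverage:.1f}x")                  # coverage: 24.5x
print(f"mean read length: {mean_read_len:,.0f} bp")  # mean read length: 19,985 bp

# Hypothetical planning example: yield needed for 25x on genomes of other sizes.
for size_mb in (500, 730, 2_500):
    print(size_mb, "Mb ->", bases_needed(size_mb * 1e6, 25) / 1e9, "Gb")
```

The computed 24.5× rounds to the reported 25×, and the implied mean read length of roughly 20 kb is derived from the reported totals rather than stated in the abstract.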


2020 ◽  
Vol 10 (8) ◽  
pp. 2801-2809 ◽  
Author(s):  
Tingting Zhao ◽  
Zhongqu Duan ◽  
Georgi Z. Genchev ◽  
Hui Lu

Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps, which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can now be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb of novel sequence to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had a high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.
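The two percentages in this abstract use different denominators: 16.9% is taken over all 783 gaps, while 32.6% is taken over the 132 closed gaps. A quick check, using only the counts stated above (the helper name `percent` is hypothetical), confirms both figures and gives the average number of gap-closing sequences per closed gap:

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole


TOTAL_GAPS = 783
CLOSED_GAPS = 132
POLYMORPHIC_CLOSED = 43
GAP_CLOSING_SEQS = 1_125

print(f"{percent(CLOSED_GAPS, TOTAL_GAPS):.1f}% of all gaps closed")                 # 16.9% of all gaps closed
print(f"{percent(POLYMORPHIC_CLOSED, CLOSED_GAPS):.1f}% of closed gaps polymorphic")  # 32.6% of closed gaps polymorphic
print(f"{GAP_CLOSING_SEQS / CLOSED_GAPS:.1f} gap-closing sequences per closed gap")   # 8.5 gap-closing sequences per closed gap
```

The per-gap average is a derived figure; the abstract does not say how the 1,125 sequences are distributed across gaps.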


2018 ◽  
Vol 8 (10) ◽  
pp. 3143-3154 ◽  
Author(s):  
Edwin A. Solares ◽  
Mahul Chakraborty ◽  
Danny E. Miller ◽  
Shannon Kalsow ◽  
Kate Hall ◽  
...  

2020 ◽  
Author(s):  
Yuxuan Yuan ◽  
Philipp E. Bayer ◽  
Robyn Anderson ◽  
HueyTyng Lee ◽  
Chon-Kit Kenneth Chan ◽  
...  

Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using sequence reads that can span repetitive regions. However, the overlap-based assembly methods routinely used for these data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long-read genome assembly. This approach breaks up a closely related reference genome into bins, aligns k-mers unique to each bin with PacBio reads, and then assembles each bin in parallel, followed by a final bin-stitching step. During benchmarking, we assembled the wheat Chinese Spring (CS) genome from publicly available PacBio reads in 168 wall-clock hours on a 250-CPU system. The maximum RAM used was 300 GB and the total computing time was 42,000 CPU hours. The approach opens up the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at https://github.com/AppliedBioinformatics/RefKA
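The binning and read-assignment steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the RefKA code: bins are non-overlapping, only the forward strand is considered, and exact k-mer set intersection stands in for aligning k-mers to PacBio reads. All function names are hypothetical, and the per-bin assembly and stitching steps are left out.

```python
from collections import defaultdict


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of `seq` (forward strand only in this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_into_bins(reference: str, bin_size: int) -> list[str]:
    """Cut the reference into consecutive, non-overlapping bins."""
    return [reference[i:i + bin_size] for i in range(0, len(reference), bin_size)]


def unique_kmers_per_bin(bins: list[str], k: int) -> list[set[str]]:
    """Keep only k-mers that occur in exactly one bin, so each acts as a bin-specific marker."""
    per_bin = [kmer_set(b, k) for b in bins]
    bin_count: dict[str, int] = defaultdict(int)
    for kmers in per_bin:
        for km in kmers:
            bin_count[km] += 1
    return [{km for km in kmers if bin_count[km] == 1} for kmers in per_bin]


def assign_reads_to_bins(reads: dict[str, str], markers: list[set[str]],
                         k: int, min_hits: int = 1) -> dict[int, list[str]]:
    """Assign each read to every bin whose marker k-mers it shares (exact matching, no alignment)."""
    assignment: dict[int, list[str]] = defaultdict(list)
    for name, seq in reads.items():
        read_kmers = kmer_set(seq, k)
        for bin_idx, bin_markers in enumerate(markers):
            if len(read_kmers & bin_markers) >= min_hits:
                assignment[bin_idx].append(name)
    return dict(assignment)


bins = split_into_bins("AAAACCCCGGGGTTTT", bin_size=8)
markers = unique_kmers_per_bin(bins, k=4)
reads = {"r1": "AAACC", "r2": "GGTTT", "r3": "CCCCGGGG"}
print(assign_reads_to_bins(reads, markers, k=4))
# {0: ['r1', 'r3'], 1: ['r2', 'r3']}
```

A read such as `r3` that straddles a bin boundary lands in both bins, which is the kind of overlap a stitching step could use to join neighbouring bin assemblies. Because each bin's reads are independent, the bins can be assembled in parallel; the reported benchmark is consistent with near-full use of the machine, since 42,000 CPU hours spread over 250 CPUs is 168 hours.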


2016 ◽  
Author(s):  
Eric Disdero ◽  
Jonathan Filée

Motivation: Population genomic analysis of transposable elements has greatly benefited from recent advances in sequencing technologies. However, the propensity of transposable elements to nest in highly repeated regions of genomes limits the efficiency of bioinformatic tools when short-read sequencing technology is used. Results: LoRTE is the first tool able to use PacBio long-read sequences to identify transposon deletions and insertions between a reference genome and genomes of different strains or populations. Tested against Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool for studying the dynamics and evolutionary impact of transposable elements using low-coverage, long-read sequences. Availability and Implementation: LoRTE is available at http://www.egce.cnrs-gif.fr/?p=6422. It is written in Python 2.7 and only requires the NCBI BLAST+ package. LoRTE can be used on a standard computer with limited RAM resources and reasonable running time, even with large …


Author(s):  
Marion Claudia Müller ◽  
Lukas Kunz ◽  
Johannes Peter Graf ◽  
Seraina Schudel ◽  
Beat Keller

The emergence of new fungal pathogens through hybridization represents a serious challenge for agriculture. Hybridization between the wheat mildew (Blumeria graminis f.sp. tritici) and rye mildew (B.g. f.sp. secalis) pathogens has led to the emergence of a new mildew form (B.g. f.sp. triticale) growing on triticale, a man-made amphiploid crop derived from crossing rye and wheat, which was originally resistant to the powdery mildew disease. The identification of the genetic basis of host adaptation in triticale mildew has been hampered by the lack of a reference genome. Here we report the 141.4 Mb reference assembly of triticale mildew isolate THUN-12, derived from long-read sequencing and genetic map-based scaffolding. All eleven triticale mildew chromosomes were assembled telomere-to-telomere and revealed that 19.7% of the hybrid genome was inherited from the rye mildew parental lineage. We identified lineage-specific regions in the hybrid, inherited from the rye or wheat mildew parental lineages, that harbour numerous bona fide candidate effectors. We propose that the combination of lineage-specific effectors in the hybrid genome is crucial for host adaptation, allowing the fungus to simultaneously circumvent the immune systems contributed by wheat and rye in the triticale crop. In line with this, we demonstrate the functional transfer from wheat to triticale mildew of the SvrPm3 effector, a virulence effector that specifically suppresses resistance conferred by the wheat Pm3 allelic series. This transfer is the likely underlying cause of the observed poor effectiveness of several Pm3 alleles against triticale mildew and exemplifies the negative implications of pathogen hybridization for resistance breeding.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 297 ◽  
Author(s):  
Jason R. Miller ◽  
Sergey Koren ◽  
Kari A. Dilley ◽  
Derek M. Harkins ◽  
Timothy B. Stockwell ◽  
...  

Background: The tick cell line ISE6, derived from Ixodes scapularis, is commonly used for amplification and detection of arboviruses in environmental or clinical samples. Methods: To assist with sequence-based assays, we sequenced the ISE6 genome with single-molecule, long-read technology. Results: The draft assembly appears near complete based on gene content analysis, though it appears to lack some instances of repeats in this highly repetitive genome. The assembly appears to have separated the haplotypes at many loci. DNA short read pairs, used for validation only, mapped to the cell line assembly at a higher rate than they mapped to the Ixodes scapularis reference genome sequence. Conclusions: The assembly could be useful for filtering host genome sequence from sequence data obtained from cells infected with pathogens.

