SeedEx: A Genome Sequencing Accelerator for Optimal Alignments in Subminimal Space

Author(s):  
Daichi Fujiki ◽  
Shunhao Wu ◽  
Nathan Ozog ◽  
Kush Goliya ◽  
David Blaauw ◽  
...  
Plants ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 270 ◽  
Author(s):  
Yun Gyeong Lee ◽  
Sang Chul Choi ◽  
Yuna Kang ◽  
Kyeong Min Kim ◽  
Chon-Sik Kang ◽  
...  

Whole-genome sequencing (WGS) has become a crucial tool for understanding genome structure and genetic variation. MinION sequencing from Oxford Nanopore Technologies (ONT) is an excellent approach for performing WGS and has advantages over other next-generation sequencing (NGS) platforms: it is relatively inexpensive and portable, has simple library preparation, can be monitored in real time, and has no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence is available in the Phytozome database. Therefore, sorghum can serve as a good reference. However, plant species have large, complex genomes compared with animals or microorganisms, which makes complete genome sequencing difficult. Because MinION sequencing produces long reads, it can overcome the weak assemblies obtained from short NGS reads by minimizing gaps and spanning the repetitive sequences found in plant genomes. Here, we sequenced the genome of S. bicolor cv. BTx623 on the MinION platform and obtained 895,678 reads and 17.9 gigabases (Gb) of long-read sequence data (ca. 25× coverage of the reference). Using two different de novo assembly approaches, we generated 6124 contigs (covering 45.9%) with Canu and 2661 contigs (covering 50%) with Minimap and Miniasm followed by Racon, and mapped the assembled contigs against the sorghum reference genome. Our results provide an optimal long-read sequencing analysis workflow for plant species using the MinION platform and a clue for determining the total sequencing scale needed for optimal coverage across various genome sizes.
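The coverage figure quoted in this abstract can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the 17.9 Gb figure counts sequenced bases (which is consistent with the stated ca. 25× coverage) and uses the approximate 730 Mb genome size given above. The function names are hypothetical and do not come from the paper.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_needed(target_coverage: float, genome_size: float) -> float:
    """Total bases to sequence in order to reach a target average depth."""
    return target_coverage * genome_size


# Figures taken from the abstract (the unit interpretation is an assumption).
TOTAL_BASES = 17.9e9   # 17.9 Gb of long-read data
GENOME_SIZE = 730e6    # sorghum genome, about 730 Mb

depth = mean_coverage(TOTAL_BASES, GENOME_SIZE)
print(f"Estimated mean coverage: {depth:.1f}x")
```

The same relation, inverted in `bases_needed`, gives a rough sequencing budget for a genome of a different size at a chosen depth.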


2019 ◽  
Vol 9 (10) ◽  
pp. 3213-3223 ◽  
Author(s):  
Giovanna Cáceres ◽  
María E. López ◽  
María I. Cádiz ◽  
Grazyella M. Yoshida ◽  
Ana Jedlicki ◽  
...  

Nile tilapia (Oreochromis niloticus) is one of the most cultivated and economically important species in world aquaculture. Intensive production promotes the use of monosex animals, due to an important dimorphism that favors male growth. Currently, the main mechanism to obtain all-male populations is the use of hormones in feed during the larval and fry phases. Identifying genomic regions associated with sex determination in Nile tilapia is a research topic of great interest. The objective of this study was to identify genomic variants associated with sex determination in three commercial populations of Nile tilapia. Whole-genome sequencing of 326 individuals was performed, and a total of 2.4 million high-quality bi-allelic single nucleotide polymorphisms (SNPs) were identified after quality control. A genome-wide association study (GWAS) was conducted to identify markers associated with the binary sex trait (males = 1; females = 0). A mixed logistic regression GWAS model was fitted, and a genome-wide significant signal comprising 36 SNPs, spanning a genomic region of 536 kb on chromosome 23, was identified. Ten of these 36 genetic variants intersect the anti-Müllerian hormone (Amh) gene. Other significant SNPs were located in the neighboring Amh gene region. This gene has been strongly associated with sex determination in several vertebrate species, playing an essential role in the differentiation of male and female reproductive tissue in early stages of development. This finding provides useful information to better understand the genetic mechanisms underlying sex determination in Nile tilapia.


2020 ◽  
Author(s):  
Alexander Smetanin ◽  
Nikita Moshkov ◽  
Tatiana V. Tatarinova

Abstract Summary We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimation of many parameters, it can process thousands of genomes within a day. Computational efficiency, straightforward presentation of results, and ease of installation make PyLAE a useful tool to study admixed populations. Availability and implementation The source code and installation manual are available at https://github.com/smetam/pylae.


2021 ◽  
Author(s):  
Severin Einspanier ◽  
Tamara Susanto ◽  
Nicole Metz ◽  
Pieter J. Wolters ◽  
Vivianne G.A.A. Vleeshouwers ◽  
...  

Early blight of potato is caused by the fungal pathogen Alternaria solani and is an increasing problem worldwide. The primary strategy to control the disease is applying fungicides such as succinate dehydrogenase inhibitors (SDHI). SDHI-resistant strains, showing reduced sensitivity to treatments, appeared in Germany in 2013, five years after introduction of SDHIs. Two primary mutations in the Sdh complex (SdhB-H278Y and SdhC-H134R) have been frequently found throughout Europe. How these resistances arose and spread, and whether they are linked to other genomic features, remains unknown. We performed whole-genome sequencing for A. solani isolates from potato fields across Europe (Germany, Sweden, Belgium, and Serbia) to better understand the pathogen's genetic diversity in general and understand the development and spread of the genetic mutations that lead to SDHI resistance. We used ancestry analysis and phylogenetics to determine the genetic background of 48 isolates. The isolates can be grouped into 7 genotypes. These genotypes do not show a geographical pattern but appear spread throughout Europe. The Sdh mutations appear in different genetic backgrounds, suggesting they arose independently, and the observed admixtures might indicate a higher adaptive potential in the fungus than previously thought. Our research gives insights into the genetic diversity of A. solani on a genome level. The mixed occurrence of different genotypes and apparent admixture in the populations indicate higher genomic complexity than anticipated. The conclusion that SDHI tolerance arose multiple times independently has important implications for future fungicide resistance management strategies. These should not solely focus on preventing the spread of isolates between locations but also on limiting population size and the selective pressure posed by fungicides in a given field to avoid the rise of new mutations in other genetic backgrounds.


2019 ◽  
Vol 23 (1) ◽  
pp. 38-48 ◽  
Author(s):  
M. K. Bragina ◽  
D. A. Afonnikov ◽  
E. A. Salina

Since the first plant genome, that of Arabidopsis thaliana, was sequenced and published, genome sequencing technologies have undergone significant changes. New algorithms, sequencing technologies and bioinformatic approaches were adopted to obtain genome, transcriptome and exome sequences for model and crop species, which have permitted deep inferences into plant biology. As a result of improved genome assembly and analysis methods, genome sequencing costs have plummeted and the number of high-quality plant genome sequences is constantly growing. Consequently, more than 300 plant genome sequences have been published over the past twenty years. Although many of the published genomes are considered incomplete, they have proved to be a valuable tool for identifying genes involved in the formation of economically valuable plant traits, for marker-assisted and genomic selection, and for comparative analysis of plant genomes in order to determine the basic patterns of origin of various plant species. Since high coverage and resolution of a genome sequence are not enough to detect all changes in complex samples, targeted sequencing, which consists of isolating and sequencing a specific region of the genome, has begun to develop. Targeted sequencing has a higher detection power (the ability to identify new differences/variants) and resolution (down to a single base). In addition, exome sequencing (sequencing only the protein-coding regions of genes) is being actively developed, which allows for the sequencing of non-expressed alleles and genes that cannot be found with RNA-seq. This review analyzes the development of sequencing technologies and the construction of “reference” plant genomes, and compares targeted sequencing methods that rely on a reference DNA sequence.


2021 ◽  
Vol 12 ◽  
Author(s):  
Annika Brinkmann ◽  
Sophie-Luisa Ulm ◽  
Steven Uddin ◽  
Sophie Förster ◽  
Dominique Seifert ◽  
...  

Since the emergence of the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) in December 2019, the scientific community has been sharing data on epidemiology, diagnostic methods, and whole-genomic sequences almost in real time. The latter have already facilitated phylogenetic analyses, transmission chain tracking, protein modeling, the identification of possible therapeutic targets, timely risk assessment, and identification of novel variants. We have established and evaluated an amplification-based approach for whole-genome sequencing of SARS-CoV-2. It can be used on the miniature-sized and field-deployable sequencing device Oxford Nanopore MinION, with sequencing library preparation time of 10 min. We show that the generation of 50,000 total reads per sample is sufficient for a near complete coverage (>90%) of the SARS-CoV-2 genome directly from patient samples even if virus concentration is low (Ct 35, corresponding to approximately 5 genome copies per reaction). For patient samples with high viral load (Ct 18–24), generation of 50,000 reads in 1–2 h was shown to be sufficient for a genome coverage of >90%. Comparison to Illumina data reveals an accuracy that suffices to identify virus mutants. AmpliCoV can be applied whenever sequence information on SARS-CoV-2 is required rapidly, for instance for the identification of circulating virus mutants.


2019 ◽  
Author(s):  
David W Eyre ◽  
Tim EA Peto ◽  
Derrick W Crook ◽  
A Sarah Walker ◽  
Mark H Wilcox

Abstract Background Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Methods We describe a refinement to core-genome multi-locus sequence typing (cgMLST) where alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralised database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to mapping-based approaches in Clostridium difficile using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Results Hash-cgMLST provided the same results as standard cgMLST with minimal performance penalty. Comparing 272 pairs of replicate sequences, using reference-based mapping there were 0, 1 or 2 SNPs between 262 (96%), 5 (2%) and 1 (<1%) pairs respectively. Using hash-cgMLST or standard cgMLST, 197 (72%) replicate pairs had zero gene differences, and 37 (14%), 8 (3%) and 30 (11%) pairs had 1, 2 and >2 differences respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies. Considering 413 pairs of infections within ≤2 SNPs, i.e. consistent with recent transmission, 266 (64%) had ≤2 gene differences and 50 (12%) had ≥5 differences. Comparing a genome to 100,000 others took <1 minute using hash-cgMLST. Conclusion Hash-cgMLST is an effective surveillance tool that can rapidly identify clusters of related genomes. However, cgMLST/hash-cgMLST generates potentially more false variants than mapping-based analysis. Refined mapping-based variant calling is likely required to precisely define close genetic relationships.
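The core idea described in the Methods, replacing database-assigned allele numbers with a hash computed from the allele sequence itself, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the use of MD5, the upper-case normalisation, and the gene names are all assumptions made for the example.

```python
import hashlib


def allele_hash(sequence: str) -> str:
    """Derive a reproducible identifier from an allele's sequence.

    MD5 is an assumption for this sketch; any stable hash would serve,
    because the identifier depends only on the sequence, not on a database.
    """
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def allele_profile(genes: dict[str, str]) -> dict[str, str]:
    """Map each gene name to the hash of the allele found in one genome."""
    return {gene: allele_hash(seq) for gene, seq in genes.items()}


def gene_differences(a: dict[str, str], b: dict[str, str]) -> int:
    """Count genes present in both profiles whose allele hashes differ."""
    shared = a.keys() & b.keys()
    return sum(1 for gene in shared if a[gene] != b[gene])


# Hypothetical toy isolates; the gene names and sequences are made up.
isolate_1 = allele_profile({"geneA": "ATGC", "geneB": "GGCC"})
isolate_2 = allele_profile({"geneA": "atgc", "geneB": "GGCA"})
print(gene_differences(isolate_1, isolate_2))
```

Because two laboratories hashing the same allele sequence obtain the same identifier, profiles can be compared directly without agreeing on a shared allele numbering.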


2021 ◽  
Author(s):  
Xiaoming Song ◽  
Yanping Wei ◽  
Dong Xiao ◽  
Ke Gong ◽  
Pengchuan Sun ◽  
...  

Abstract Ethiopian mustard (Brassica carinata) in the Brassicaceae family possesses many excellent agronomic traits. Here, the high-quality genome sequence of B. carinata is reported. Characterization revealed a genome anchored to 17 chromosomes with a total length of 1.087 Gb and an N50 scaffold length of 60 Mb. Repetitive sequences account for approximately 634 Mb or 58.34% of the B. carinata genome. Notably, 51.91% of 97,149 genes are confined to the terminal 20% of chromosomes as a result of the expansion of repeats in pericentromeric regions. Brassica carinata shares one whole-genome triplication event with the five other species in U’s triangle, a classic model of evolution and polyploidy in Brassica. Brassica carinata was deduced to have formed ∼0.047 Mya, which is slightly earlier than B. napus but later than B. juncea. Our analysis indicated that the relationship between the two subgenomes (BcaB and BcaC) is greater than that between other two tetraploid subgenomes (BjuB and BnaC) and their respective diploid parents. RNA-seq datasets and comparative genomic analysis were used to identify several key genes in pathways regulating disease resistance and glucosinolate metabolism. Further analyses revealed that genome triplication and tandem duplication played important roles in the expansion of those genes in Brassica species. With the genome sequencing of B. carinata completed, the genomes of all six Brassica species in U’s triangle are now resolved. The data obtained from genome sequencing, transcriptome analysis, and comparative genomic efforts in this study provide valuable insights into the genome evolution of the six Brassica species in U’s triangle.
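The N50 scaffold length reported above is a standard assembly contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the scaffold lengths in the example are made up and are not taken from the B. carinata assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest, and the length at which
    the running total first reaches half of the assembly size is returned.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Hypothetical scaffold lengths (arbitrary units) for illustration.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```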


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Rueben G. Das ◽  
Doreen Becker ◽  
Vidhya Jagannathan ◽  
Orly Goldstein ◽  
Evelyn Santana ◽  
...  

Abstract Congenital stationary night blindness (CSNB), in the complete form, is caused by dysfunctions in ON-bipolar cells (ON-BCs) which are secondary neurons of the retina. We describe the first disease causative variant associated with CSNB in the dog. A genome-wide association study using 12 cases and 11 controls from a research colony determined a 4.6 Mb locus on canine chromosome 32. Subsequent whole-genome sequencing identified a 1 bp deletion in LRIT3 segregating with CSNB. The canine mutant LRIT3 gives rise to a truncated protein with unaltered subcellular expression in vitro. Genetic variants in LRIT3 have been associated with CSNB in patients although there is limited evidence regarding its apparently critical function in the mGluR6 pathway in ON-BCs. We determine that in the canine CSNB retina, the mutant LRIT3 is correctly localized to the region correlating with the ON-BC dendritic tips, albeit with reduced immunolabelling. The LRIT3-CSNB canine model has direct translational potential enabling studies to help understand the CSNB pathogenesis as well as to develop new therapies targeting the secondary neurons of the retina.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
E. A. Hisey ◽  
H. Hermans ◽  
Z. T. Lounsberry ◽  
F. Avila ◽  
R. A. Grahn ◽  
...  

Abstract Background Distichiasis, an ocular disorder in which aberrant cilia (eyelashes) grow from the opening of the Meibomian glands of the eyelid, has been reported in Friesian horses. These misplaced cilia can cause discomfort, chronic keratitis, and corneal ulceration, potentially impacting vision due to corneal fibrosis, or, if secondary infection occurs, may lead to loss of the eye. Friesian horses represent the vast majority of reported cases of equine distichiasis, and as the breed is known to be affected with inherited monogenic disorders, this condition was hypothesized to be a simply inherited Mendelian trait. Results A genome-wide association study (GWAS) was performed using the Axiom 670k Equine Genotyping array (MNEc670k) utilizing 14 cases and 38 controls phenotyped for distichiasis. An additive single locus mixed linear model (EMMAX) approach identified a 1.83 Mb locus on ECA5 and a 1.34 Mb locus on ECA13 that reached genome-wide significance (corrected p = 0.016 and 0.032, respectively). Only the locus on ECA13 withstood replication testing (p = 1.6 × 10⁻⁵, cases: n = 5 and controls: n = 37). A 371 kb run of homozygosity (ROH) on ECA13 was found in 13 of the 14 cases, providing evidence for a recessive mode of inheritance. Haplotype analysis (hapQTL) narrowed the region of association on ECA13 to 163 kb. Whole-genome sequencing data from 3 cases and 2 controls identified a 16 kb deletion within the ECA13 associated haplotype (ECA13:g.178714_195130del). Functional annotation data support a tissue-specific regulatory role of this locus. This deletion was associated with distichiasis, as 18 of the 19 cases were homozygous (p = 4.8 × 10⁻¹³). Genotyping the deletion in 955 horses from 54 different breeds identified the deletion in only 11 non-Friesians, all of which were carriers, suggesting that this could be causal for this Friesian disorder.
Conclusions This study identified a 16 kb deletion on ECA13 in an intergenic region that was associated with distichiasis in Friesian horses. Further functional analysis in relevant tissues from cases and controls will help to clarify the precise role of this deletion in normal and abnormal eyelash development and investigate the hypothesis of incomplete penetrance.

