short read sequence
Recently Published Documents


Total documents: 72 (five years: 29)

H-index: 17 (five years: 5)

2021 ◽  
Author(s):  
R. Alan Harris ◽  
Muthuswamy Raveendran ◽  
Dustin T Lyfoung ◽  
Fritz J Sedlazeck ◽  
Medhat Mahmoud ◽  
...  

Background The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013 using whole-genome shotgun sequencing with short-read sequence data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and higher continuity. Findings Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of the previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0, compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. The new assembly also reduces the unresolved regions as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, compared to 3.00% in BCM_Maur_2.0. Conclusions Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.
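The scaffold N50 quoted in this abstract is a standard continuity metric: the length L such that scaffolds of length at least L together contain at least half of all assembled bases. The following is a minimal sketch of that calculation in Python. The function name `n50` and the example lengths are illustrative assumptions and are not taken from either hamster assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, a higher value indicates that more of the assembly sits in long scaffolds; it says nothing by itself about base-level accuracy.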


2021 ◽  
Vol 31 (4) ◽  
pp. 51-60
Author(s):  
Vu Nhi Ha ◽  
Kieu Chi Thanh ◽  
Nguyen Thai Son ◽  
Dao Van Thang ◽  
Tran Huy Hoang

Acinetobacter baumannii (A. baumannii) is currently ranked as the top concern for the development of new antibiotics because of its capacity to resist all available families of antibiotics. The most common mechanism of antibiotic resistance development in A. baumannii is the acquisition of mobile genetic elements, such as plasmids, transposons and integrons, carrying resistance genes. A. baumannii strain TN81 was isolated from a sputum specimen of a 45-year-old man at Thanh Nhan Hospital (Hanoi, Vietnam) and confirmed to be a multidrug-resistant strain with high minimum inhibitory concentration values for 8 of 9 types of antibiotics, especially colistin. De novo assembly of the whole genome shotgun sequence of strain TN81 yielded an estimated genome size of 3,739,193 bp with 593 contigs and an N50 of 9,126 bp. MLST analysis showed that TN81 belongs to ST164, which was first reported as a genome assembly in Vietnam. Resistance gene identification through database searches found that TN81 contained 12 genes encoding antibiotic resistance. Notably, we performed de novo assembly of plasmids from short read sequences and identified two potential plasmid-encoded antibiotic resistance genes (ant(2’’)-Ia/aadB and tet(39)), which were reported for the first time in the ST164 group. This study aimed to investigate the plasmid-borne antibiotic resistance genes of a nosocomial isolate of Acinetobacter baumannii. In conclusion, these results provide crucial information on antibiotic resistance in A. baumannii in Vietnam.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yiqing Yan ◽  
Nimisha Chaturvedi ◽  
Raja Appuswamy

Abstract Background Improvements in sequencing technology continue to drive sequencing cost towards $100 per genome. However, mapping sequenced data to a reference genome remains a computationally intensive task due to the dependence on edit distance for dealing with INDELs and mismatches introduced by sequencing. All modern aligners use the seed–filter–extend (SFE) methodology and rely on filtration heuristics to reduce the overhead of edit distance computation. However, filtering has inherent performance–accuracy trade-offs that limit its effectiveness. Results Motivated by algorithmic advances in randomized low-distortion embedding, we introduce SEE, a new methodology for developing sequence mappers and aligners. While SFE focuses on eliminating sub-optimal candidates, SEE focuses instead on identifying optimal candidates. To do so, SEE transforms the read and reference strings from the edit distance regime to the Hamming regime by embedding them using a randomized algorithm, and uses Hamming distance over the embedded set to identify optimal candidates. To show that SEE performs well in practice, we present Accel-Align, an SEE-based short-read sequence mapper and aligner that is 3–12× faster than state-of-the-art aligners on commodity CPUs, without any special-purpose hardware, while providing comparable accuracy. Conclusions As sequencing technologies continue to increase read length while improving throughput and accuracy, we believe that randomized embeddings open up new avenues for optimization that cannot be achieved by using edit distance. Thus, the techniques presented in this paper have a much broader scope, as they can be used for other applications like graph alignment, multiple sequence alignment, and sequence assembly.
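The contrast this abstract draws between the edit distance regime and the Hamming regime can be made concrete with a small sketch. The Python below is not the randomized embedding used by SEE or Accel-Align, which the abstract does not specify; it only compares the two distances on a hypothetical read and reference window. The function names and the example strings are assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Count mismatching positions; runs in O(n) for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming; O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Hypothetical read and reference window: one base is deleted and one is
# appended, so every position after the deletion is shifted.
read = "ACGTACGT"
window = "ACGACGTA"
print(hamming_distance(read, window), edit_distance(read, window))
```

On this pair the two INDELs cost only two edits, yet the shift makes five positions mismatch under Hamming distance. Plain Hamming distance on raw strings is therefore cheap but a poor proxy for edit distance, which is the gap an embedding step is meant to close.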


2021 ◽  
Author(s):  
Adrien Leger ◽  
Ian Brettell ◽  
Jack Monahan ◽  
Carl Barton ◽  
Nadeshda Wolf ◽  
...  

The teleost medaka (Oryzias latipes) is a well-established vertebrate model system, with a long history of genetic research, and multiple high-quality reference genomes available for several inbred strains (HdrR, HNI and HSOK). Medaka has a high tolerance to inbreeding from the wild, thus allowing one to establish inbred lines from wild founder individuals. We have exploited this feature to create an inbred panel resource: the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel. This panel of 80 near-isogenic inbred lines contains a large amount of genetic variation inherited from the original wild population. We used Oxford Nanopore Technologies (ONT) long read data to further investigate the genomic and epigenomic landscapes of a subset of the MIKK panel. Nanopore sequencing allowed us to identify a much greater variety of high-quality structural variants compared with Illumina sequencing. We also present results and methods using a pan-genome graph representation of 12 individual medaka lines from the MIKK panel. This graph-based reference MIKK panel genome revealed novel differences between the MIKK panel lines compared to standard linear reference genomes. We found additional MIKK panel-specific genomic content that would be missing from linear reference alignment approaches. We were also able to identify and quantify the presence of repeat elements in each of the lines. Finally, we investigated line-specific CpG methylation and performed differential DNA methylation analysis across the 12 lines. We thus present a detailed analysis of the MIKK panel genomes using long and short read sequence technologies, creating a MIKK panel specific pan genome reference dataset allowing for the investigation of novel variation types that would be elusive using standard approaches.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ryan Musich ◽  
Lance Cadle-Davidson ◽  
Michael V. Osier

Aligning short-read sequences is the foundational step of most genomic and transcriptomic analyses, but not all tools perform equally, and choosing among the growing body of available tools can be daunting. Here, in order to increase awareness in the research community, we discuss the merits of common algorithms and programs in a way that should be approachable to biologists with limited experience in bioinformatics. We will only in passing consider the effects of data cleanup, a precursor analysis to most alignment tools, and no consideration will be given to downstream processing of the aligned fragments. To compare aligners [Bowtie2, Burrows Wheeler Aligner (BWA), HISAT2, MUMmer4, STAR, and TopHat2], an RNA-seq dataset was used containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus Erysiphe necator. Based on alignment rate and gene coverage, all aligners performed well with the exception of TopHat2, which HISAT2 superseded. BWA perhaps had the best performance in these metrics, except for longer transcripts (>500 bp), for which HISAT2 and STAR performed well. HISAT2 was ~3-fold faster than the next fastest aligner in runtime, which we consider a secondary factor in most alignments. In the end, this direct comparison of commonly used aligners illustrates key considerations when choosing which tool to use for the specific sequencing data and objectives. No single tool meets all needs for every user, and there are many quality aligners available.


2021 ◽  
Author(s):  
Ahmed Arslan ◽  
Zhuoqing Fang ◽  
Meiyue Wang ◽  
Zhuanfen Cheng ◽  
Boyoung Yoo ◽  
...  

Abstract The genomes of six inbred strains were analyzed using long read (LR) sequencing. The results revealed that structural variants (SV) were very abundant within the genome of inbred mouse strains (4.8 per gene), which indicates that they could impact genetic traits. Analysis of the relationship between SNP and SV alleles across 53 inbred strains indicated that we have a very limited ability to infer whether SV are present using short read sequence data, even when nearby SNP alleles are known. The benefit of having a more complete map of the pattern of genetic variation was demonstrated by identifying at least three genetic factors that could underlie the unique neuroanatomic and behavioral features of BTBR mice that resemble human Autism Spectrum Disorder (ASD). Similar to the genetic findings in human ASD cohorts, the identified BTBR-unique alleles are very rare, and they cause high impact changes in genes that play a role in neurodevelopment and brain function.


Viruses ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 269
Author(s):  
Sijun Liu ◽  
Thomas W. Sappington ◽  
Brad S. Coates ◽  
Bryony C. Bonning

Analysis of pooled genomic short read sequence data revealed the presence of nudivirus-derived sequences from U.S. populations of both southern corn rootworm (SCR, Diabrotica undecimpunctata howardi Barber) and western corn rootworm (WCR, Diabrotica virgifera virgifera LeConte). A near complete nudivirus genome sequence was assembled from sequence data for an SCR population with relatively high viral titers. A total of 147,179 bp was assembled from five contigs that collectively encode 109 putative open reading frames (ORFs) including 20 nudivirus core genes. In contrast, genome sequence recovery was incomplete for a second nudivirus from WCR, although sequences derived from this virus were present in three geographically dispersed populations. Only 48,989 bp were assembled with 48 putative ORFs including 13 core genes, representing about 20% of a typical nudivirus genome. Phylogenetic analysis indicated that both corn rootworm nudiviruses grouped with the third known nudivirus of beetles, Oryctes rhinoceros nudivirus in the genus Alphanudivirus. On the basis of phylogenetic and additional analyses, we propose further taxonomic separation of nudiviruses within Alphanudivirus and Betanudivirus into two subfamilies and five genera. Identification of nudivirus-derived sequences from two species of corn rootworm highlights the diversity of viruses associated with these agricultural insect pests.


2021 ◽  
Vol 65 (4) ◽  
Author(s):  
Paweł Urbanowicz ◽  
Ibrahim Bitar ◽  
Radosław Izdebski ◽  
Anna Baraniak ◽  
Elżbieta Literacka ◽  
...  

ABSTRACT In 2003 to 2004, the first five VIM-2 metallo-β-lactamase (MBL)-producing Pseudomonas aeruginosa (MPPA) isolates with an In4-like integron, In461 (aadB-blaVIM-2-aadA6), on conjugative plasmids were identified in three hospitals in Poland. In 2005 to 2015, MPPA spread widely in the country, and as many as 80 isolates in a collection of 454 MPPA (∼18%) had In461, one of the two most common MBL-encoding integrons. The organisms occurred in 49 hospitals in 33 cities of 11 of the 16 main administrative regions. Pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST) classified them into 55 pulsotypes and 35 sequence types (STs), respectively, revealing their remarkable genetic diversity overall, with only a few small clonal clusters. S1 nuclease/hybridization assays and mating of 63 representative isolates showed that ∼85% of these had large In461-carrying plasmids, ∼350 to 550 kb, usually self-transmitting with high efficiency (∼10⁻¹ to 10⁻² per donor cell). The plasmids from 19 isolates were sequenced and subjected to structural and single-nucleotide-polymorphism (SNP)-based phylogenetic analysis. These formed a subgroup within a family of IncP-2-type megaplasmids, observed worldwide in pseudomonads from various environments and conferring resistance/tolerance to multiple stress factors, including antibiotics. Their microdiversity in Poland arose mainly from acquisition of different accessory fragments, as well as new resistance genes and multiplication of these. Short-read sequence and/or PCR mapping confirmed the In461-carrying plasmids in the remaining isolates to be of the IncP-2 type. The study demonstrated a large-scale epidemic spread of multidrug resistance plasmids in P. aeruginosa populations, creating an epidemiological threat. It contributes to the knowledge on IncP-2 types, which are interesting research objects in resistance epidemiology, environmental microbiology, and biotechnology.


2020 ◽  
Vol 23 ◽  
pp. 35-40
Author(s):  
Kristaps Bebris ◽  
Inese Polaka

Advances in sequencing technology have led to an ever-increasing amount of available short-read sequencing data. This has, consequently, exacerbated the need for efficient and precise classification tools that can be used in the analysis of these data. Recent years have shown that massive leaps in performance can be achieved with approaches based on heuristics, and apart from these improvements there has been an ever-increasing interest in applying deep learning techniques to revolutionize this classification task. We attempt to study these approaches and to evaluate their performance in a reproducible fashion to get a better perspective on the current state of deep-learning-based methods for the classification of short-read sequencing data.

