alignment tool
Recently Published Documents


TOTAL DOCUMENTS

152
(FIVE YEARS 40)

H-INDEX

15
(FIVE YEARS 3)

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Maurilio Monsu ◽  
Matteo Comin

Abstract Sequencing technologies have provided the basis of most modern genome sequencing studies due to their high base-level accuracy and relatively low cost. One of the most demanding steps is mapping reads to the human reference genome. The reliance on a single reference human genome could introduce substantial biases in downstream analyses. Pangenomic graph reference representations offer an attractive approach for storing genetic variations. Moreover, it is possible to include known variants in the reference in order to make read mapping, variant calling, and genotyping variant-aware. Only recently has a framework for variation graphs, vg [Garrison E, Adam MN, Siren J, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875–9], improved variation-aware alignment and variant calling in general. The major bottleneck of vg is the high cost of mapping reads to a variation graph. In this paper we study the problem of SNP calling on a variation graph and we present a fast read alignment tool, named VG SNP-Aware. VG SNP-Aware is able to align reads exactly to a variation graph and detect SNPs based on these aligned reads. The results show that VG SNP-Aware can efficiently map reads to a variation graph with a speedup of 40× with respect to vg and similar accuracy on SNP detection.


2021 ◽  
Author(s):  
Chiann-Ling Cindy Yeh ◽  
Clara J. Amorosi ◽  
Soyeon Showman ◽  
Maitreya J. Dunham

Motivation: Use of PacBio sequencing for characterizing barcoded libraries of genetic variants is on the rise. PacBio sequencing is useful in linking variant alleles in a library with their associated barcode tag. However, current approaches to resolving PacBio sequencing artifacts can result in a high number of incorrectly identified or unusable reads. Results: We developed a PacBio Read Alignment Tool (PacRAT) that improves the accuracy of barcode-variant mapping through several steps of read alignment and consensus calling. To quantify the performance of our approach, we simulated PacBio reads from eight variant libraries of various lengths and showed that PacRAT improves the accuracy in pairing barcodes and variants across these libraries. Analysis of real (non-simulated) libraries also showed an increase in the number of reads that can be used for downstream analyses when using PacRAT. Availability and Implementation: PacRAT is written in Python and is freely available on GitHub (https://github.com/dunhamlab/PacRAT).
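The idea of consensus calling per barcode can be illustrated with a minimal sketch. It is not PacRAT's code: the function name `consensus_by_barcode`, the `min_reads` threshold, and the toy reads are hypothetical, and the sketch assumes the reads in each barcode group have already been aligned to a common length, which a real pipeline would handle with an alignment step first.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(pairs, min_reads=2):
    """Return {barcode: consensus sequence} by column-wise majority vote.

    `pairs` is an iterable of (barcode, sequence) tuples. Groups with fewer
    than `min_reads` reads, or with reads of unequal length (assumed here to
    mean "not yet aligned"), are skipped rather than guessed at.
    """
    groups = defaultdict(list)
    for barcode, seq in pairs:
        groups[barcode].append(seq)

    result = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads or len({len(s) for s in seqs}) != 1:
            continue
        # For every alignment column, keep the most frequent base.
        result[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return result


# Hypothetical toy data: one read of BC1 carries a sequencing error.
toy_reads = [("BC1", "ACGT"), ("BC1", "ACGT"), ("BC1", "ACTT"), ("BC2", "GGGG")]
print(consensus_by_barcode(toy_reads))
```

In this sketch a barcode seen only once is dropped, since a single read gives no basis for a vote.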


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Massimo Maiolo ◽  
Lorenzo Gatti ◽  
Diego Frei ◽  
Tiziano Leidi ◽  
Manuel Gil ◽  
...  

Abstract Background Current alignment tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments (i.e., over-alignment) due to inconsistencies between the indel history and the phylogeny relating the input sequences. Results We present a new progressive multiple sequence alignment tool, ProPIP. The process of insertions and deletions is described using an explicit evolutionary model, the Poisson Indel Process (PIP). The method is based on dynamic programming and is implemented in a frequentist framework. The algorithm is implemented in C++ as a standalone program, and the source code can be compiled on Linux, macOS and Microsoft Windows platforms. The source code is freely available on GitHub at https://github.com/acg-team/ProPIP and is distributed under the terms of the GNU GPL v3 license. Conclusions The use of an explicit indel evolution model makes it possible to avoid over-alignment, to infer gaps in a phylogenetically consistent way and to make inferences about the rates of insertions and deletions. Instead of arbitrary gap penalties, the parameters used by ProPIP are the insertion and deletion rates, which have a biological interpretation and are contextualized in a probabilistic environment. As a result, indel rate settings may be optimised in order to infer phylogenetically meaningful gap patterns.


Plant Disease ◽  
2021 ◽  
Author(s):  
Samira CHEKALI ◽  
Ali OUJI ◽  
Stefania SOMMA ◽  
Mario Masiello ◽  
Wala DOUIHECH ◽  
...  

Lentil (Lens culinaris Medik.) is widely grown in arid and semi-arid regions of Tunisia. Low yields are often attributed to it being grown on marginal land. Since 2016, symptoms of wilt, including yellowing and discoloration of the stem and root tissues, have been observed in lentils in several regions of Tunisia. The annual mean incidence of infected plants ranged from 10% to 15%. In the 2019-2020 growing seasons, symptomatic adult plants were randomly sampled from two fields located in south Tunisia (33°37’N; 11°4’E and 33°33’N; 11°2’E), and one field located in north west Tunisia (36°7’N; 8°43’E). Pieces were cut from roots and stem, surface sterilized, then plated on ¼ strength Potato Dextrose Agar (PDA) + 100 mg L-1 streptomycin sulfate (Burgess et al., 1994). Cultures were incubated for 5-6 days. Nine colonies with floccose mycelia, sparse or abundant and white to violet in color, morphologically similar to Fusarium redolens according to Leslie & Summerell (2006), were isolated from both roots and stems. They were single-spored (Burgess et al., 1994). Microconidia were formed in false heads on short monophialides. They were oval to elliptical or reniform and were 0-1 septate. Three-septate macroconidia with short apical cells were also observed. The strains were also deposited in the microbial ITEM Collection of the Institute of Sciences of Food Production. Extraction of genomic DNA of Fusarium sp. strains was carried out according to the Wizard® Magnetic DNA Purification System (Promega, Fitchburg, WI, USA). Molecular identification was carried out based on translation elongation factor (TEF) gene sequencing, as described in Fallahi et al. (2019). The TEF sequences were searched against the GenBank database using the Basic Local Alignment Search Tool (BLAST). The sequences of four strains (Z2P6, Z2P7, Z3P2 and Z3P5) out of a total of nine, recovered from lentil roots from the two fields of south Tunisia, showed 100% homology with TEF sequences of the epitype culture of F. redolens NRRL25600 (accession number MT409453) (Balmas et al. 2010; Gargouri et al. 2020). Sequences of the strains were submitted to GenBank with accession numbers MW393853, MW393854, MW393856 and MW393857. Pathogenicity of the four strains of F. redolens was evaluated on the Kef lentil variety (Kharrat et al. 2007). Inoculum was produced on sterilized oat colonized with each strain. The colonized grains were air-dried on filter paper, ground in a laboratory mill, mixed at 10% with soil (10 g of each isolate inoculum for 100 g of disinfected soil substrate) and potted. Three germinated lentil seeds were placed in each pot and irrigated periodically. The test was replicated four times. After 21 days, 60% (33-100%) of the plants inoculated with the four F. redolens strains showed symptoms of wilting, yellowing and rotting of roots, and 17% died when inoculated with the Z2P6 and Z2P7 strains. Non-inoculated plants showed no symptoms. F. redolens was isolated from the roots of 100% of the inoculated plants. This is the first report of F. redolens as a pathogen on lentil in Tunisia. This species has also been associated with lentil wilting in other regions of the world including Italy (Riccioni et al. 2008), Canada (Taheri et al. 2011) and Pakistan (Rafique et al. 2020). F. redolens was previously reported from wilted chickpea crops in Tunisia (Bouhadida et al. 2017). These findings are important for the Tunisian national legumes program and call for larger surveys to better understand the biology and ecology of this species and to prevent the disease from spreading.


2021 ◽  
Vol 11 (19) ◽  
pp. 9246
Author(s):  
Gülce Çakmak ◽  
Alfonso Rodriguez Cuellar ◽  
Mustafa Borga Donmez ◽  
Martin Schimmel ◽  
Samir Abou-Ayash ◽  
...  

The information in the literature on the effect of printing layer thickness on interim 3D-printed crowns is limited. In the present study, the effect of layer thickness on the trueness and margin quality of 3D-printed composite resin crowns was investigated and compared with milled crowns. The crowns were printed in 3 different layer thicknesses (20, 50, and 100 μm) by using a hybrid resin based on acrylic esters with inorganic microfillers or milled from polymethylmethacrylate (PMMA) discs and digitized with an intraoral scanner (test scans). The compare tool of the 3D analysis software was used to superimpose the test scans and the computer-aided design file by using the manual alignment tool and to virtually separate the surfaces. Deviations at different surfaces on crowns were calculated by using root mean square (RMS). Margin quality of crowns was examined under a stereomicroscope and graded. The data were evaluated with one-way ANOVA and Tukey HSD tests. The layer thickness affected the trueness and margin quality of 3D-printed interim crowns. Milled crowns had higher trueness on intaglio and intaglio occlusal surfaces than 100 μm-layer thickness crowns. Milled crowns had the highest margin quality, while 20 μm and 100 μm layer thickness printed crowns had the lowest. The quality varied depending on the location of the margin.
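The root mean square (RMS) value used above to summarise surface deviations can be written out as a short sketch. The function name `rms_deviation` and the sample values are hypothetical and are not taken from the study; the sketch only assumes that the 3D analysis software exports a list of signed point-to-surface deviations in a single unit.

```python
import math


def rms_deviation(deviations):
    """Root mean square of signed deviations; the result has the input's unit."""
    if not deviations:
        raise ValueError("need at least one deviation value")
    # Squaring removes the sign, so positive and negative deviations
    # cannot cancel each other out as they would in a plain mean.
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Hypothetical signed deviations in micrometres for one crown surface.
sample = [3.0, -4.0, 0.0, 5.0, -2.0]
print(round(rms_deviation(sample), 3))
```

A lower RMS means the scanned surface lies closer to the design file overall, which is how trueness is usually read from this number.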


2021 ◽  
Vol 21 (9) ◽  
pp. S62
Author(s):  
Jonathan Elysee ◽  
Renaud Lafage ◽  
Justin S. Smith ◽  
Eric O. Klineberg ◽  
Peter G. Passias ◽  
...  

2021 ◽  
Vol 22 (S9) ◽  
Author(s):  
Wei Quan ◽  
Bo Liu ◽  
Yadong Wang

Abstract Background DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignments directly affects the accuracy of downstream analyses, such as variant calling and quantitative analysis of transcriptome; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, such a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variations in the population. This limitation can introduce bias and impact the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome and a large number of genetic variants. However, compared to linear reference aligners, an aligner that can store and index all genetic variants has a high cost in memory (RAM) space and leads to extremely long run time. Aligning reads to a graph-model-based index that includes all types of variants is ultimately an NP-hard problem in theory. By contrast, considering only single nucleotide polymorphism (SNP) information will reduce the complexity of the index and improve the speed of sequence alignment. Results The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short read alignment tool. SALT uses 5.8 GB of RAM to index a human reference genome (GRCh38) and incorporates 12.8M UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has a similar speed but higher accuracy. Conclusions Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome that incorporates an SNP database. We benchmarked SALT using simulated and real datasets. 
The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.
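The effect of SNP awareness on mismatch counting can be illustrated with a minimal sketch. This is not SALT's index or algorithm: the function `count_mismatches`, the `snps` dictionary layout, and the toy sequences are assumptions chosen only to show why a read carrying a known alternate allele need not be penalised.

```python
def count_mismatches(read, reference, start, snps):
    """Count mismatches of `read` placed at `reference[start:]`.

    `snps` maps a 0-based reference position to the set of known alternate
    alleles at that position. A read base equal to the reference base or to
    a known alternate allele is treated as a match.
    """
    if start < 0 or start + len(read) > len(reference):
        raise ValueError("read does not fit on the reference at this offset")
    mismatches = 0
    for offset, base in enumerate(read):
        pos = start + offset
        if base == reference[pos] or base in snps.get(pos, ()):
            continue
        mismatches += 1
    return mismatches


# Hypothetical toy data: the read differs from the reference at positions 2 and 5,
# and both differences are known SNP alleles.
reference = "ACGTACGTAC"
known_snps = {2: {"T"}, 5: {"A", "G"}}
read = "ACTTAG"

print(count_mismatches(read, reference, 0, {}))          # linear reference only
print(count_mismatches(read, reference, 0, known_snps))  # SNP-aware
```

With an empty SNP table the function behaves like a plain comparison against a linear reference, so the two calls show the difference that the variant information makes.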


Author(s):  
Dolores Lemmenmeier-Batinić

This paper describes the procedure of building a TEI-XML corpus of spoken Serbian starting from raw transcripts. The corpus consists of semi-structured interviews, which were gathered with the aim of investigating forms of address in Serbian. The interviews were thoroughly transcribed according to GAT transcribing conventions. However, the transcription was carried out without tools that would control the validity of the GAT syntax, or align the transcript with the audio recordings. In order to offer this resource to a broader audience, we resolved the inconsistencies in the original transcripts, normalised the semi-orthographic transcriptions and converted the corpus into a TEI format for transcriptions of speech. Further, we enriched the corpus by tagging and lemmatising the data. Lastly, we aligned the corpus turns to the corresponding audio segments by using a forced-alignment tool. In addition to presenting the main steps involved in converting the corpus to the XML format, this paper also discusses current challenges in the processing of spoken data, and the implications of data re-use regarding transcriptions of speech. This corpus can be used for studying Serbian from the perspective of interactional linguistics, for investigating morphosyntax, grammar, lexicon and phonetics of spoken Serbian, for studying disfluencies, as well as for testing models for automatic speech recognition and forced alignment. The corpus is freely available for research purposes.


2021 ◽  
pp. 1-2
Author(s):  
Ma. del Rocío García-Olvera

Rhinoviruses (RVs) are increasingly implicated not only in mild upper respiratory tract infections, but also in more severe lower respiratory tract infections; however, little is known about species diversity and viral epidemiology of RVs among infected children. Therefore, we investigated the rhinovirus (RV) infection prevalence over a 2-year period, compared it with prevalence patterns of other common respiratory viruses, and explored clinical and molecular epidemiology of RV infections among 590 children hospitalized with acute respiratory infection in north-western and central parts of Croatia. For respiratory virus detection, nasopharyngeal and pharyngeal flocked swabs were taken from each patient and subsequently analyzed with multiplex RT-PCR. To determine the RV species in a subset of positive children, the 5′UTR in RV-positive samples was sequenced. Nucleotide sequences of reference RV strains were retrieved by searching the database with the Basic Local Alignment Search Tool and used to construct alignments and phylogenetic trees using the MAFFT multiple sequence alignment tool and the maximum likelihood method, respectively. In our study population RV was the most frequently detected virus, diagnosed in 197 patients (33.4%), of which 60.4% were detected as a monoinfection. Median age of RV-infected children was 2.25 years, and more than half of children infected with RV (55.8%) presented with lower respiratory tract infections. Most RV cases were detected from September to December, and all three species co-circulated during the analyzed period (2017–2019). Sequence analysis based on the 5′UTR region yielded 69 distinct strains; the most prevalent was RV-C (47.4%) followed by RV-A (44.7%) and RV-B (7.9%). Most of the RV-A sequences formed a distinct phylogenetic group; only strains RI/HR409–18 (along with a reference strain MF978777) clustered with RV-C strains. 
Strains belonging to the group C were the most diverse (41.6% identity among strains), while group B was the most conserved (71.5% identity among strains). Despite such differences in strain groups (hitherto undescribed in Croatia), clinical presentation of infected children was rather similar. Our results are consistent with newer studies that investigated the etiology of acute respiratory infections, especially those focused on children with lower respiratory tract infections, where RVs should always be considered as potentially serious pathogens.


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0239881
Author(s):  
René Staritzbichler ◽  
Edoardo Sarti ◽  
Emily Yaklich ◽  
Antoniya Aleksandrova ◽  
Marcus Stamm ◽  
...  

The alignment of primary sequences is a fundamental step in the analysis of protein structure, function, and evolution, and in the generation of homology-based models. Integral membrane proteins pose a significant challenge for such sequence alignment approaches, because their evolutionary relationships can be very remote, and because a high content of hydrophobic amino acids reduces their complexity. Frequently, biochemical or biophysical data is available that informs the optimum alignment, for example, indicating specific positions that share common functional or structural roles. Currently, if those positions are not correctly matched by a standard pairwise sequence alignment procedure, the incorporation of such information into the alignment is typically addressed in an ad hoc manner, with manual adjustments. However, such modifications are problematic because they reduce the robustness and reproducibility of the aligned regions either side of the newly matched positions. Previous studies have introduced restraints as a means to impose the matching of positions during sequence alignments, originally in the context of genome assembly. Here we introduce position restraints, or “anchors” as a feature in our alignment tool AlignMe, providing an aid to pairwise global sequence alignment of alpha-helical membrane proteins. Applying this approach to realistic scenarios involving distantly-related and low complexity sequences, we illustrate how the addition of anchors can be used to modify alignments, while still maintaining the reproducibility and rigor of the rest of the alignment. Anchored alignments can be generated using the online version of AlignMe available at www.bioinfo.mpg.de/AlignMe/.
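One simple way to picture a position restraint is to add a bonus to the score of the anchored residue pair inside an ordinary global (Needleman-Wunsch) alignment. The sketch below does that with a plain match/mismatch score and a linear gap penalty. It is an assumption made for illustration and not necessarily how AlignMe implements anchors; the function `align`, its scoring parameters, and the toy sequences are hypothetical.

```python
def align(a, b, anchors=(), match=1, mismatch=-1, gap=-1, anchor_bonus=100):
    """Global alignment of `a` and `b`; returns the two gapped strings.

    `anchors` holds 0-based (i, j) pairs. Pairing a[i] with b[j] earns
    `anchor_bonus` on top of the usual score, which steers the optimal
    path through that cell while the rest is aligned as usual.
    """
    anchor_set = set(anchors)
    n, m = len(a), len(b)

    def pair_score(i, j):
        base = match if a[i] == b[j] else mismatch
        return base + (anchor_bonus if (i, j) in anchor_set else 0)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences: W is the residue known to share a functional role.
print(align("AAAW", "WAAA"))                    # unrestrained alignment
print(align("AAAW", "WAAA", anchors=[(3, 0)]))  # W of `a` anchored to W of `b`
```

Because the restraint enters through the score rather than through manual editing, the flanking regions are still placed by the same dynamic programming and remain reproducible.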

