Reference Mapping Considering Swaps of Adjacent Bases

2021 ◽  
Vol 11 (11) ◽  
pp. 5038
Author(s):  
Youngho Kim ◽  
Munseong Kang ◽  
Ju-Hui Jeong ◽  
Dae Woong Kang ◽  
Soo Jun Park ◽  
...  

Since the Human Genome Project (HGP), research into next-generation sequencing, which uses computer algorithms to reduce the cost and time of sequence analysis, has been actively conducted. Mapping is a next-generation sequencing method that identifies sequences by aligning short reads to a reference genome whose sequence is known. Mapping can be applied to tasks such as SNP calling, motif searches, and gene identification. Research on mapping that uses the Burrows-Wheeler transform (BWT) and GPUs has been undertaken to make mapping faster. In this paper, we propose a new mapping algorithm with additional consideration for base swaps. The experimental results demonstrate that when the penalty score for swaps was −1, −2, and −3 in paired-end alignment, SOAP3-swap aligned 4667, 2318, and 972 more read pairs, respectively, than SOAP3-dp for the human whole genome, and 1253, 454, and 129 more read pairs, respectively, than SOAP3-dp for the Drosophila genome. SOAP3-swap has the same functionality as SOAP3-dp and also improves the alignment ratio by taking biologically significant swaps into account for the first time.
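To make the idea of a swap penalty concrete, the following is a minimal sketch of a global alignment score that also allows a swap of two adjacent bases (for example, CG in the read against GC in the reference). It is not the SOAP3-swap implementation, which the abstract does not describe in detail. The function name `swap_aware_score`, the match, mismatch, and gap values, and the choice to score a swapped pair with the swap penalty alone are all assumptions made for illustration.

```python
def swap_aware_score(read, ref, match=1, mismatch=-1, gap=-2, swap=-1):
    """Global alignment score that also allows adjacent-base swaps.

    Hypothetical scoring (an assumption, not taken from the paper):
    a swapped pair of adjacent bases contributes `swap` in total.
    """
    n, m = len(read), len(ref)
    # dp[i][j] = best score for aligning read[:i] with ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            best = max(
                dp[i - 1][j - 1] + sub,  # match or mismatch
                dp[i - 1][j] + gap,      # base in read aligned to a gap
                dp[i][j - 1] + gap,      # base in ref aligned to a gap
            )
            # Swap of two adjacent, distinct bases: read "XY" against ref "YX"
            if (
                i > 1 and j > 1
                and read[i - 1] == ref[j - 2]
                and read[i - 2] == ref[j - 1]
                and read[i - 1] != read[i - 2]
            ):
                best = max(best, dp[i - 2][j - 2] + swap)
            dp[i][j] = best
    return dp[n][m]
```

Under these assumed scores, a harsher swap penalty makes the swap path less attractive than two plain mismatches, which is in line with the abstract's observation that fewer additional read pairs are aligned as the penalty moves from −1 to −3.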

2012 ◽  
Vol 14 (6) ◽  
pp. 602-612 ◽  
Author(s):  
Maurice Chan ◽  
Shen Mo Ji ◽  
Zhen Xuan Yeo ◽  
Linda Gan ◽  
Eric Yap ◽  
...  

2019 ◽  
Author(s):  
Morgan Gueuning

Wild bees are essential pollinators and therefore play a key role in both natural and agricultural ecosystems. However, bees have often been neglected in conservation studies and policies worldwide, which is surprising given their ecological importance. As a result, little is known about the conservation status of the vast majority of wild bee species in Europe, and even less worldwide. Limited surveys suggest important declines in the abundance and diversity of most wild bee communities worldwide. It is therefore urgent to implement targeted measures for the conservation of these keystone species. Once implemented, the effectiveness of these measures must be evaluated using adequate monitoring programs. To date, wild bee surveys are entirely based on morphological identification, which is both labor-intensive and time-consuming. Consequently, an affordable, high-throughput identification method is needed to reduce costs and improve bee monitoring. The objective of this thesis was to evaluate novel genetic techniques based on next-generation sequencing (NGS) methods for facilitating surveys of wild bees. NGS tools were mainly investigated for bridging two important impediments to wild bee conservation efforts, i.e., the cost of biodiversity assessment schemes and taxonomic incompleteness. With the development of NGS techniques, DNA barcoding has gained enormous momentum, enabling cost-effective, fast and accurate identifications. Before these methods can be routinely used in monitoring programs, however, important knowledge gaps remain to be filled. These gaps mainly concern the detection of rare species and the acquisition of accurate quantitative data on species abundance; more generally, the cost and labour effectiveness of these methods need to be evaluated.
To provide a comprehensive presentation of the advantages and weaknesses of different NGS-based identification methods, we assessed three of the most promising ones, namely metabarcoding, mitogenomics and NGS barcoding. Using regular monitoring data, we found that NGS barcoding performed best for both species’ presence/absence and abundance data, producing only a few false positives and no false negatives. The other methods investigated were less reliable in terms of species detection and inference of abundance data, and partly led to erroneous ecological conclusions. In terms of workload and cost, we showed that NGS techniques were more expensive than morphological identification with our dataset, although these techniques would become slightly more economical in large-scale monitoring programs. A second aim of this thesis was to provide an easy and robust genomic solution to alleviate taxonomic incompleteness, one of the major impediments to the effective conservation of many insect taxa. For conservation purposes, having stable and well-delimited species hypotheses is essential. Currently, most species are delimited based on morphology and/or DNA barcoding. These methods are, however, associated with important limitations, and it is widely accepted that species delimitation should rely on multi-locus genomic markers. To overcome these limitations, ultraconserved elements (UCEs) were tested as a fast and robust approach using different species complexes harbouring cryptic diversity, mitochondrial introgression, or mitochondrial paraphyly. Phylogenetic analyses of UCEs were highly conclusive and yielded meaningful species delimitation hypotheses in all cases. These results provide strong evidence for the potential of UCEs as a fast method for delimiting species even in cases of recently diverged lineages. Advantages and limitations of UCEs for shallow phylogenetic studies are further discussed.


Plant Disease ◽  
2019 ◽  
Vol 103 (6) ◽  
pp. 1075-1083 ◽  
Author(s):  
Gustavo A. Díaz-Cruz ◽  
Charlotte M. Smith ◽  
Kiana F. Wiebe ◽  
Sachi M. Villanueva ◽  
Adam R. Klonowski ◽  
...  

Soybean (Glycine max) has become an important crop in Manitoba, Canada, with a 10-fold increase in dedicated acreage over the past decade. Given the rapid increase in production, scarce information about foliar diseases present in the province has been recorded. In order to describe the foliar pathogens affecting this legume, we harnessed next-generation sequencing (NGS) to carry out a comprehensive survey across Manitoba in 2016. Fields were sampled during the V2/3 (33 fields) and R6 (70 fields) growth stages, with at least three symptomatic leaves per field collected and subjected to RNA sequencing. We successfully detected several bacteria, fungi, and viruses known to infect soybean, including Pseudomonas savastanoi pv. glycinea, Septoria glycines, and Peronospora manshurica, as well as pathogens not previously identified in the province (e.g., Pseudomonas syringae pv. tabaci, Cercospora sojina, and Bean yellow mosaic virus). For some microorganisms, we were able to disentangle the different pathovars present and/or assemble their genome sequence. Since NGS generates data on the entire flora and fauna occupying a leaf sample, we also identified residual pathogens (i.e., pathogens of crops other than soybean) and multiple species of arthropod pests. Finally, the sequence information produced by NGS allowed for the development of polymerase chain reaction-based diagnostics for some of the most widespread and important pathogens. Although there are many benefits of using NGS for large-scale plant pathogen diagnoses, we also discuss some of the limitations of this technology.


2020 ◽  
Vol 79 (2) ◽  
pp. 105-113
Author(s):  
Abdul Bari Muneera Parveen ◽  
Divya Lakshmanan ◽  
Modhumita Ghosh Dasgupta

The advent of next-generation sequencing has facilitated large-scale discovery and mapping of genomic variants for high-throughput genotyping. Several research groups working on tree species are presently employing next-generation sequencing (NGS) platforms for marker discovery, since it is a cost-effective and time-saving strategy. However, most trees lack a chromosome-level genome map, and validation of variants for downstream application becomes obligatory. The cost associated with identifying potential variants from the enormous amount of sequence data is a major limitation. In the present study, high-resolution melting (HRM) analysis was optimized for rapid validation of single nucleotide polymorphisms (SNPs), insertions or deletions (InDels) and simple sequence repeats (SSRs) predicted from exome sequencing of parents and hybrids of Eucalyptus tereticornis Sm. × Eucalyptus grandis Hill ex Maiden generated from controlled hybridization. The cost per data point was less than 0.5 USD, providing great flexibility in terms of cost and sensitivity when compared to other validation methods. The sensitivity of this technology in variant detection can be extended to other applications, including Bar-HRM for species authentication and TILLING for detection of mutants.


2010 ◽  
Vol 76 (12) ◽  
pp. 3863-3868 ◽  
Author(s):  
J. Kirk Harris ◽  
Jason W. Sahl ◽  
Todd A. Castoe ◽  
Brandie D. Wagner ◽  
David D. Pollock ◽  
...  

Constructing mixtures of tagged or bar-coded DNAs for sequencing is an important requirement for the efficient use of next-generation sequencers in applications where limited sequence data are required per sample. There are many applications in which next-generation sequencing can be used effectively to sequence large mixed samples; an example is the characterization of microbial communities where ≤1,000 sequences per sample are adequate to address research questions. Thus, it is possible to examine hundreds to thousands of samples per run on massively parallel next-generation sequencers. However, the cost savings from efficient utilization of sequence capacity are realized only if the production and management costs associated with construction of multiplex pools are also scalable. One critical step in multiplex pool construction is the normalization process, whereby equimolar amounts of each amplicon are mixed. Here we compare three approaches (spectroscopy, size-restricted spectroscopy, and quantitative binding) for normalization of large, multiplex amplicon pools for performance and efficiency. We found that the quantitative binding approach was superior and represents an efficient scalable process for construction of very large, multiplex pools with hundreds and perhaps thousands of individual amplicons included. We demonstrate the increased sequence diversity identified with higher throughput. Massively parallel sequencing can dramatically accelerate microbial ecology studies by allowing appropriate replication of sequence acquisition to account for temporal and spatial variations. Further, population studies to examine genetic variation, which require even lower levels of sequencing, should be possible where thousands of individual bar-coded amplicons are examined in parallel.
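As a rough illustration of what mixing "equimolar amounts of each amplicon" involves arithmetically, the sketch below converts a measured double-stranded DNA concentration into molarity and then into a pooling volume. It does not reproduce any of the three normalization approaches compared in the abstract. The function names, the sample values, and the commonly used approximation of 660 g/mol per base pair are assumptions introduced here.

```python
AVG_BP_MASS = 660.0  # assumed average mass of one dsDNA base pair, in g/mol


def molarity_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration (ng/uL) to nanomolar for a given amplicon length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volume_ul(conc_ng_per_ul, length_bp, target_fmol):
    """Volume (uL) of one amplicon needed to contribute `target_fmol` to the pool.

    1 nM equals 1 fmol/uL, so volume = amount / molarity.
    """
    return target_fmol / molarity_nM(conc_ng_per_ul, length_bp)


# Hypothetical amplicons: (concentration in ng/uL, length in bp)
samples = {"A": (6.6, 500), "B": (13.2, 500), "C": (3.3, 250)}
volumes = {
    name: pooling_volume_ul(conc, length, target_fmol=100)
    for name, (conc, length) in samples.items()
}
```

With these made-up inputs, amplicons A and C need the same volume even though their mass concentrations differ, because the shorter amplicon carries more molecules per nanogram.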

