Breed-specific Reference Sequence Optimizes Mapping Accuracy of NGS Analyses for Pigs

2020 ◽  
Author(s):  
Dan Wang ◽  
Liu Yang ◽  
Chao Ning ◽  
Jian-Feng Liu ◽  
Xingbo Zhao

Abstract Background: The reference sequence plays a key role in next-generation sequencing (NGS) because it affects mapping quality during genome analyses. This matters especially for the mitochondrion, which contains plentiful amounts of its own DNA: an optimal reference sequence makes mitochondrial genome (mitogenome) alignment accurate and efficient. In this study, three mapping reference sequences, the commonly used reference sequence (CU-ref), the breed-specific reference sequence (BS-ref) and the sample-specific reference sequence (SS-ref), were compared to assess mapping quality in NGS analyses of pigs. Results: Four pigs from three breeds were sequenced at high throughput and the reads were mapped to each of the three references mentioned above. The BS-ref produced the largest number of mappable reads and the highest coverage at acceptable run times. SNP calling accuracy was then evaluated using 18 detection strategies that combined three tools (SAMtools, VarScan and GATK) with different parameters, under the BS-ref mapping strategy. Nine detection strategies achieved the same best specificity and sensitivity, which suggests that mitogenome alignment with the BS-ref is highly accurate and places few demands on the choice of SNP calling tool and parameters. Conclusions: Overall, using breed-specific reference sequences in NGS analyses optimized mapping quality and SNP calling accuracy. This study indicates that reference sequences at different genetic distances from the samples influence the quality of mitogenome alignment.

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dan Wang ◽  
Liu Yang ◽  
Chao Ning ◽  
Jian-Feng Liu ◽  
Xingbo Zhao

Abstract Background Reference sequences play a vital role in next-generation sequencing (NGS), impacting mapping quality during genome analyses. However, reference genomes usually do not represent the full range of genetic diversity of a species as a result of geographical divergence and independent demographic events of different populations. For the mitochondrial genome (mitogenome), which occurs in high copy numbers in cells and is strictly maternally inherited, an optimal reference sequence has the potential to make mitogenome alignment both more accurate and more efficient. In this study, we used three different types of reference sequences for mitogenome mapping, i.e., the commonly used reference sequence (CU-ref), the breed-specific reference sequence (BS-ref) and the sample-specific reference sequence (SS-ref), and compared the accuracy of mitogenome alignment and SNP calling among them, with the aim of proposing the optimal reference sequence for mitochondrial DNA (mtDNA) analyses of specific populations. Results Four pigs, representing three different breeds, were sequenced at high throughput and the reads were mapped to the reference sequences mentioned above. Aligning reads to a BS-ref gave the largest mapping ratio and the deepest coverage without increasing running time. Next, single nucleotide polymorphism (SNP) calling was carried out with 18 detection strategies that combined the three tools SAMtools, VarScan and GATK with different parameters, using the BAM files produced by mapping to the BS-ref. All eighteen strategies achieved the same high specificity and sensitivity, which suggests that mitogenome alignment with the BS-ref is highly accurate and places few demands on the choice of SNP calling tool and parameters.
Conclusions This study showed that reference sequences with different genetic relationships to the sample reads influenced mitogenome alignment, and that breed-specific reference sequences were optimal for mitogenome analyses, which provides a refined processing perspective for NGS data.
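As a reading aid, the sensitivity and specificity used to score SNP calling can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `snp_call_accuracy`, the per-position scoring, and the toy inputs are assumptions introduced here.

```python
def snp_call_accuracy(called, truth, n_sites):
    """Return (sensitivity, specificity) for called SNP positions.

    called, truth: iterables of positions (hypothetical inputs);
    n_sites: total number of positions that were assessed.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)       # true SNPs that were called
    fp = len(called - truth)       # called positions that are not true SNPs
    fn = len(truth - called)       # true SNPs that were missed
    tn = n_sites - tp - fp - fn    # non-variant positions left uncalled
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Two calling strategies that return the same pair of values on the same truth set are indistinguishable by this measure, which is the sense in which several strategies can tie.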


Author(s):  
Fabian Sievers ◽  
Desmond G Higgins

Abstract Motivation Secondary structure prediction accuracy (SSPA) in the QuanTest benchmark can be used to measure the accuracy of a multiple sequence alignment. SSPA correlates well with the sum-of-pairs (SP) score if the results are averaged over many alignments, but not on an alignment-by-alignment basis. This is due to a sub-optimal selection of reference and non-reference sequences in QuanTest. Results We develop an improved strategy for selecting reference and non-reference sequences for a new benchmark, QuanTest2. In QuanTest2, SSPA and SP correlate better on an alignment-by-alignment basis than in QuanTest. Guide-trees for QuanTest2 are more balanced with respect to reference sequences than in QuanTest. QuanTest2 scores correlate well with other well-established benchmarks. Availability and implementation QuanTest2 is available at http://bioinf.ucd.ie/quantest2.tar and comprises reference and non-reference sequence sets and a scoring script. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Jung-Ki Yoon ◽  
Taek Soo Kim ◽  
Jong-Il Kim ◽  
Jae-Joon Yim

Abstract Background: Nontuberculous mycobacterium (NTM) species are ubiquitous microorganisms. NTM pulmonary disease (NTM-PD) is caused not by human-to-human transmission but by independent environmental acquisition. However, recent studies using next-generation sequencing (NGS) have reported trans-continental spread of Mycobacterium abscessus among patients with cystic fibrosis. Results: We investigated NTM genomes through NGS to examine transmission patterns in three pairs of co-habiting NTM-PD patients who were suspected of patient-to-patient transmission. Three pairs of patients with NTM-PD co-habiting for at least 15 years were enrolled: a mother and a daughter with M. avium PD, a couple with M. intracellulare PD, and a second couple, one of whom was infected with M. intracellulare PD and the other of whom was infected with M. abscessus subsp. massiliense PD. Whole genome sequencing was performed using NTM colonies isolated from patients and environmental specimens. Genetic distances were estimated based on single nucleotide polymorphisms (SNPs) in the NTM genomes. Comparing SNPs in the consensus regions, the minimum pairwise SNP distances of NTM isolates derived from the two pairs of patients infected with the same NTM species were over 10,000. In phylogenetic analysis, the NTM isolates from patients with M. avium PD clustered with isolates from different environmental sources. Conclusions: Considering the genetic distances between NTM strains, the likelihood of patient-to-patient transmission in pairs of co-habiting NTM-PD patients without overt immune deficiency is minimal.
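The idea of a minimum pairwise SNP distance between two patients' isolates can be illustrated with a short sketch. It assumes the isolates have already been reduced to equal-length, aligned consensus sequences; the function names and the rule of skipping ambiguous bases are assumptions made here, not details taken from the study.

```python
def snp_distance(seq_a, seq_b, valid="ACGT"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a base outside `valid`
    (for example N or a gap) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def min_pairwise_distance(isolates_a, isolates_b):
    """Smallest SNP distance between any isolate of A and any isolate of B."""
    return min(snp_distance(a, b) for a in isolates_a for b in isolates_b)
```

A small minimum distance would be compatible with recent transmission, whereas a distance in the thousands, as reported in the abstract, points to independent acquisition.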


2021 ◽  
Author(s):  
Lanqing Lv ◽  
Xinyang Wu ◽  
Jiajia Weng ◽  
Yuchao Lai ◽  
Kelei Han ◽  
...  

Abstract The complete genomic sequence of a novel ilarvirus from Eleocharis dulcis, tentatively named water chestnut virus A (WCVA), was determined using next generation sequencing (NGS) combined with reverse transcription polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE) PCR. The three genomic RNA components of WCVA were 3578 (RNA1), 2873 (RNA2) and 2073 (RNA3) nucleotides long, with four predicted open reading frames containing conserved domains and motifs typical of ilarviruses. Phylogenetic analyses of each predicted protein consistently placed WCVA in subgroup 4 of the genus Ilarvirus, together with prune dwarf virus, viola white distortion associated virus, fragaria chiloensis latent virus and potato yellowing virus. The genetic distances and lack of serological reaction to antisera of other ilarviruses suggest that WCVA is a novel member of the genus.


Parasite ◽  
2019 ◽  
Vol 26 ◽  
pp. 10 ◽  
Author(s):  
Jérôme Depaquit ◽  
Mohammad Akhoundi ◽  
Djamel Haouchine ◽  
Stéphane Mantelet ◽  
Arezki Izri

Schistosomiasis is one of the most significant parasitic diseases of humans. The hybridization of closely related Schistosoma species has already been documented. However, hybridization between phylogenetically distant species is unusual. In the present study, we characterized the causative agent of schistosomiasis in a 14-year-old patient with hematuria from Côte d’Ivoire, using morphological and molecular approaches. A 24-hour parasitological examination of urine showed the presence of numerous eggs (150 μm long × 62 μm wide) with a lateral spine (25 μm), identified morphologically as Schistosoma mansoni. Examination of stools performed on the same day found no parasites. The urine and stool examinations of the patient’s family members performed two weeks later showed neither parasites nor hematuria; but in contrast, many S. mansoni eggs were found again in the patient’s urine, but never in his stools. Conventional PCRs were performed, using two primer pairs targeting 28S-rDNA and COI mtDNA. The 28S-rDNA sequence of these eggs, compared with two reference sequences from GenBank demonstrated a hybrid with 25 double peaks, indicating clearly hybrid positions (5.37%) between S. mansoni and S. haematobium. Similarly, we identified a unique S. mansoni COI sequence for the two eggs, with 99.1% homology with the S. mansoni reference sequence. Consequently, this case was the result of hybridization between an S. haematobium male and an S. mansoni female. This should be taken into consideration to explore the elimination of ectopic schistosome eggs in the future.


BMC Genomics ◽  
2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Chi-Ming Leung ◽  
Dinghua Li ◽  
Yan Xin ◽  
Wai-Chun Law ◽  
Yifan Zhang ◽  
...  

Abstract Background Next-generation sequencing (NGS) enables unbiased detection of pathogens by mapping the sequencing reads of a patient sample to the known reference sequences of bacteria and viruses. However, for a new pathogen without a reference sequence of a close relative, or with a high load of mutations compared to its predecessors, read mapping fails due to low similarity between the pathogen and the reference sequence, which in turn leads to insensitive and inaccurate pathogen detection. Results We developed MegaPath, which runs fast and provides high sensitivity in detecting new pathogens. In MegaPath, we have implemented and tested a combination of polishing techniques to remove non-informative human reads and spurious alignments. MegaPath applies a global optimization to the read alignments and reassigns reads that were incorrectly aligned to multiple species to a unique species. The reassignment not only significantly increased the number of reads aligned to distant pathogens, but also significantly reduced incorrect alignments. To run fast, MegaPath implements an enhanced maximum-exact-match prefix seeding strategy and a SIMD-accelerated Smith-Waterman algorithm. Conclusions In our benchmarks, MegaPath demonstrated superior sensitivity by detecting eight times more reads from a low-similarity pathogen than other tools. Meanwhile, MegaPath ran much faster than the other state-of-the-art alignment-based pathogen detection tools (and was comparable with the less sensitive profile-based pathogen detection tools). The running time of MegaPath is about 20 min on a typical 1 Gb dataset.
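For orientation, the Smith-Waterman local alignment score mentioned above can be sketched in plain Python. This is a scalar textbook version with a linear gap penalty, not MegaPath's SIMD-accelerated implementation; the function name and the scoring values (match 2, mismatch -1, gap -2) are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The inner loop is the part that vectorized implementations speed up, since many cells can be computed in parallel once their dependencies are arranged suitably.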


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Lina Bu ◽  
Qi Wang ◽  
Wenjin Gu ◽  
Ruifei Yang ◽  
Di Zhu ◽  
...  

Abstract There is generally one standard reference sequence for each species. When extensive variation exists in other breeds of the species, this can lead to ambiguous alignment and inaccurate variant calling and, in turn, compromise the accuracy of downstream analysis. Here, with the help of the FPGA hardware platform, we present a method that generates an alternative reference via an iterative strategy to improve read alignment for breeds that are genetically distant from the reference breed. Compared to the published reference genomes, the alternative reference sequences we built improved the mapping rates of Chinese indigenous pigs and chickens by 0.61–1.68% and 0.09–0.45%, respectively. These sequences also enable researchers to recover highly variable regions that could be missed using public reference sequences. We also determined that the optimal numbers of iterations needed to generate alternative reference sequences were seven and five for pigs and chickens, respectively. Our results show that, for genetically distant breeds, generating an alternative reference sequence can facilitate read alignment and variant calling and improve the accuracy of downstream analyses.


2015 ◽  
Vol 2015 ◽  
pp. 1-22 ◽  
Author(s):  
Marie-Alice Fraiture ◽  
Philippe Herman ◽  
Isabel Taverniers ◽  
Marc De Loose ◽  
Dieter Deforce ◽  
...  

In many countries, genetically modified organism (GMO) legislation has been established in order to guarantee the traceability of food/feed products on the market and to protect consumers' freedom of choice. Several GMO detection strategies, mainly based on DNA, have therefore been developed to implement this legislation. Due to its numerous advantages, quantitative PCR (qPCR) is the method of choice for enforcement laboratories in routine GMO analysis. However, given the increasing number and diversity of GMO developed and put on the market around the world, some technical hurdles could be encountered with qPCR technology, mainly owing to its inherent properties. To address these challenges, alternative GMO detection methods have been developed, allowing faster detection of a single GM target (e.g., loop-mediated isothermal amplification), simultaneous detection of multiple GM targets (e.g., PCR capillary gel electrophoresis, microarray, and Luminex), more accurate quantification of GM targets (e.g., digital PCR), or characterization of partially known (e.g., DNA walking and Next Generation Sequencing (NGS)) or unknown (e.g., NGS) GMO. The benefits and drawbacks of these methods are discussed in this review.


2021 ◽  
Author(s):  
Luca Barbon ◽  
Victoria Offord ◽  
Elizabeth J. Radford ◽  
Adam P. Butler ◽  
Sebastian S. Gerety ◽  
...  

Abstract Motivation Recent advances in CRISPR/Cas9 technology allow for the functional analysis of genetic variants at single nucleotide resolution whilst maintaining genomic context (Findlay et al., 2018). This approach, known as saturation genome editing (SGE), is a distinct type of deep mutational scanning (DMS) that systematically alters each position in a target region to explore its function. SGE experiments require the design and synthesis of oligonucleotide variant libraries which are introduced into the genome by homology-directed repair (HDR). This technology is broadly applicable to diverse research fields such as disease variant identification, drug development, structure-function studies, synthetic biology, evolutionary genetics and the study of host-pathogen interactions. Here we present the Variant Library Annotation Tool (VaLiAnT) which can be used to generate saturation mutagenesis oligonucleotide libraries from user-defined genomic coordinates and standardised input files. This software package is intentionally versatile to accommodate diverse operability, with species, genomic reference sequences and transcriptomic annotations specified by the user. Genomic ranges, directionality and frame information are considered to allow perturbations at both the nucleotide and amino acid level. Results Coordinates for a genomic range, which may include exonic and/or intronic sequence, are provided by the user in order to retrieve a corresponding oligonucleotide reference sequence. A user-specified range within this sequence is then subject to systematic, nucleotide and/or amino acid saturating mutator functions, with each discrete mutation returned to the user as a separate sequence, building up the final oligo library. If desired, variant accessions from genetic information repositories, such as ClinVar and gnomAD, that fall within the user-specified ranges will also be incorporated into the library. For SGE library generation, base reference sequences can be modified to include PAM (Protospacer Adjacent Motif) and protospacer ‘protection edits’ that prevent Cas9 from cutting incorporated oligonucleotide tracts. Mutator functions modify this protected reference sequence to generate variant sequences. Constant regions are designated for non-editing to allow specific adapter annealing for downstream cloning and amplification from the library pool. A metadata file is generated, delineating annotation information for each variant sequence to aid computational analysis. In addition, a library file is generated, which contains unique sequences (any exact duplicate sequences are removed) ready for submission to commercial synthesis platforms. A VCF file listing all variants is also generated for analysis and quality control processes. The VaLiAnT software package provides a novel means to systemically retrieve, mutate and annotate genomic sequences for oligonucleotide library generation. Specific features for SGE library generation can be employed, with other diverse applications possible. Availability and Implementation VaLiAnT is a command line tool written in Python. Source code, testing data, example library input and output files, and executables are available at https://github.com/cancerit/VaLiAnT. A user manual details step by step instructions for software use, available at https://github.com/cancerit/VaLiAnT/wiki. The software is freely available for non-commercial use (see Licence for more details, https://github.com/cancerit/VaLiAnT/blob/develop/LICENSE).
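The core idea of a nucleotide-level saturating mutator, where every position in a target range is replaced by each alternative base while flanking constant regions stay untouched, can be sketched in a few lines. This is not VaLiAnT's code or API: the function `snv_library`, its 0-based half-open coordinates, and the tuple layout are hypothetical choices made for illustration.

```python
def snv_library(reference, start, end, bases="ACGT"):
    """List every single-nucleotide substitution within reference[start:end].

    Coordinates are 0-based with `end` exclusive (an assumption of this sketch).
    Each entry is (position, ref_base, alt_base, variant_sequence); bases
    outside the target range are left unchanged, like constant regions.
    """
    reference = reference.upper()
    if not 0 <= start <= end <= len(reference):
        raise ValueError("target range must lie within the reference")
    library = []
    for pos in range(start, end):
        ref_base = reference[pos]
        for alt in bases:
            if alt == ref_base:
                continue  # skip the unmodified reference base
            variant = reference[:pos] + alt + reference[pos + 1:]
            library.append((pos, ref_base, alt, variant))
    return library
```

For a target of n unambiguous bases this yields 3n sequences, one per discrete substitution; a real library would additionally carry per-variant annotation of the kind the abstract describes.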

