sequencing error rate Latest Research Papers

Abstract Current European regulations for autochthonous livestock breeds put a special emphasis on pedigree completeness, which in most cases requires laboratory paternity testing with genetic markers. This entails significant economic expenditure for breed societies and precludes other investments in breeding programs, such as genomic evaluation. Within this context, we developed paternity testing through low-coverage whole-genome data in order to reuse these data for genomic evaluation at no cost. Simulations relied on diploid genomes composed of 30 chromosomes (100 cM each) with 3,000,000 SNPs per chromosome. Each population evolved for 1,000 non-overlapping generations with effective size 100, mutation rate 10⁻⁴, and recombination by Kosambi's function. Only those populations with 1,000,000 ± 10% polymorphic SNPs per chromosome in generation 1,000 were retained for further analyses and expanded to the required number of parents and offspring. Individuals were sequenced at 0.01, 0.05, 0.1, 0.5 and 1X depth, with 100, 500, 1,000 or 10,000 base-pair reads, assuming a random sequencing error rate per SNP between 10⁻² and 10⁻⁵. Assuming known allele frequencies in the population and a known sequencing error rate, 0.05X depth sufficed to corroborate the true father (85.0%) and to discard other candidates (96.3%). Those percentages increased to 99.6% and 99.9%, respectively, with 0.1X depth (read length = 10,000 bp; smaller read lengths slightly improved the results because they increase the number of sequenced SNPs). Results were highly sensitive to biases in allele frequencies and robust to inaccuracies in the sequencing error rate. Low-coverage whole-genome sequencing data could be subsequently integrated into genomic BLUP equations by appropriately constructing the genomic relationship matrix. This approach increased the correlation between simulated and predicted breeding values by 1.21% (h² = 0.25; 100 parents and 900 offspring; 0.1X depth with 10,000 bp reads). Although small, this increase opens the door to genomic evaluation in local livestock breeds.
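
To give a sense of how sparse the sequencing depths in the abstract above are, the following minimal sketch estimates the fraction of genome positions covered by at least one read at each depth. It assumes reads land uniformly at random, so that per-position coverage is roughly Poisson with mean equal to the depth. That model is an assumption made here for illustration and is not stated in the abstract. It also ignores read length, so it does not reproduce the read-length effect the authors report. The function name `covered_fraction` is hypothetical.

```python
import math

def covered_fraction(depth: float) -> float:
    """Expected fraction of positions hit by at least one read.

    Assumes per-position coverage ~ Poisson(depth); this is an
    illustrative assumption, not the simulation used in the abstract.
    """
    return 1.0 - math.exp(-depth)

# Depths listed in the abstract (in units of X coverage).
depths = (0.01, 0.05, 0.1, 0.5, 1.0)

for depth in depths:
    print(f"{depth:>4}X depth: about {covered_fraction(depth):.1%} of positions covered")
```

Under this simplified model, most SNPs are not observed at all in a given individual at the lower depths, which is why the test has to combine evidence across the whole genome.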

DEEP-LONG: A Fast and Accurate Aligner for Long RNA-Seq

10.21203/rs.3.rs-79489/v1 ◽

2020 ◽

Author(s):

Li Hou ◽

Yadong Wang

Keyword(s):

Alternative Splicing ◽

Error Rate ◽

Sequencing Error ◽

Rna Seq ◽

Sequencing Technology ◽

Short Reads ◽

Sequencing Errors ◽

Sequencing Error Rate ◽

Long Reads ◽

Gene Structures

Abstract Background: In recent years, advances in sequencing technology have led to the wide use of long reads in many studies, including transcriptomics studies. Long reads offer clear advantages over short reads, and aligning long reads differs from aligning short reads. Many tools can now process long RNA-Seq reads, but some problems remain unsolved. Results: We developed Deep-Long, a fast and accurate tool for processing long RNA-Seq reads. Deep-Long handles the difficulties that arise from complicated gene structures and sequencing errors well, and it performs especially well on alternative splicing and small exons. When the sequencing error rate is low, Deep-Long rapidly produces more accurate results. As the sequencing error rate rises, Deep-Long takes more time but remains faster and more accurate than most other tools. Conclusions: Deep-Long is a useful tool for aligning long RNA-Seq reads to a genome, and it can find more exons and splices.

Needlestack: an ultra-sensitive variant caller for multi-sample next generation sequencing data

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa021 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 1

Author(s):

Tiffany M Delhomme ◽

Patrice H Avogbe ◽

Aurélie A G Gabriel ◽

Nicolas Alcala ◽

Noemie Leblay ◽

...

Keyword(s):

Next Generation Sequencing ◽

Error Rate ◽

Somatic Mutations ◽

Next Generation Sequencing Data ◽

Sequencing Error ◽

Next Generation ◽

Sequencing Error Rate ◽

Main Challenge ◽

A Genome ◽

Generation Sequencing

Abstract The emergence of next-generation sequencing (NGS) has revolutionized the way of reaching a genome sequence, with the promise of potentially providing a comprehensive characterization of DNA variations. Nevertheless, detecting somatic mutations is still a difficult problem, in particular when trying to identify low-abundance mutations, such as subclonal mutations, tumour-derived alterations in body fluids or somatic mutations from histologically normal tissue. The main challenge is to precisely distinguish between sequencing artefacts and true mutations, particularly when the latter are so rare that they reach similar abundance levels as artefacts. Here, we present needlestack, a highly sensitive variant caller, which directly learns from the data the level of systematic sequencing errors to accurately call mutations. Needlestack is based on the idea that the sequencing error rate can be dynamically estimated from analysing multiple samples together. We show that the sequencing error rate varies across alterations, illustrating the need to precisely estimate it. We evaluate the performance of needlestack for various types of variations, and we show that needlestack is robust among positions and outperforms existing state-of-the-art methods for low-abundance mutations. Needlestack, along with its source code, is freely available on the GitHub platform: https://github.com/IARCbioinfo/needlestack.
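
The idea of estimating the error rate from several samples at once can be illustrated with a deliberately simplified sketch. The abstract does not describe needlestack's statistical model, so the code below is a stand-in and not the tool's method: it takes the median alternate-allele fraction across samples at one position as the error rate, then uses an exact binomial tail probability to flag samples with far more alternate reads than that rate explains. All names (`binom_sf`, `flag_outliers`), the threshold `alpha`, and the example counts are hypothetical.

```python
import math
from statistics import median

def binom_sf(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def flag_outliers(alt_counts, depths, alpha=1e-6):
    """Estimate a shared error rate and flag samples that exceed it.

    Simplified stand-in (assumption): the error rate is the median
    alternate-allele fraction across samples, which stays stable when
    only a minority of samples carry a true variant.
    """
    rates = [a / d for a, d in zip(alt_counts, depths) if d > 0]
    error_rate = median(rates)
    flags = [
        d > 0 and binom_sf(a, d, error_rate) < alpha
        for a, d in zip(alt_counts, depths)
    ]
    return error_rate, flags

# Hypothetical counts at one position in six samples, each at depth 1000.
alt = [2, 1, 3, 2, 40, 1]
depth = [1000] * 6
rate, flags = flag_outliers(alt, depth)
print(rate, flags)
```

In this toy input only the fifth sample is flagged: its 40 alternate reads are far above what the shared rate of 0.002 would produce by chance, whereas the other samples are consistent with error alone.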

The Impact of DNA Polymerase and Number of Rounds of Amplification in PCR on 16S rRNA Gene Sequence Data

mSphere ◽

10.1128/msphere.00163-19 ◽

2019 ◽

Vol 4 (3) ◽

Cited By ~ 16

Author(s):

Marc A. Sze ◽

Patrick D. Schloss

Keyword(s):

16S Rrna ◽

Microbial Communities ◽

Error Rate ◽

Sequence Data ◽

Pcr Amplification ◽

Sequencing Error ◽

Mock Community ◽

Sequencing Error Rate ◽

Stool Samples ◽

Human Stool

ABSTRACT PCR amplification of 16S rRNA genes is a critical yet underappreciated step in the generation of sequence data to describe the taxonomic composition of microbial communities. Numerous factors in the design of PCR can impact the sequencing error rate, the abundance of chimeric sequences, and the degree to which the fragments in the product represent their abundance in the original sample (i.e., bias). We compared the performance of high fidelity polymerases and various numbers of rounds of amplification when amplifying a mock community and human stool samples. Although it was impossible to derive specific recommendations, we did observe general trends. Namely, using a polymerase with the highest possible fidelity and minimizing the number of rounds of PCR reduced the sequencing error rate, fraction of chimeric sequences, and bias. Evidence of bias at the sequence level was subtle and could not be ascribed to the fragments’ fraction of bases that were guanines or cytosines. When analyzing mock community data, the amount that the community deviated from the expected composition increased with the number of rounds of PCR. This bias was inconsistent for human stool samples. Overall, the results underscore the difficulty of comparing sequence data that are generated by different PCR protocols. However, the results indicate that the variation in human stool samples is generally larger than that introduced by the choice of polymerase or number of rounds of PCR.

IMPORTANCE A steep decline in sequencing costs drove an explosion in studies characterizing microbial communities from diverse environments. Although a significant amount of effort has gone into understanding the error profiles of DNA sequencers, little has been done to understand the downstream effects of the PCR amplification protocol. We quantified the effects of the choice of polymerase and number of PCR cycles on the quality of downstream data. We found that these choices can have a profound impact on the way that a microbial community is represented in the sequence data. The effects are relatively small compared to the variation in human stool samples; however, care should be taken to use polymerases with the highest possible fidelity and to minimize the number of rounds of PCR. These results also underscore that it is not possible to directly compare sequence data generated under different PCR conditions.

Needlestack: an ultra-sensitive variant caller for multi-sample next generation sequencing data

10.1101/639377 ◽

2019 ◽

Cited By ~ 2

Author(s):

Tiffany M. Delhomme ◽

Patrice H. Avogbe ◽

Aurélie Gabriel ◽

Nicolas Alcala ◽

Noemie Leblay ◽

...

Keyword(s):

Next Generation Sequencing ◽

Error Rate ◽

Somatic Mutations ◽

Next Generation Sequencing Data ◽

Sequencing Error ◽

Next Generation ◽

Sequencing Error Rate ◽

Main Challenge ◽

A Genome ◽

Generation Sequencing

ABSTRACT The emergence of Next-Generation Sequencing (NGS) has revolutionized the way of reaching a genome sequence, with the promise of potentially providing a comprehensive characterization of DNA variations. Nevertheless, detecting somatic mutations is still a difficult problem, in particular when trying to identify low-abundance mutations such as subclonal mutations, tumour-derived alterations in body fluids or somatic mutations from histologically normal tissue. The main challenge is to precisely distinguish between sequencing artefacts and true mutations, particularly when the latter are so rare that they reach similar abundance levels as artefacts. Here, we present needlestack, a highly sensitive variant caller, which directly learns from the data the level of systematic sequencing errors to accurately call mutations. Needlestack is based on the idea that the sequencing error rate can be dynamically estimated from analyzing multiple samples together. We show that the sequencing error rate varies across alterations, illustrating the need to precisely estimate it. We evaluate the performance of needlestack for various types of variations, and we show that needlestack is robust among positions and outperforms existing state-of-the-art methods for low-abundance mutations. Needlestack, along with its source code, is freely available on the GitHub platform: https://github.com/IARCbioinfo/needlestack.

The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data

10.1101/565598 ◽

2019 ◽

Cited By ~ 2

Author(s):

Marc A Sze ◽

Patrick D Schloss

Keyword(s):

16S Rrna ◽

Microbial Communities ◽

Error Rate ◽

Sequence Data ◽

Pcr Amplification ◽

Sequencing Error ◽

Mock Community ◽

Sequencing Error Rate ◽

Stool Samples ◽

Human Stool

Abstract PCR amplification of 16S rRNA genes is a critical yet underappreciated step in the generation of sequence data to describe the taxonomic composition of microbial communities. Numerous factors in the design of PCR can impact the sequencing error rate, the abundance of chimeric sequences, and the degree to which the fragments in the product represent their abundance in the original sample (i.e., bias). We compared the performance of high fidelity polymerases and varying numbers of rounds of amplification when amplifying a mock community and human stool samples. Although it was impossible to derive specific recommendations, we did observe general trends. Namely, using a polymerase with the highest possible fidelity and minimizing the number of rounds of PCR reduced the sequencing error rate, fraction of chimeric sequences, and bias. Evidence of bias at the sequence level was subtle and could not be ascribed to the fragments’ fraction of bases that were guanines or cytosines. When analyzing mock community data, the amount that the community deviated from the expected composition increased with rounds of PCR. This bias was inconsistent for human stool samples. Overall, the results underscore the difficulty of comparing sequence data that are generated by different PCR protocols. However, the results indicate that the variation in human stool samples is generally larger than that introduced by the choice of polymerase or number of rounds of PCR.

Importance A steep decline in sequencing costs drove an explosion in studies characterizing microbial communities from diverse environments. Although a significant amount of effort has gone into understanding the error profiles of DNA sequencers, little has been done to understand the downstream effects of the PCR amplification protocol. We quantified the effects of the choice of polymerase and number of PCR cycles on the quality of downstream data. We found that these choices can have a profound impact on the way that a microbial community is represented in the sequence data. The effects are relatively small compared to the variation in human stool samples; however, care should be taken to use polymerases with the highest possible fidelity and to minimize the number of rounds of PCR. These results also underscore that it is not possible to directly compare sequence data generated under different PCR conditions.

MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications

BMC Bioinformatics ◽

10.1186/s12859-018-2223-1 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 2

Author(s):

Mohammad Hadigol ◽

Hossein Khiabanian

Keyword(s):

Error Rate ◽

Sequencing Error ◽

Genomic Context ◽

Sequencing Error Rate ◽

The Impact

Analysis of 454 sequencing error rate, error sources, and artifact recombination for detection of Low-frequency drug resistance mutations in HIV-1 DNA

Retrovirology ◽

10.1186/1742-4690-10-18 ◽

2013 ◽

Vol 10 (1) ◽

Cited By ~ 79

Author(s):

Wei Shao ◽

Valerie F Boltz ◽

Jonathan E Spindler ◽

Mary F Kearney ◽

Frank Maldarelli ◽

...

Keyword(s):

Drug Resistance ◽

Error Rate ◽

Low Frequency ◽

454 Sequencing ◽

Sequencing Error ◽

Resistance Mutations ◽

Error Sources ◽

Drug Resistance Mutations ◽

Sequencing Error Rate ◽

Hiv 1

Inferring Population Mutation Rate and Sequencing Error Rate Using the SNP Frequency Spectrum in a Sample of DNA Sequences

Molecular Biology and Evolution ◽

10.1093/molbev/msp059 ◽

2009 ◽

Vol 26 (7) ◽

pp. 1479-1490 ◽

Cited By ~ 8

Author(s):

X. Liu ◽

T. J. Maxwell ◽

E. Boerwinkle ◽

Y.-X. Fu

Keyword(s):

Frequency Spectrum ◽

Mutation Rate ◽

Error Rate ◽

Dna Sequences ◽

Sequencing Error ◽

Sequencing Error Rate

sequencing error rate
Recently Published Documents

353 ASAS-EAAP Talk: Low-coverage whole-genome sequencing in local livestock breeds

DEEP-LONG: A Fast and Accurate Aligner for Long RNA-Seq

Needlestack: an ultra-sensitive variant caller for multi-sample next generation sequencing data

The Impact of DNA Polymerase and Number of Rounds of Amplification in PCR on 16S rRNA Gene Sequence Data

Needlestack: an ultra-sensitive variant caller for multi-sample next generation sequencing data

The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data

MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications

Analysis of 454 sequencing error rate, error sources, and artifact recombination for detection of Low-frequency drug resistance mutations in HIV-1 DNA

Inferring Population Mutation Rate and Sequencing Error Rate Using the SNP Frequency Spectrum in a Sample of DNA Sequences
