read depth Latest Research Papers

Frequency of the STRC-CATSPER2 deletion in STRC-associated hearing loss patients

Abstract The STRC gene, located on chromosome 15q15.3, is one of the genetic causes of autosomal recessive mild-to-moderate sensorineural hearing loss. One of the unique characteristics of STRC-associated hearing loss is the high prevalence of long deletions or copy number variations observed on chromosome 15q15.3. Further, the deletion of chromosome 15q15.3 from STRC to CATSPER2 is also known to be a genetic cause of deafness infertility syndrome (DIS), which is associated with not only hearing loss but also male infertility, as CATSPER2 plays crucial roles in sperm motility. Thus, information regarding the deletion range for each patient is important to the provision of appropriate genetic counselling for hearing loss and male infertility. In the present study, we performed next-generation sequencing (NGS) analysis for 9956 Japanese hearing loss patients and analyzed copy number variations in the STRC gene based on NGS read depth data. In addition, we performed Multiplex Ligation-dependent Probe Amplification analysis to determine the deletion range including the PPIP5K1, CKMT1B, STRC and CATSPER2 genomic region to estimate the prevalence of the STRC-CATSPER2 deletion, which is causative for DIS among the STRC-associated hearing loss patients. As a result, we identified 276 cases with STRC-associated hearing loss. The prevalence of STRC-associated hearing loss in Japanese hearing loss patients was 2.77% (276/9956). In addition, 77.1% of cases with STRC homozygous deletions carried a two-copy loss of the entire CKMT1B-STRC-CATSPER2 gene region. This information will be useful for the provision of more appropriate genetic counselling regarding hearing loss and male infertility for the patients with an STRC deletion.
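To illustrate how a copy number can be read off NGS read depth, here is a minimal sketch. It is not the authors' pipeline: the function name `estimate_copy_number`, the diploid baseline, and all depth values are assumptions made for illustration. Only the prevalence figures (276 of 9956) come from the abstract.

```python
def estimate_copy_number(target_depth, reference_depth, ploidy=2):
    """Estimate an integer copy number from the depth ratio to a reference region.

    Assumes the reference region is present at `ploidy` copies and that
    depth scales linearly with copy number (a simplification).
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    ratio = target_depth / reference_depth
    return round(ratio * ploidy)


# Hypothetical mean depths over a target gene versus a control region.
homozygous_deletion = estimate_copy_number(1.0, 100.0)     # ratio 0.02
heterozygous_deletion = estimate_copy_number(52.0, 100.0)  # ratio 0.52
normal = estimate_copy_number(98.0, 100.0)                 # ratio 0.98

# Prevalence reported in the abstract: 276 cases among 9956 patients.
prevalence_percent = 276 / 9956 * 100
print(homozygous_deletion, heterozygous_deletion, normal)
print(f"{prevalence_percent:.2f}%")
```

In practice depth is usually normalised for GC content and capture efficiency before ratios are taken; the sketch skips that step.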

Increasing calling accuracy, coverage, and read-depth in sequence data by the use of haplotype blocks

PLoS Genetics ◽

10.1371/journal.pgen.1009944 ◽

2021 ◽

Vol 17 (12) ◽

pp. e1009944

Author(s):

Torsten Pook ◽

Adnane Nemri ◽

Eric Gerardo Gonzalez Segovia ◽

Daniel Valle Torres ◽

Henner Simianer ◽

...

Keyword(s):

Data Quality ◽

Genomic Prediction ◽

Sequence Data ◽

Association Studies ◽

Genomic Data ◽

Read Depth ◽

Error Rates ◽

Whole Genome Sequence ◽

Genome Wide Association Studies ◽

Haplotype Blocks

High-throughput genotyping of large numbers of lines remains a key challenge in plant genetics, requiring geneticists and breeders to find a balance between data quality and the number of genotyped lines under a variety of different existing genotyping technologies when resources are limited. In this work, we are proposing a new imputation pipeline (“HBimpute”) that can be used to generate high-quality genomic data from low read-depth whole-genome-sequence data. The key idea of the pipeline is the use of haplotype blocks from the software HaploBlocker to identify locally similar lines and subsequently use the reads of all locally similar lines in the variant calling for a specific line. The effectiveness of the pipeline is showcased on a dataset of 321 doubled haploid lines of a European maize landrace, which were sequenced at 0.5X read-depth. The overall imputing error rates are cut in half compared to state-of-the-art software like BEAGLE and STITCH, while the average read-depth is increased to 83X, thus enabling the calling of copy number variation. The usefulness of the obtained imputed data panel is further evaluated by comparing the performance of sequence data in common breeding applications to that of genomic data generated with a genotyping array. For both genome-wide association studies and genomic prediction, results are on par or even slightly better than results obtained with high-density array data (600k). In particular for genomic prediction, we observe slightly higher data quality for the sequence data compared to the 600k array in the form of higher prediction accuracies. This occurred specifically when reducing the data panel to the set of overlapping markers between sequence and array, indicating that sequencing data can benefit from the same marker ascertainment as used in the array process to increase the quality and usability of genomic data.
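The key idea described above, pooling the reads of locally similar lines before calling a variant, can be sketched as follows. This is a simplified illustration, not the HBimpute implementation: the function `pooled_call`, the majority-vote rule, and the read counts are all hypothetical, and homozygous calls are assumed because the lines are doubled haploids.

```python
def pooled_call(read_counts, block_members):
    """Call one variant site from reads pooled over locally similar lines.

    read_counts:   dict mapping line name -> (ref_reads, alt_reads) at the site
    block_members: lines sharing a haplotype block with the target line
                   (including the target line itself)
    Returns (call, pooled_depth); call is None when no reads are available.
    """
    ref = sum(read_counts[m][0] for m in block_members)
    alt = sum(read_counts[m][1] for m in block_members)
    depth = ref + alt
    if depth == 0:
        return None, 0
    # Simple majority vote; a real pipeline would use a proper variant caller.
    call = "ref" if ref >= alt else "alt"
    return call, depth


# Hypothetical low-coverage data: line A has no reads at this site,
# but lines B and C are in the same haplotype block.
counts = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (0, 3)}
print(pooled_call(counts, {"A"}))            # no call possible alone
print(pooled_call(counts, {"A", "B", "C"}))  # call from pooled reads
```

Pooling is what raises the effective read depth: line A on its own has no information at this site, whereas the block supplies three reads.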

Optimizing a metabarcoding primer portfolio for species-level detection of taxa in complex mixtures of diverse fishes

10.22541/au.163861686.62434613/v1 ◽

2021 ◽

Author(s):

Diana Baetscher ◽

Nicolas Locatelli ◽

Eugene Won ◽

Timothy Fitzgerald ◽

Peter McIntyre ◽

...

Keyword(s):

Empirical Studies ◽

Target Material ◽

Read Depth ◽

Abundant Species ◽

Reference Database ◽

Amplification Efficiency ◽

Multiple Regions ◽

Heterogeneous Tissue ◽

The Individual ◽

DNA Pool

DNA metabarcoding is used to enumerate and identify taxa in both environmental samples and tissue mixtures. The composition and resolution of metabarcoding data depend on the primer(s) used. Markers that amplify different genes can mitigate biases in primer affinity, amplification efficiency, and reference database resolution, but few empirical studies have evaluated markers for complementary performance. Here, we assess the individual and joint performance of 22 markers for detecting species in a DNA pool of >100 species of primarily marine and freshwater fishes, but also including representatives of elasmobranchs, cephalopods, and crustaceans. Marker performance includes the integrated effect of primer specificity and reference availability. We find that a portfolio of four markers targeting 12S, 16S, and multiple regions of COI identifies 100% of reference taxa to family and nearly 60% to species. We then use the four markers in this portfolio to evaluate metabarcoding of heterogeneous tissue mixtures, using experimental fishmeal to test: 1) the tissue input threshold to ensure detection; 2) how read depth scales with tissue abundance; and 3) the effect of non-target material in the mixture on recovery of target taxa. We consistently detect taxa that make up >1% of fishmeal mixtures and can detect taxa at the lowest input level of 0.01%, but rare taxa (<1%) were detected inconsistently across markers and replicates. Read counts showed weak correlation with tissue input, suggesting they are not a valid proxy for relative abundance. Despite this limitation, our results demonstrate the value of a primer portfolio approach—tailored to the taxa of interest—for detecting and identifying both rare and abundant species in heterogeneous tissue mixtures.

CNV-P: a machine-learning framework for predicting high confident copy number variations

PeerJ ◽

10.7717/peerj.12564 ◽

2021 ◽

Vol 9 ◽

pp. e12564

Author(s):

Taifu Wang ◽

Jinghua Sun ◽

Xiuqing Zhang ◽

Wen-Jing Wang ◽

Qing Zhou

Keyword(s):

Machine Learning ◽

False Positive ◽

Copy Number ◽

Genetic Disorders ◽

Genetic Diseases ◽

Basic Research ◽

Read Depth ◽

Copy Number Variations ◽

Sequencing Data ◽

Learning Framework

Background Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. Methods Here, we proposed a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals such as read depth (RD), split reads (SR) and read pairs (RP) around the putative CNV fragments were defined as features to train a classifier. Results The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.

Derivedness Index for Estimating Degree of Phenotypic Evolution of Embryos: A Study of Comparative Transcriptomic Analyses of Chordates and Echinoderms

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.749963 ◽

2021 ◽

Vol 9 ◽

Author(s):

Jason Cheok Kuan Leong ◽

Yongxin Li ◽

Masahiro Uesaka ◽

Yui Uchida ◽

Akihito Omori ◽

...

Keyword(s):

Gene Expression ◽

Hox Genes ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Read Depth ◽

Phenotypic Evolution ◽

Major Barrier ◽

Living Fossils ◽

Transcriptomic Level ◽

Developmental Phases

Species retaining ancestral features, such as species called living fossils, are often regarded as less derived than their sister groups, but such discussions are usually based on qualitative enumeration of conserved traits. This approach creates a major barrier, especially when quantifying the degree of phenotypic evolution or degree of derivedness, since it focuses only on commonly shared traits, and newly acquired or lost traits are often overlooked. To provide a potential solution to this problem, especially for inter-species comparison of gene expression profiles, we propose a new method named “derivedness index” to quantify the degree of derivedness. In contrast to the conservation-based approach, which deals with expressions of commonly shared genes among species being compared, the derivedness index also considers those that were potentially lost or duplicated during evolution. By applying our method, we found that the gene expression profiles of penta-radial phases in echinoderm tended to be more highly derived than those of the bilateral phase. However, our results suggest that echinoderms may not have experienced much larger modifications to their developmental systems than chordates, at least at the transcriptomic level. In vertebrates, we found that the mid-embryonic and organogenesis stages were generally less derived than the earlier or later stages, indicating that the conserved phylotypic period is also less derived. We also found genes that potentially explain less derivedness, such as Hox genes. Finally, we highlight technical concerns that may influence the measured transcriptomic derivedness, such as read depth and library preparation protocols, for further improvement of our method through future studies. We anticipate that this index will serve as a quantitative guide in the search for constrained developmental phases or processes.

Development and validation of an expanded targeted sequencing panel for non-invasive prenatal diagnosis of sporadic skeletal dysplasia

BMC Medical Genomics ◽

10.1186/s12920-021-01063-1 ◽

2021 ◽

Vol 14 (S3) ◽

Author(s):

Ching-Yuan Wang ◽

Yen-An Tang ◽

I-Wen Lee ◽

Fong-Ming Chang ◽

Chun-Wei Chien ◽

...

Keyword(s):

Prenatal Diagnosis ◽

Early Pregnancy ◽

Skeletal Dysplasia ◽

Low Frequency ◽

Read Depth ◽

Targeted Sequencing ◽

Normal Sample ◽

Amplification Efficiency ◽

Lower Accuracy ◽

Ion Proton

Abstract Background Skeletal dysplasia (SD) is one of the most common inherited neonatal disorders worldwide, where the recurrent pathogenic mutations in the FGFR2, FGFR3, COL1A1, COL1A2 and COL2A1 genes are frequently reported in both non-lethal and lethal SD. The traditional prenatal diagnosis of SD using ultrasonography suffers from lower accuracy and is performed at a later gestational stage. Therefore, precise and accurate prenatal diagnosis of SD in early pregnancy remains urgently needed. With the advancements of next-generation sequencing (NGS) technology and bioinformatics analysis, it is feasible to develop an NGS-based assay to detect genetic defects associated with SD in early pregnancy. Methods An ampliseq-based targeted sequencing panel was designed to cover 87 recurrent hotspots reported in 11 common dominant SD and run on both Ion Proton and NextSeq550 instruments. Thirty-six cell-free and 23 genomic DNAs were used for assay development. Spike-in DNA prepared from a standard sample harboring a known mutation and a normal sample was also employed to validate the established SD workflow. Overall performance in coverage, uniformity, and on-target rate, and the detection limits with respect to fetal fraction and read depth, were evaluated. Results The established targeted-seq workflow enables a single-tube multiplex PCR for library construction and shows high amplification efficiency and robust reproducibility on both Ion Proton and NextSeq550 platforms. The workflow reaches 100% coverage and both uniformity and on-target rate are > 96%, indicating a high-quality assay. Using spike-in DNA with different percentages of a known FGFR3 mutation (c.1138 G > A), the targeted-seq workflow demonstrated the ability to detect a low-frequency variant at 2.5% accurately. Finally, we obtained 100% sensitivity and 100% specificity in detecting target mutations using the established SD panel.
Conclusions An expanded panel for rapid and cost-effective genetic detection of SD has been developed. The established targeted-seq workflow shows high accuracy to detect both germline and low-frequency variants. In addition, the workflow is flexible to be conducted in the majority of the NGS instruments and ready for routine clinical application. Taken together, we believe the established panel provides a promising diagnostic or therapeutic strategy for prenatal genetic testing of SD in routine clinical practice.

CNVpytor: a tool for copy number variation detection and analysis from read depth and allele imbalance in whole-genome sequencing

GigaScience ◽

10.1093/gigascience/giab074 ◽

2021 ◽

Vol 10 (11) ◽

Cited By ~ 1

Author(s):

Milovan Suvakov ◽

Arijit Panda ◽

Colin Diesh ◽

Ian Holmes ◽

Alexej Abyzov

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Read Depth ◽

Copy Number Variations ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Modular Architecture

Abstract Background Detecting copy number variations (CNVs) and copy number alterations (CNAs) based on whole-genome sequencing data is important for personalized genomics and treatment. CNVnator is one of the most popular tools for CNV/CNA discovery and analysis based on read depth. Findings Herein, we present an extension of CNVnator developed in Python—CNVpytor. CNVpytor inherits the reimplemented core engine of its predecessor and extends visualization, modularization, performance, and functionality. Additionally, CNVpytor uses B-allele frequency likelihood information from single-nucleotide polymorphisms and small indels data as additional evidence for CNVs/CNAs and as primary information for copy number–neutral losses of heterozygosity. Conclusions CNVpytor is significantly faster than CNVnator—particularly for parsing alignment files (2–20 times faster)—and has (20–50 times) smaller intermediate files. CNV calls can be filtered using several criteria, annotated, and merged over multiple samples. Modular architecture allows it to be used in shared and cloud environments such as Google Colab and Jupyter notebook. Data can be exported into JBrowse, while a lightweight plugin version of CNVpytor for JBrowse enables nearly instant and GUI-assisted analysis of CNVs by any user. CNVpytor release and the source code are available on GitHub at https://github.com/abyzovlab/CNVpytor under the MIT license.

Accurate quantification of overlapping herpesvirus transcripts from RNA-seq data

Journal of Virology ◽

10.1128/jvi.01635-21 ◽

2021 ◽

Author(s):

Alejandro Casco ◽

Akansha Gupta ◽

Mitchell Hayes ◽

Reza Djavadian ◽

Makoto Ohashi ◽

...

Keyword(s):

Gene Expression ◽

Transcript Abundance ◽

Read Depth ◽

Overlapping Genes ◽

RNA Seq ◽

Laptop Computer ◽

Lytic Gene ◽

Short Read ◽

Coding Sequence ◽

Unique Transcript

Herpesviruses employ extensive bidirectional transcription of overlapping genes to overcome length constraints on their gene product repertoire. As a consequence, many lytic transcripts cannot be measured individually by RT-qPCR or conventional RNA-seq analysis. Bruce et al. (Pathogens 2017, 6, 11; doi:10.3390/pathogens6010011) proposed an approximation method using Unique CoDing Sequences (UCDS) to estimate lytic gene abundance from KSHV RNA-seq data. Although UCDS has been widely employed, its accuracy, to our knowledge, has never been rigorously validated for any herpesvirus. In this study, we use CAGE-seq as a gold standard to determine the accuracy of UCDS for estimating EBV lytic gene expression levels from RNA-seq data. We also introduce the Unique TranScript (UTS) method that, like UCDS, estimates transcript abundance from changes in mean RNA-seq read-depth. UTS is distinguished by its use of empirically determined 5’ and 3’ transcript ends, rather than coding sequence annotations. Compared to conventional read assignment, both UCDS and UTS improved quantitation accuracy of overlapping genes, with UTS giving the most accurate results. The UTS method discards fewer reads and may be advantageous for experiments with less sequencing depth. UTS is compatible with any aligner and, unlike isoform-aware alignment methods, can be implemented on a laptop computer. Our findings demonstrate that accuracy achieved by complex and expensive techniques such as CAGE-seq can be approximated using conventional short-read RNA-seq data when read assignment methods address transcript overlap. Although our study focuses on EBV transcription, the UTS method should be applicable across all herpesviruses and other genomes with extensively overlapping transcriptomes.
IMPORTANCE Many viruses employ extensively overlapping transcript structures. This complexity makes it difficult to quantify gene expression using conventional methods including RNA-seq. Although high-throughput techniques that overcome these limitations exist, they are complex, expensive, and scarce in herpesvirus literature relative to short-read RNA-seq. Here, using Epstein-Barr virus (EBV) as a model, we demonstrate that conventional RNA-seq analysis methods fail to accurately quantify abundance of many overlapping transcripts. We further show that the previously described Unique CoDing Sequence (UCDS) and our Unique TranScript (UTS) methods greatly improve the accuracy of EBV lytic gene measurements obtained from RNA-seq data. The UTS method has the advantages of discarding fewer reads and being implementable on a laptop computer. Although this study focuses on EBV, the UCDS and UTS methods should be applicable across herpesviruses and for other viruses that make extensive use of overlapping transcription.
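The general idea of estimating abundance from changes in mean read depth can be sketched as below. This is only an illustration under strong assumptions, not the published UTS or UCDS procedure: it assumes same-strand transcripts that share a 3' end, so that each downstream start adds a step in depth. The function `abundance_from_depth_steps` and the depth values are hypothetical.

```python
from statistics import mean


def abundance_from_depth_steps(depth, starts, end):
    """Estimate abundances of co-terminal overlapping transcripts.

    depth:  per-base read depth along one strand (0-based positions)
    starts: transcript start positions in ascending order
    end:    shared 3' end (exclusive)
    Each transcript's abundance is the mean depth in the segment that begins
    at its start, minus the mean depth of the segment immediately upstream.
    """
    boundaries = list(starts) + [end]
    abundances = []
    upstream = 0.0
    for left, right in zip(boundaries, boundaries[1:]):
        segment_mean = float(mean(depth[left:right]))
        # Clamp at zero: noise can make a step slightly negative.
        abundances.append(max(segment_mean - upstream, 0.0))
        upstream = segment_mean
    return abundances


# Hypothetical depth profile with three nested transcripts.
depth = [10] * 5 + [30] * 5 + [35] * 5
print(abundance_from_depth_steps(depth, starts=[0, 5, 10], end=15))
```

Counting reads per annotated gene would credit the most downstream gene with all 35 units of depth; the step-based estimate attributes only the increment to it.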

Patterns of Genomic Instability in Interspecific Yeast Hybrids With Diverse Ancestries

Frontiers in Fungal Biology ◽

10.3389/ffunb.2021.742894 ◽

2021 ◽

Vol 2 ◽

Author(s):

Devin P. Bendixsen ◽

David Peris ◽

Rike Stelkens

Keyword(s):

Phenotypic Diversity ◽

Chromosomal Rearrangements ◽

Nuclear Genome ◽

Read Depth ◽

Genomic Variation ◽

Genomic Diversity ◽

Ecological Niches ◽

Parent Species ◽

Genomic Architecture ◽

Gains And Losses

The genomes of hybrids often show substantial deviations from the features of the parent genomes, including genomic instabilities characterized by chromosomal rearrangements, gains, and losses. This plastic genomic architecture generates phenotypic diversity, potentially giving hybrids access to new ecological niches. It is however unclear if there are any generalizable patterns and predictability in the type and prevalence of genomic variation and instability across hybrids with different genetic and ecological backgrounds. Here, we analyzed the genomic architecture of 204 interspecific Saccharomyces yeast hybrids isolated from natural, industrial fermentation, clinical, and laboratory environments. Synchronous mapping to all eight putative parental species showed significant variation in read depth indicating frequent aneuploidy, affecting 44% of all hybrid genomes and particularly smaller chromosomes. Early generation hybrids with largely equal genomic content from both parent species were more likely to contain aneuploidies than introgressed genomes with an older hybridization history, which presumably stabilized the genome. Shared k-mer analysis showed that the degree of genomic diversity and variability varied among hybrids with different parent species. Interestingly, more genetically distant crosses produced more similar hybrid genomes, which may be a result of stronger negative epistasis at larger genomic divergence, putting constraints on hybridization outcomes. Mitochondrial genomes were typically inherited from the species also contributing the majority nuclear genome, but there were clear exceptions to this rule. Together, we find reliable genomic predictors of instability in hybrids, but also report interesting cross- and environment-specific idiosyncrasies. Our results are an important step in understanding the factors shaping divergent hybrid genomes and their role in adaptive evolution.
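Read-depth variation of the kind described above is commonly turned into aneuploidy calls by comparing each chromosome's depth with a genome-wide baseline. The sketch below shows that idea only; it is not the authors' analysis. The function `chromosome_copy_numbers`, the diploid baseline, and the depth values are assumptions.

```python
from statistics import median


def chromosome_copy_numbers(chrom_depths, base_ploidy=2):
    """Estimate per-chromosome copy number from median read depths.

    chrom_depths: dict mapping chromosome name -> median read depth
    Assumes most chromosomes are at `base_ploidy`, so the median across
    chromosomes serves as the baseline depth.
    """
    baseline = median(chrom_depths.values())
    return {
        chrom: round(d / baseline * base_ploidy)
        for chrom, d in chrom_depths.items()
    }


# Hypothetical median depths for five chromosomes of one hybrid sub-genome.
depths = {"chrI": 150, "chrII": 100, "chrIII": 100, "chrIV": 50, "chrV": 100}
copies = chromosome_copy_numbers(depths)
aneuploid = sorted(c for c, n in copies.items() if n != 2)
print(copies)
print(aneuploid)
```

For hybrids mapped to several parental references at once, the same calculation would be run per parental sub-genome, with the baseline ploidy chosen accordingly.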

Concatenation of paired-end reads improves taxonomic classification of amplicons for profiling microbial communities

BMC Bioinformatics ◽

10.1186/s12859-021-04410-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Daniel P. Dacey ◽

Frédéric J. J. Chain

Keyword(s):

Read Depth ◽

Taxonomic Composition ◽

Taxonomic Classification ◽

Read Length ◽

Reference Database ◽

Reference Databases ◽

Sequence Quality ◽

First Time ◽

Mock Communities

Abstract Background Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Paired-end read merging is routinely used to capture the entire amplicon sequence when the read ends overlap. However, the exclusion of unmerged reads from further analysis can result in underestimating the diversity in the sequenced microbial community and is influenced by bioinformatic processes such as read trimming and the choice of reference database. A potential solution to overcome this is to concatenate (join) reads that do not overlap and keep them for taxonomic classification. The use of concatenated reads can outperform taxonomic recovery from single-end reads, but it remains unclear how their performance compares to merged reads. Using various sequenced mock communities with different amplicons, read length, read depth, taxonomic composition, and sequence quality, we tested how merging and concatenating reads performed for genus recall and precision in bioinformatic pipelines combining different parameters for read trimming and taxonomic classification using different reference databases. Results The addition of concatenated reads to merged reads always increased pipeline performance. The top two performing pipelines both included read concatenation, with variable strengths depending on the mock community. The pipeline that combined merged and concatenated reads that were quality-trimmed performed best for mock communities with larger amplicons and higher average quality sequences. The pipeline that used length-trimmed concatenated reads outperformed quality trimming in mock communities with lower quality sequences but lost a significant amount of input sequences for taxonomic classification during processing. Genus level classification was more accurate using the SILVA reference database compared to Greengenes. 
Conclusions Merged sequences with the addition of concatenated sequences that were unable to be merged increased performance of taxonomic classifications. This was especially beneficial in mock communities with larger amplicons. We have shown for the first time, using an in-depth comparison of pipelines containing merged vs concatenated reads combined with different trimming parameters and reference databases, the potential advantages of concatenating sequences in improving resolution in microbiome investigations.
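Concatenation of a non-overlapping read pair can be sketched in a few lines. This is a minimal illustration, not the pipeline evaluated in the paper: the function names are hypothetical, and both the reverse-complementing of the reverse read and the optional `spacer` argument are assumptions about how such a join is typically done.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def concatenate_pair(forward, reverse, spacer=""):
    """Join a read pair that could not be merged by overlap.

    The reverse read is reverse-complemented so both halves are on the
    same strand; `spacer` (for example a run of N) marks the unknown gap.
    """
    return forward + spacer + reverse_complement(reverse)


# Hypothetical short reads.
print(concatenate_pair("ACGT", "TTGA"))               # ACGTTCAA
print(concatenate_pair("ACGT", "TTGA", spacer="NN"))  # ACGTNNTCAA
```

Whether a spacer helps depends on the classifier: k-mer based methods may treat the junction as spurious sequence unless it is masked.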

read depth
Recently Published Documents

Frequency of the STRC-CATSPER2 deletion in STRC-associated hearing loss patients

Increasing calling accuracy, coverage, and read-depth in sequence data by the use of haplotype blocks

Optimizing a metabarcoding primer portfolio for species-level detection of taxa in complex mixtures of diverse fishes

CNV-P: a machine-learning framework for predicting high confident copy number variations

Derivedness Index for Estimating Degree of Phenotypic Evolution of Embryos: A Study of Comparative Transcriptomic Analyses of Chordates and Echinoderms

Development and validation of an expanded targeted sequencing panel for non-invasive prenatal diagnosis of sporadic skeletal dysplasia

CNVpytor: a tool for copy number variation detection and analysis from read depth and allele imbalance in whole-genome sequencing

Accurate quantification of overlapping herpesvirus transcripts from RNA-seq data

Patterns of Genomic Instability in Interspecific Yeast Hybrids With Diverse Ancestries

Concatenation of paired-end reads improves taxonomic classification of amplicons for profiling microbial communities
