next generation sequencing data Latest Research Papers

ECCsplorer: a pipeline to detect extrachromosomal circular DNA (eccDNA) from next-generation sequencing data

BMC Bioinformatics, 2022, Vol 23 (1), doi:10.1186/s12859-021-04545-2

Author(s): Ludwig Mann, Kathrin M. Seibt, Beatrice Weber, Tony Heitkam

Keyword(s): Next Generation Sequencing, Transposable Elements, Data Availability, Next Generation Sequencing Data, Next Generation, Sequencing Data, Bioinformatics Pipeline, Circular DNA, Wide Range, Generation Sequencing

Background: Extrachromosomal circular DNAs (eccDNAs) are ring-like DNA structures that are physically separated from the chromosomes and range in size from 100 bp to several megabase pairs. Apart from carrying tandemly repeated DNA, eccDNAs may also harbor extra copies of genes or recently activated transposable elements. As eccDNAs occur in all eukaryotes investigated so far and likely play roles in stress, cancer, and aging, they have been prime targets in recent research, although their investigation has been limited by the scarcity of computational tools.

Results: Here, we present the ECCsplorer, a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing techniques. Following Illumina sequencing of amplified circular DNA (circSeq), the ECCsplorer enables an easy and automated discovery of eccDNA candidates. The data analysis encompasses two major procedures. First, read mapping to the reference genome allows the detection of informative read distributions, including high coverage, discordant mapping, and split reads. Second, reference-free comparison of read clusters from amplified eccDNA against control sample data reveals specifically enriched DNA circles. Both software parts can be run separately or jointly, depending on the individual aim or data availability. To illustrate the wide applicability of our approach, we analyzed semi-artificial and published circSeq data from the model organisms Homo sapiens and Arabidopsis thaliana, and generated circSeq reads from the non-model crop plant Beta vulgaris. We clearly identified eccDNA candidates from all datasets, with and without reference genomes. The ECCsplorer pipeline specifically detected mitochondrial mini-circles and retrotransposon activation, showcasing the ECCsplorer's sensitivity and specificity.

Conclusion: The ECCsplorer (available online at https://github.com/crimBubble/ECCsplorer) is a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing data. The derived eccDNA targets are valuable for a wide range of downstream investigations, from the analysis of cancer-related eccDNAs through organelle genomics to the identification of active transposable elements.
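
As a rough illustration of two of the mapping-based signals named in the abstract (discordant mapping and split reads), the sketch below counts them in a handful of toy alignments. It is not the ECCsplorer implementation: the `Read` record, its fields, the `min_clip` threshold, and the function names are all hypothetical, and a real pipeline would derive these values from BAM alignments. The assumption made here is that read pairs spanning a circle junction map to a linear reference facing away from each other.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical, simplified alignment record (not a real BAM API)."""
    name: str
    pos: int            # leftmost mapped position of this read
    forward: bool       # True if this read maps to the forward strand
    mate_pos: int       # leftmost mapped position of the mate
    mate_forward: bool  # True if the mate maps to the forward strand
    soft_clipped: int   # bases of this read that did not align at this locus


def is_outward_facing(read: Read) -> bool:
    """True if the reverse-strand mate lies upstream of the forward-strand mate."""
    if read.forward == read.mate_forward:
        return False  # same-strand pairs are not handled in this sketch
    fwd_pos = read.pos if read.forward else read.mate_pos
    rev_pos = read.mate_pos if read.forward else read.pos
    return rev_pos < fwd_pos


def is_split(read: Read, min_clip: int = 20) -> bool:
    """True if a sizeable part of the read failed to align at this locus."""
    return read.soft_clipped >= min_clip


def summarize(reads):
    """Count each signal independently; one read may carry both."""
    return {
        "outward_facing": sum(is_outward_facing(r) for r in reads),
        "split": sum(is_split(r) for r in reads),
    }


# Toy input, invented for illustration only.
example_reads = [
    Read("a", 100, True, 300, False, 0),   # ordinary inward-facing pair
    Read("b", 900, True, 150, False, 0),   # outward-facing pair
    Read("c", 120, True, 320, False, 35),  # split read, inward-facing pair
    Read("d", 950, False, 200, True, 40),  # split read, inward-facing pair
]
signal_counts = summarize(example_reads)
```

Regions where such signals coincide with high coverage would then be candidate circles; how the pipeline actually combines and scores them is not stated in the abstract.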

ifCNV: a novel isolation-forest-based package to detect copy number variations from NGS datasets

2022, doi:10.1101/2022.01.03.474771

Author(s): Simon Cabello, Julie A Vendrell, Charles Van Goethem, Mehdi Brousse, Catherine Gozé, ...

Keyword(s): Artificial Intelligence, Copy Number, High Sensitivity, Copy Number Variations, Careful Consideration, Machine Learning Algorithms, Next Generation Sequencing Data, Sequencing Data, Scoring Method, CNV Detection

Copy number variations (CNVs) are an essential component of genetic variation distributed across large parts of the human genome. CNV detection using next-generation sequencing data and artificial intelligence algorithms has progressed in recent years. However, only a few tools have taken advantage of machine learning algorithms for CNV detection. The most developed approach is to compare the samples of interest against a reference dataset, and it is well known that selecting appropriate normal samples is a challenging task that dramatically influences the precision of results in all CNV-detecting tools. With careful consideration of these issues, we propose ifCNV, a new software package based on isolation forests that creates its own reference and is available in R and Python with customisable parameters. ifCNV combines two isolation forests with a comprehensive scoring method to faithfully detect CNVs among various samples. It was validated using datasets from diverse origins, and it exhibits high sensitivity, specificity and accuracy. ifCNV is publicly available open-source software that allows the detection of CNVs in many clinical situations.
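
To make the isolation-forest idea concrete, here is a minimal one-dimensional sketch in pure Python. It is not ifCNV's code or API, and it covers neither the two-forest design nor the scoring method mentioned in the abstract. The function names, the `n_trees` and `max_depth` defaults, and the read-depth ratios are assumptions made up for illustration. The underlying principle is that values far from the bulk of the data are separated by fewer random splits, so a short average isolation depth marks a value as anomalous.

```python
import random


def isolation_depth(value, sample, rng, max_depth=12):
    """Number of random splits needed to isolate `value` within `sample`."""
    depth = 0
    while len(sample) > 1 and min(sample) < max(sample) and depth < max_depth:
        split = rng.uniform(min(sample), max(sample))
        # Keep only the values on the same side of the split as `value`.
        sample = [x for x in sample if (x < split) == (value < split)]
        depth += 1
    return depth


def mean_isolation_depths(values, n_trees=200, seed=0):
    """Average isolation depth per value; shorter means more anomalous."""
    rng = random.Random(seed)
    return [
        sum(isolation_depth(v, values, rng) for _ in range(n_trees)) / n_trees
        for v in values
    ]


# Hypothetical normalized read-depth ratios for nine targets;
# the last target is deliberately inflated to mimic an amplification.
ratios = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 2.50]
depths = mean_isolation_depths(ratios)
most_anomalous = min(range(len(ratios)), key=depths.__getitem__)
```

In this toy data the inflated target is isolated after roughly one split on average, while the remaining targets need several. Because the approach ranks values against each other, it needs no separately chosen set of normal samples, which is the property the abstract emphasizes.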

Proteogenomic Analysis Reveals Proteins Involved in the First Step of Adipogenesis in Human Adipose-Derived Stem Cells

Stem Cells International, 2021, Vol 2021, pp. 1-14, doi:10.1155/2021/3168428

Author(s): Bernardo Bonilauri, Amanda C. Camillo-Andrade, Marlon D. M. Santos, Juliana de S. da G. Fischer, Paulo C. Carvalho, ...

Keyword(s): Stem Cells, Protein Extraction, Adipogenic Differentiation, Shotgun Proteomics, Whole Body, Next Generation Sequencing Data, Adipose Derived Stem Cells, Differentiation Process, Sequencing Data, Whole Body Metabolism

Background. Obesity is characterized as a disease that directly affects whole-body metabolism and is associated with excess fat mass and several related comorbidities. The dynamics of adipocyte hypertrophy and hyperplasia play an important role in health and disease, especially in obesity. Human adipose-derived stem cells (hASC) represent an important source for understanding the entire adipogenic differentiation process. However, little is known about the triggering step of adipogenesis in hASC. Here, we used a proteogenomic approach to understand the protein abundance alterations during the initiation of the adipogenic differentiation process.

Methods. hASC were isolated from adipose tissue of three donors and were then characterized and expanded. Cells were cultured for 24 hours in adipogenic differentiation medium, followed by protein extraction. We used shotgun proteomics to compare the proteomic profiles of 24 h-differentiated and undifferentiated hASC. We also used our previous next-generation sequencing data (RNA-seq) of the total and polysomal mRNA fractions of hASC to study posttranscriptional regulation during the initial steps of adipogenesis.

Results. We identified 3420 proteins from 48,336 peptides, of which 92 proteins were exclusively identified in undifferentiated hASC and 53 proteins were exclusively found in 24 h-differentiated cells. Using a stringent criterion, we identified 33 differentially abundant proteins when comparing 24 h-differentiated and undifferentiated hASC (14 upregulated and 19 downregulated). Among the upregulated proteins, we shortlisted several adipogenesis-related proteins. A combined analysis of the proteome and the transcriptome allowed the identification of positive correlation coefficients between proteins and mRNAs.

Conclusions. These results demonstrate a specific proteome profile related to adipogenesis at the beginning (24 hours) of the differentiation process in hASC, which advances the understanding of human adipogenesis and obesity. Adipogenic differentiation is finely regulated at the transcriptional, posttranscriptional, and posttranslational levels.

Enhanced Bayesian detection for copy number alterations from next-generation sequencing data

2021, doi:10.1109/bibm52615.2021.9669548

Author(s): Zhenhua Yu, Fang Du

Keyword(s): Next Generation Sequencing, Copy Number, Next Generation Sequencing Data, Copy Number Alterations, Next Generation, Sequencing Data, Generation Sequencing

Detection and Functional Verification of Noncanonical Splice Site Mutations in Hereditary Deafness

Frontiers in Genetics, 2021, Vol 12, doi:10.3389/fgene.2021.773922

Author(s): Penghui Chen, Longhao Wang, Yongchuan Chai, Hao Wu, Tao Yang

Keyword(s): Next Generation Sequencing, Splice Site, Negative Impact, Intron Retention, Exon Skipping, Next Generation Sequencing Data, Functional Verification, Next Generation, Hereditary Deafness, Generation Sequencing

Splice site mutations account for a significant portion of the genetic causes of Mendelian disorders, including deafness. By next-generation sequencing of 4 multiplex, autosomal dominant families and 2 simplex, autosomal recessive families with hereditary deafness, we identified a variety of candidate pathogenic variants in noncanonical splice sites of known deafness genes, which include c.1616+3A>T and c.580G>A in EYA4, c.322-57_322-8del in PAX3, c.991-15_991-13del in DFNA5, c.6087-3T>G in PTPRQ and c.164+5G>A in USH1G. All six variants were predicted to affect RNA splicing by at least one of the computational tools Human Splicing Finder, NNSPLICE and NetGene2. Phenotypic segregation of the variants was confirmed in all families and is consistent with previously reported genotype-phenotype correlations of the corresponding genes. Minigene analysis showed that these splice site variants likely have various negative impacts, including exon skipping (c.1616+3A>T and c.580G>A in EYA4, c.991-15_991-13del in DFNA5), intron retention (c.322-57_322-8del in PAX3), exon skipping and intron retention (c.6087-3T>G in PTPRQ) and exon shortening (c.164+5G>A in USH1G). Our study showed that cryptic, noncanonical splice site mutations may play an important role in the molecular etiology of hereditary deafness, whose diagnosis can be facilitated by modified filtering criteria for next-generation sequencing data, functional verification, as well as segregation, bioinformatics, and genotype-phenotype correlation analysis.

easyfm: An easy software suite for file manipulation of Next Generation Sequencing data on desktops

2021, doi:10.22541/au.163845474.49811073/v1

Author(s): Hyungtaek Jung, Brendan Jeon, Daniel Ortiz-Barrientos

Keyword(s): Next Generation Sequencing, Life Sciences, Next Generation Sequencing Data, Command Line, Next Generation, Web Based, File Formats, Wide Range, NGS Data, Generation Sequencing

Storing and manipulating Next Generation Sequencing (NGS) file formats for understanding biological phenomena is an essential but difficult task in the life sciences. Yet, most methods for analysing NGS data require complex command-line tools in high-performance computing (HPC) or web-based servers and have not yet been implemented in comprehensive, easy-to-use software. Here we present easyfm (easy file manipulation), a free standalone Graphical User Interface (GUI) software with Python support that facilitates the rapid discovery of target sequences (or other sequences of interest to the user) in NGS datasets and makes such analyses more accessible to novice users such as biologists. It enables them to perform end-to-end reproducible data analyses using a desktop application (Windows, Mac and Linux). Unlike existing tools, the GUI-based easyfm is not dependent on any HPC system and can be operated without an internet connection. For user-friendliness and convenience, easyfm was developed with four work modules and a secondary GUI window, covering different aspects of NGS data analysis, including post-processing, filtering, format conversion, generating results, real-time log, and help. In combination with the executable tools (BLAST+ and BLAT) and Python, easyfm allows the user to set analysis parameters, select/extract regions of interest, examine the input and output results, and convert to a wide range of file formats. To help augment the functionality of existing web-based and command-line tools, easyfm, a self-contained program, comes with extensive documentation (https://github.com/TaekAndBrendan/easyfm). This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.

Hidden biases in germline structural variant detection

Genome Biology, 2021, Vol 22 (1), doi:10.1186/s13059-021-02558-x

Author(s): Michael M. Khayat, Sayed Mohammad Ebrahim Sahraeian, Samantha Zarate, Andrew Carroll, Huixiao Hong, ...

Keyword(s): Next Generation Sequencing, False Negative, False Negative Rate, Next Generation Sequencing Data, Chinese Family, Next Generation, Sequencing Data, Structural Variations, The Impact, Generation Sequencing

Background: Genomic structural variations (SV) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SV from next-generation sequencing data remains challenging.

Results: In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated utilizing different analysis pipelines and compared to identify sources of analytical variability. Mapping methods provide the major contribution to variability, followed by sequencing centers and replicates. Interestingly, SV supported by only one center or replicate often represent true positives with 47.02% and 45.44% overlapping the long-read SV call set, respectively. This is consistent with an overall higher false negative rate for SV calling in centers and replicates compared to mappers (15.72%). Finally, we observe that the SV calling variability also persists in a genotyping approach, indicating the impact of the underlying sequencing and preparation approaches.

Conclusions: This study provides the first detailed insights into the sources of variability in SV identification from next-generation sequencing and highlights remaining challenges in SV calling for large cohorts. We further give recommendations on how to reduce SV calling variability and the choice of alignment methodology.

Consistent count region–copy number variation (CCR-CNV): an expandable and robust tool for clinical diagnosis of copy number variation at the exon level using next-generation sequencing data

Genetics in Medicine, 2021, doi:10.1016/j.gim.2021.10.025

Author(s): Man Jin Kim, Sungyoung Lee, Hongseok Yun, Sung Im Cho, Boram Kim, ...

Keyword(s): Next Generation Sequencing, Copy Number Variation, Clinical Diagnosis, Copy Number, Next Generation Sequencing Data, Next Generation, Sequencing Data, Number Variation, Exon Level, Generation Sequencing

HLA-G genetic diversity and evolutive aspects in worldwide populations

Scientific Reports, 2021, Vol 11 (1), doi:10.1038/s41598-021-02106-4

Author(s): Erick C. Castelli, Bibiana S. de Almeida, Yara C. N. Muniz, Nayane S. B. Silva, Marília R. S. Passos, ...

Keyword(s): Genetic Diversity, Immune Checkpoint, Balancing Selection, Association Studies, Biological Properties, Next Generation Sequencing Data, Regulatory Sequences, Sequencing Data, High Coverage, Checkpoint Molecule

HLA-G is a promiscuous immune checkpoint molecule. The HLA-G gene presents substantial nucleotide variability in its regulatory regions. However, it encodes a limited number of proteins compared to classical HLA class I genes. We characterized the HLA-G genetic variability in 4640 individuals from 88 different population samples across the globe by using a state-of-the-art method to characterize polymorphisms and haplotypes from high-coverage next-generation sequencing data. We also provide insights regarding the HLA-G genetic diversity and a resource for future studies evaluating HLA-G polymorphisms in different populations and association studies. Despite the great haplotype variability, we demonstrated that: (1) most of the HLA-G polymorphisms are in introns and regulatory sequences, and these are the sites with evidence of balancing selection, (2) linkage disequilibrium is high throughout the gene, extending up to HLA-A, (3) there are few proteins frequently observed in worldwide populations, with lack of variation in residues associated with major HLA-G biological properties (dimer formation, interaction with leukocyte receptors). These observations corroborate the role of HLA-G as an immune checkpoint molecule rather than as an antigen-presenting molecule. Understanding HLA-G variability across populations is relevant for disease association and functional studies.

MeX pipeline for analysis of mobile genetic elements in cancer genome

2021, doi:10.31219/osf.io/7ywnm

Author(s): Preeti P, Robin Sinha, Kamal Rawal

Keyword(s): Human Genome, Cancer Genomics, Mobile Genetic Elements, Chromosome 1, Next Generation Sequencing Data, Upstream Region, Sequencing Data, Alu Elements, Genetic Elements, Paired End Sequencing

Background: Mobile genetic elements (MGEs) comprise a major portion of the human genome and are essential for genetic diversity. These elements are known to have the capability to induce mutations in the human genome. To date, several MGE insertions have been reported to be associated with cancer. We aim to use genome next-generation sequencing data and appropriate bioinformatics tools to accurately identify the insertion sites of MGEs in the human genome.

Results: Herein, we introduce the MeX pipeline for the localization and annotation of MGEs in paired-end sequencing data. It requires the reference genome sequence, MGE sequences and paired-end sequencing reads. We evaluated MeX on high-depth (>75×) Illumina HiSeq data produced at the Broad Institute (NA12878) against human genome build 38 (including only chromosomes 1, 2 and 3) and Alu elements. We could identify 78 reference and 1 non-reference Alu insertions in the NA12878 sample. Upon annotation, it was found that the non-reference Alu element was in the 3' UTR region of the RNF2 gene. Out of 78 reference insertions, 42 were in the intronic region, 7 in the upstream region, 5 in the downstream region, 1 in the 3' UTR region and the rest were not associated with any gene. MeX showed high performance for the identification and annotation of MGEs in genome samples.

Conclusion: This study showed that MeX is a robust and powerful tool for the identification and annotation of MGE insertions. It may also serve as a valuable tool to study the phenotypic changes resulting from transpositional events in cancer genomics.
