gene filtering
Recently Published Documents


TOTAL DOCUMENTS: 26 (FIVE YEARS 10)

H-INDEX: 7 (FIVE YEARS 3)

2021 ◽  
Author(s):  
Jiawei Zou ◽  
Miaochen Wang ◽  
Zhen Zhang ◽  
Zheqi Liu ◽  
Xiaobin Zhang ◽  
...  

Differential expression (DE) gene detection in single-cell RNA-seq (scRNA-seq) data is a key step in addressing the biological question under investigation. We find that DE methods together with gene filtering have a profound impact on DE gene identification, and that different datasets benefit from personalized DE gene detection strategies. Existing tools do not take gene filtering into consideration and cannot evaluate DE performance on real datasets without prior knowledge of the true results. Based on two new metrics, we propose scCODE (single cell Consensus Optimization of Differentially Expressed gene detection), an R package to automatically optimize DE gene detection for each experimental scRNA-seq dataset.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
William J. Lane ◽  
Judith Aeschlimann ◽  
Sunitha Vege ◽  
Christine Lomas-Francis ◽  
Anna Burgos ◽  
...  

Emm is a high-incidence red cell antigen with eight previously reported Emm− probands. Anti-Emm appears to be naturally occurring yet responsible for a clinically significant acute hemolytic transfusion reaction. Previous work suggests that Emm is located on a GPI-anchored protein, but the antigenic epitope and genetic basis have been elusive. We investigated samples from a South Asian Indian family with two Emm− brothers by whole genome sequencing (WGS). Additionally, samples from four unrelated Emm− individuals were investigated for variants in the candidate gene. Filtering for homozygous variants found in the Emm− brothers and by gnomAD frequency of < 0.001 resulted in 1818 variants with one of high impact: a 2-bp deletion causing a frameshift and premature stop codon in PIGG [NM_001127178.3:c.2624_2625delTA, p.(Leu875*), rs771819481]. PIGG encodes a transferase, GPI-ethanolaminephosphate transferase II, which adds ethanolamine phosphate (EtNP) to the second mannose in a GPI-anchor. The four additional unrelated Emm− individuals had various PIGG mutations: deletion of Exons 2–3, deletion of Exons 7–9, insertion/deletion (indel) in Exon 3, and new stop codon in Exon 5. The Emm− phenotype is associated with a rare deficiency of PIGG, potentially defining a new Emm blood group system composed of EtNP bound to mannose, part of the GPI-anchor. The results are consistent with the known PI-linked association of the Emm antigen, and may explain the production of the antibody in the absence of RBC transfusion. Any association with neurologic phenotypes requires further research.
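The variant filtering described in this abstract (keep variants that are homozygous in both Emm− brothers and have a gnomAD frequency below 0.001, then look for high-impact calls) can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the record layout, field names, sample names, and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record layout; a real workflow would parse annotated VCF files.
    variant_id: str
    genotypes: dict   # sample name -> genotype string such as "0/1" or "1/1"
    gnomad_af: float  # population allele frequency (0.0 if not observed)
    impact: str       # annotation label such as "HIGH" or "LOW"


def is_homozygous_alt(genotype: str) -> bool:
    """True for genotypes such as 1/1 or 1|1; False for 0/0, 0/1 and missing calls."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] not in ("0", ".")


def filter_variants(variants, affected, max_af=0.001):
    """Keep variants homozygous in every affected sample with frequency below max_af."""
    kept = []
    for v in variants:
        shared = all(is_homozygous_alt(v.genotypes.get(s, "./.")) for s in affected)
        if shared and v.gnomad_af < max_af:
            kept.append(v)
    return kept


# Made-up example records, for illustration only.
variants = [
    Variant("var_a", {"brother1": "1/1", "brother2": "1/1"}, 0.00001, "HIGH"),
    Variant("var_b", {"brother1": "1/1", "brother2": "0/1"}, 0.00001, "HIGH"),
    Variant("var_c", {"brother1": "1/1", "brother2": "1/1"}, 0.25, "LOW"),
]
rare_shared = filter_variants(variants, affected=["brother1", "brother2"])
high_impact = [v.variant_id for v in rare_shared if v.impact == "HIGH"]
```

Only `var_a` survives here: `var_b` is heterozygous in one brother and `var_c` is too common in the population.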


2021 ◽  
Author(s):  
Ping-Han Hsieh ◽  
Camila Miranda Lopes-Ramos ◽  
Geir Kjetil Sandve ◽  
Kimberly Glass ◽  
Marieke Lydia Kuijjer

Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples, which may indicate that these genes are controlled by the same transcriptional regulatory program, or involved in common biological processes. Gene co-expression is generally estimated from RNA-Seq data, which are typically normalized to remove technical variability. Here, we find and demonstrate that certain normalization methods, in particular quantile-based methods, can introduce false-positive associations between genes, and that this can consequently hamper downstream co-expression network analysis. Quantile-based normalization can, however, be extremely powerful. In particular, when preprocessing large-scale heterogeneous data, quantile-based normalization can be applied to remove technical variability while maintaining global differences in expression for samples with different biological attributes. We therefore developed CAIMAN, a method to correct for false-positive associations that may arise from normalization of RNA-Seq data. CAIMAN utilizes a Gaussian mixture model to fit the distribution of gene expression and to adaptively select the threshold to define lowly expressed genes, which are prone to form false-positive associations. Thereafter, CAIMAN corrects the normalized expression for these genes by removing the variability across samples that might lead to false-positive associations. Moreover, CAIMAN avoids arbitrary gene filtering and retains associations to genes that are expressed only in small subgroups of samples, highlighting its potential future impact on network modeling and other association-based approaches in large-scale heterogeneous data.
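The idea of using a Gaussian mixture model to pick an adaptive threshold for lowly expressed genes can be illustrated with a small sketch. This is not CAIMAN's implementation: it assumes a two-component one-dimensional mixture fitted by plain EM on log-scale expression values, and it takes the threshold to be the point between the two component means where the higher component becomes more likely. The function names and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    spread = max((hi - lo) ** 2 / 16.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each value.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: update weight, mean and variance of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            n_k = max(sum(r), 1e-12)
            means[k] = sum(ri * x for ri, x in zip(r, values)) / n_k
            var_k = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, values)) / n_k
            variances[k] = max(var_k, 1e-6)
            weights[k] = n_k / len(values)
    return weights, means, variances


def low_expression_threshold(values):
    """Scan from the lower to the higher component mean and return the first
    point where the higher component's weighted density is at least as large."""
    weights, means, variances = fit_two_component_gmm(values)
    low, high = (0, 1) if means[0] <= means[1] else (1, 0)
    steps = 1000
    for i in range(steps + 1):
        x = means[low] + (means[high] - means[low]) * i / steps
        p_low = weights[low] * normal_pdf(x, means[low], variances[low])
        p_high = weights[high] * normal_pdf(x, means[high], variances[high])
        if p_high >= p_low:
            return x
    return means[high]


# Simulated log-expression values: a low-expression and a high-expression group.
rng = random.Random(0)
log_expr = [rng.gauss(1.0, 0.5) for _ in range(300)] + [rng.gauss(8.0, 1.0) for _ in range(700)]
threshold = low_expression_threshold(log_expr)
low_genes = [x for x in log_expr if x < threshold]
```

Because the threshold comes from the fitted mixture, it moves with the data rather than being fixed in advance, which is the contrast with arbitrary gene filtering that the abstract draws.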


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1487
Author(s):  
Marie Lataretu ◽  
Martin Hölzer

RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq has evolved into a standard technique, there is no universal gold standard for the computational analysis of these data. On top of that, previous studies have shown the irreproducibility of RNA-Seq studies. Here, we present a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect DEGs, which assures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low abundance gene filtering. Apart from various visualizations for the DEG results, we incorporated downstream pathway analysis for common species such as Homo sapiens and Mus musculus. We evaluated the DEG detection functionality using qRT-PCR data as a reference and observed a very high correlation of the logarithmized gene expression fold changes.
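Low abundance gene filtering, mentioned above as one of the pitfalls the pipeline handles, is commonly done by requiring a minimum read count in a minimum number of samples. The sketch below shows that general idea only; the function name, the cutoffs, and the count table are hypothetical and are not taken from the pipeline itself.

```python
def filter_low_abundance(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples.

    `counts` maps a gene name to a list of raw read counts, one per sample.
    """
    kept = {}
    for gene, row in counts.items():
        n_expressed = sum(1 for c in row if c >= min_count)
        if n_expressed >= min_samples:
            kept[gene] = row
    return kept


# Made-up count table with four samples per gene.
counts = {
    "geneA": [120, 98, 143, 110],
    "geneB": [0, 1, 0, 2],
    "geneC": [15, 0, 22, 3],
}
filtered = filter_low_abundance(counts)
```

With these cutoffs `geneB` is dropped, while `geneC` is kept because two of its samples reach the minimum count.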


Children ◽  
2020 ◽  
Vol 7 (9) ◽  
pp. 144
Author(s):  
Ji Yoon Han ◽  
Hyun Joo Lee ◽  
Young-Mock Lee ◽  
Joonhong Park

Febrile seizure (FS) is related to a febrile illness (temperature > 38 °C) not caused by an infection of the central nervous system, without neurologic deficits, in children aged 6–60 months. The family study implied a polygenic model in the families of proband(s) with single FS; however, in families with repeated FS, inheritance was matched to autosomal dominance with reduced disease penetrance. A 20-month-old girl showed recurrent FS and afebrile seizures without developmental delay or intellectual disability. The seizures disappeared after 60 months without anti-seizure medication. The proband’s 35-year-old mother also experienced five episodes of simple FS and two episodes of unprovoked seizures before 5 years old. Targeted exome sequencing was conducted along with epilepsy/seizure-associated gene-filtering to identify the candidate causative mutation. As a result, a heterozygous c.2039A>G of the ADGRV1 gene leading to a codon change of aspartic acid to glycine at position 680 (rs547076322) was identified. This protein’s glycine residue is highly conserved, and its allele frequency is 0.00002827 in the gnomAD population database. ADGRV1 mutation may have an influential role in the occurrence of genetic epilepsies, especially those with febrile and afebrile seizures. Further investigation of ADGRV1 mutations is needed to prove that it is a significant susceptibility gene for febrile and/or afebrile seizures in early childhood.


2020 ◽  
Author(s):  
Lorin M. Towle-Miller ◽  
Jeffrey C. Miecznikowski ◽  
Fan Zhang ◽  
David L. Tritchler

Motivation: We present a method for dimension reduction designed to filter variables or features, such as genes, considered to be irrelevant for a downstream analysis designed to detect supervised gene networks in sparse settings. This approach can improve interpretability for a variety of analysis methods. We present a method to filter genes and transcripts prior to network analysis. This method has applications in a setting where the downstream analysis may include sparse canonical correlation analysis. Results: Filtering methods specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. Our proposed method performs favorably, eliminating irrelevant features while maintaining important biological signal under a variety of different signal settings. We show that the speed and accuracy of methods such as sparse canonical correlation are increased after filtering, thus greatly improving the scalability of these approaches. Availability: Code for performing the gene filtering algorithm described in this manuscript may be accessed through the geneFiltering R package available on Github at https://github.com/lorinmil/geneFiltering. Functions are available to filter genes and perform simulations of a network system. For access to the data used in this manuscript, contact the corresponding authors.


2019 ◽  
Vol 47 (22) ◽  
pp. e143-e143 ◽  
Author(s):  
Changde Cheng ◽  
John Easton ◽  
Celeste Rosencrance ◽  
Yan Li ◽  
Bensheng Ju ◽  
...  

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for characterizing the cell-to-cell variation and cellular dynamics in populations which otherwise appear homogeneous in basic and translational biological research. However, significant challenges arise in the analysis of scRNA-seq data, including the low signal-to-noise ratio with high data sparsity, potential batch effects, and scalability problems when hundreds of thousands of cells are to be analyzed, among others. The inherent complexities of scRNA-seq data and dynamic nature of cellular processes lead to suboptimal performance of many currently available algorithms, even for basic tasks such as identifying biologically meaningful heterogeneous subpopulations. In this study, we developed the Latent Cellular Analysis (LCA), a machine learning–based analytical pipeline that combines cosine-similarity measurement by latent cellular states with a graph-based clustering algorithm. LCA provides heuristic solutions for population number inference, dimension reduction, feature selection, and control of technical variations without explicit gene filtering. We show that LCA is robust, accurate, and powerful by comparison with multiple state-of-the-art computational methods when applied to large-scale real and simulated scRNA-seq data. Importantly, the ability of LCA to learn from representative subsets of the data provides scalability, thereby addressing a significant challenge posed by growing sample sizes in scRNA-seq data analysis.
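The cosine-similarity measurement mentioned above can be illustrated with a short sketch. It shows only the generic measure between two vectors of per-cell values, not the LCA pipeline; how LCA derives its latent cellular states is not described here, so the vectors and cell names below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up latent-state vectors for three cells.
cells = {
    "cell_1": [1.0, 2.0, 0.0],
    "cell_2": [2.0, 4.0, 0.0],
    "cell_3": [0.0, 0.0, 3.0],
}
sim_12 = cosine_similarity(cells["cell_1"], cells["cell_2"])
sim_13 = cosine_similarity(cells["cell_1"], cells["cell_3"])
```

Because the measure depends on direction rather than magnitude, `cell_1` and `cell_2` are maximally similar even though one vector is twice the other, while `cell_1` and `cell_3` share no signal.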


GigaScience ◽  
2019 ◽  
Vol 8 (8) ◽  
Author(s):  
Anne Senabouth ◽  
Samuel W Lukowski ◽  
Jose Alquicira Hernandez ◽  
Stacey B Andersen ◽  
Xin Mei ◽  
...  

Background: Recent developments in single-cell RNA sequencing (scRNA-seq) platforms have vastly increased the number of cells typically assayed in an experiment. Analysis of scRNA-seq data is multidisciplinary in nature, requiring careful consideration of the application of statistical methods with respect to the underlying biology. Few analysis packages exist that are at once robust, are computationally fast, and allow flexible integration with other bioinformatics tools and methods. Findings: ascend is an R package comprising tools designed to simplify and streamline the preliminary analysis of scRNA-seq data, while addressing the statistical challenges of scRNA-seq analysis and enabling flexible integration with genomics packages and native R functions, including fast parallel computation and efficient memory management. The package incorporates both novel and established methods to provide a framework to perform cell and gene filtering, quality control, normalization, dimension reduction, clustering, differential expression, and a wide range of visualization functions. Conclusions: ascend is designed to work with scRNA-seq data generated by any high-throughput platform and includes functions to convert data objects between software packages. The ascend workflow is simple and interactive, as well as suitable for implementation by a broad range of users, including those with little programming experience.


2019 ◽  
Author(s):  
Yutong Wang ◽  
Tasha Thong ◽  
Venkatesh Saligrama ◽  
Justin Colacino ◽  
Laura Balzano ◽  
...  

Unsupervised feature selection, or gene filtering, is a common preprocessing step to reduce the dimensionality of single-cell RNA sequencing (scRNAseq) data sets. Existing gene filters operate on scRNAseq datasets in isolation from other datasets. When jointly analyzing multiple datasets, however, there is a need for gene filters that are tailored to comparative analysis. In this work, we present a method for ranking the relevance of genes for comparing trajectory datasets. Our method is unsupervised, i.e., the cell metadata are not assumed to be known. Using the top-ranking genes significantly improves performance compared to methods not tailored to comparative analysis. We demonstrate the effectiveness of our algorithm on previously published datasets from studies on preimplantation embryo development, neurogenesis and cardiogenesis.

