gene count
Recently Published Documents


TOTAL DOCUMENTS

36
(FIVE YEARS 17)

H-INDEX

7
(FIVE YEARS 2)

2022 ◽  
Author(s):  
Sagnik Banerjee ◽  
Carson Andorf

Advances in technology have enabled sequencing machines to produce vast amounts of genetic data, increasing storage demands. Most genomic software uses read alignments for several purposes, including transcriptome assembly and gene count estimation. Herein we present ABRIDGE, a state-of-the-art compressor for SAM alignment files that offers users both lossless and lossy compression options. This reference-based file compressor achieves the best compression ratio among all compression software, ensuring lower space demand and faster file transmission. Central to the software is a novel algorithm that retains non-redundant information. This new approach has allowed ABRIDGE to achieve a compression 16% higher than the second-best compressor for RNA-Seq reads and over 35% higher for DNA-Seq reads. ABRIDGE also offers users the option to randomly access locations without having to decompress the entire file. ABRIDGE is distributed under the MIT license and can be obtained from GitHub and Docker Hub. We anticipate that the user community will adopt ABRIDGE within their existing pipelines, encouraging further research in this domain.


2021 ◽  
Vol 27 (1) ◽  
Author(s):  
Sofia Moran-Ramos ◽  
Luis Macias-Kauffer ◽  
Blanca E. López-Contreras ◽  
Hugo Villamil-Ramírez ◽  
Elvira Ocampo-Medina ◽  
...  

Abstract Background: Elevations of circulating branched-chain amino acids (BCAA) are observed in humans with obesity and metabolic comorbidities, such as insulin resistance. Although microbial metabolism has been described as contributing to the circulating pool of these amino acids, studies are still scarce, particularly in pediatric populations. Thus, we aimed to explore whether, in early adolescents, the gut microbiome was associated with circulating BCAA and, in this way, with insulin resistance. Methods: Shotgun sequencing was performed on DNA from fecal samples of 23 early adolescents (10–12 years old), and targeted amino acid metabolomics analysis was performed by LC–MS/MS in serum samples. Using the HUMAnN2 algorithm, we explored microbiome functional profiles to identify whether bacterial metabolism contributed to serum BCAA levels and insulin resistance markers. Results: We found that the abundance of genes encoding bacterial BCAA inward transporters was negatively correlated with circulating BCAA and HOMA-IR (P < 0.01). Interestingly, Faecalibacterium prausnitzii contributed approximately 70% of the bacterial BCAA transporter gene count. Moreover, Faecalibacterium prausnitzii abundance was also negatively correlated with circulating BCAA (P = 0.001) and with HOMA-IR (P = 0.018), after adjusting for age, sex and body adiposity. Finally, the association between the Faecalibacterium genus and BCAA levels was replicated in an extended data set (N = 124). Conclusions: We provide evidence that gut bacterial BCAA transport genes, mainly encoded by Faecalibacterium prausnitzii, are associated with lower circulating BCAA and lower insulin resistance. Based on the latter, we propose that the relationship between Faecalibacterium prausnitzii and insulin resistance could be mediated through modulation of BCAA.


2021 ◽  
Author(s):  
Sathiyanarayanan Manivannan ◽  
Vidu Garg

Single-cell transcriptomic analyses permit a high-resolution investigation of biological processes at the individual cell level. Single-cell transcriptomics technologies such as Drop-seq, Smart-seq, MARS-seq, sci-RNA-seq, and CELL-seq produce large volumes of data in the form of sequence reads. In general, the alignment of the reads to genomes and the enumeration of reads mapping to a specific gene result in a gene-count matrix. These gene-count matrix data require robust quality control and statistical analytical pipelines before data mining and interpretation. Among these post-alignment pipelines, the 'Seurat' package in 'R' is the most popular analytical pipeline for the analysis of single-cell data. This package provides quality control, normalization, principal component analysis, dimensional reduction, clustering, and marker identification, among other functions needed to process and mine single-cell transcriptomic data. While the Seurat package is continuously updated and includes a variety of functionalities, the user is still required to be proficient in the 'R' programming language and its data structures to be able to execute the Seurat functions. Hence, there is a demand for a graphical user interface (GUI) that takes in relevant input information and processes the single-cell data using the Seurat pipeline. A GUI will also greatly improve access to single-cell data for life sciences researchers who are not trained in the command-line operation of the 'R' platform. To meet this demand, we present the R Shiny apps 'Natian' and 'Ryabhatta' to assist in the generation and analysis of Seurat files from a variety of different sources. The apps and example data can be downloaded from https://singlecelltranscriptomics.org.
Natian allows users to create Seurat files from the output of multiple pipelines, integrate existing Seurat files, add metadata information, perform dimensional reduction analysis or upload dimensional reduction data, resume partially processed Seurat files and find cluster markers. Ryabhatta allows users to visualize gene expression using a variety of plotting options, analyze cluster markers, rename clusters, select cells from a graph or based on expression levels of markers, perform differential expression, count the number of cells in each condition, and perform pseudotime analysis using Monocle. We found that the use of these apps substantially reduced analysis and processing time and removed needless troubleshooting due to incompatible commands, typographical errors in scripts, and cluttering of the R environment with variables. We hope these apps improve the use of single-cell data for life sciences research while also providing a tool for learning the functionalities of Seurat and the R functions available for single-cell data analysis.
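The enumeration step described in this abstract (reads mapped to a gene, tallied per cell) can be sketched as follows. This is a minimal illustration only, not how Seurat, Natian or any alignment pipeline does it: the function name `gene_count_matrix` and the `(cell_barcode, gene)` input layout are assumptions made for the example, and real pipelines typically also collapse duplicate molecules (for example by UMI), which this sketch does not attempt.

```python
from collections import Counter


def gene_count_matrix(assignments):
    """Tally read-to-gene assignments into a genes x cells count matrix.

    `assignments` is a hypothetical input format: an iterable of
    (cell_barcode, gene_name) pairs, one pair per aligned read.
    """
    counts = Counter(assignments)
    cells = sorted({cell for cell, _ in counts})
    genes = sorted({gene for _, gene in counts})
    # Rows are genes, columns are cells; unseen pairs count as zero.
    matrix = [[counts[(cell, gene)] for cell in cells] for gene in genes]
    return genes, cells, matrix
```

Rows are genes and columns are cells, matching the orientation commonly used for single-cell count matrices; a sparse representation would be preferable at realistic data sizes.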


Author(s):  
Sandhya Prabhakaran ◽  
Tal Nawy ◽  
Dana Pe’er

Abstract Background: Imaging-based spatial transcriptomics has the power to reveal patterns of single-cell gene expression by detecting mRNA transcripts as individually resolved spots in multiplexed images. However, molecular quantification has been severely limited by the computational challenges of segmenting poorly outlined, overlapping cells, and of overcoming technical noise; the majority of transcripts are routinely discarded because they fall outside the segmentation boundaries. This lost information leads to less accurate gene count matrices and weakens downstream analyses, such as cell type or gene program identification. Results: Here, we present Sparcle, a probabilistic model that reassigns transcripts to cells based on gene covariation patterns and incorporates spatial features such as distance to nucleus. We demonstrate its utility on both multiplexed error-robust fluorescence in situ hybridization (MERFISH) and single-molecule FISH (smFISH) data. Conclusions: Sparcle improves transcript assignment, providing more realistic per-cell quantification of each gene, better delineation of cell boundaries, and improved cluster assignments. Critically, our approach does not require an accurate segmentation and is agnostic to technological platform.


2020 ◽  
Vol 33 (9) ◽  
pp. 1103-1107 ◽  
Author(s):  
Dhruv Aditya Srivastava ◽  
Gulab Chand Arya ◽  
Eswari PJ Pandaranayaka ◽  
Ekaterina Manasherova ◽  
Dov B. Prusky ◽  
...  

Botrytis cinerea is a foliar necrotrophic fungal pathogen capable of infecting >580 genera of plants and is often used as a model organism for studying fungal-host interactions. We used RNAseq to study the transcriptome of B. cinerea infection on a major (worldwide) vegetable crop, tomato (Solanum lycopersicum). Most previous studies explored only a few infection stages, using RNA extracted from the entire leaf organ, which dilutes the expression signal of the infected region under study. Many studied B. cinerea infection on detached organs, assuming that similar defense and physiological reactions occur in the intact plant. We analyzed the transcriptomes of the pathogen and host at 5 infection stages in whole-plant leaves at the infection site. We supply high-quality, pathogen-enriched gene counts that facilitate future research on the molecular processes regulating the infection process.


2020 ◽  
Vol 10 (5) ◽  
pp. 1477-1484
Author(s):  
Kumar Saurabh Singh ◽  
David J. Hosken ◽  
Nina Wedell ◽  
Richard ffrench-Constant ◽  
Chris Bass ◽  
...  

Meadow brown butterflies (Maniola jurtina) on the Isles of Scilly represent an ideal model in which to dissect the links between genotype, phenotype and long-term patterns of selection in the wild, a largely unfulfilled but fundamental aim of modern biology. To meet this aim, a clear description of genotype is required. Here we present the draft genome sequence of M. jurtina to serve as a founding genetic resource for this species. Seven libraries were constructed using pooled DNA from five wild-caught spotted females and sequenced using Illumina, PacBio RSII and MinION technology. A novel hybrid assembly approach was employed to generate a final assembly with an N50 of 214 kb (longest scaffold 2.9 Mb). The sequence assembly described here predicts a gene count of 36,294 and includes variants and gene duplicates from five genotypes. Core BUSCO (Benchmarking Universal Single-Copy Orthologs) gene sets of Arthropoda and Insecta recovered 90.5% and 88.7% complete and single-copy genes, respectively. Comparisons with 17 other lepidopteran species placed 86.5% of the assembled genes in orthogroups. Our results provide the first high-quality draft genome and annotation of the butterfly M. jurtina.
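The N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest sequence among the longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation is shown below; the function name `n50` and the plain list of lengths are choices made for this example and are not taken from the authors' assembly workflow.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

Comparing `2 * running` with `total` keeps the arithmetic in integers, which avoids rounding issues when the total length is odd.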


Author(s):  
Jing Xia ◽  
Aarthi Venkat ◽  
Michael L. Reese ◽  
Karine Le Roch ◽  
Ferhat Ay ◽  
...  

Abstract Toxoplasma gondii is an obligate intracellular parasite that has a significant impact on human health, especially in the immunocompromised. This parasite is also a useful genetic model for intracellular parasitism given its ease of culture in the laboratory and relevant animal models. However, as for many other eukaryotes, the T. gondii genome is incomplete, containing hundreds of sequence gaps due to the presence of repetitive and/or uncloneable sequences that prevent complete telomere-to-telomere de novo chromosome assembly. Here, we report the first use of single-molecule DNA sequencing to generate near-complete de novo genome assemblies for T. gondii and its near relative, N. caninum. Using the Oxford Nanopore MinION platform, we dramatically improved the contiguity of the T. gondii genome (N50 of ∼6.6 Mb) and increased overall assembled sequence compared to current reference sequences by ∼2 Mb. Multiple complete chromosomes were fully assembled, as evidenced by clear telomeric repeats on the end of each contig. Interestingly, for all of the Toxoplasma gondii strains that we sequenced (RH, CTG, II×III F1 progeny clones CL13, S27, S21, and S26), the largest contig ranged between 11.9 and 12.1 Mb in size, which is larger than any previously reported T. gondii chromosome. This was due to a repeatable and consistent fusion of chromosomes VIIb and VIII. These data were further validated by mapping existing T. gondii ME49 Hi-C data to our assembly, providing parallel lines of evidence that the T. gondii karyotype consists of 13, rather than 14, chromosomes. In addition to revising the molecular karyotype, we were also able to resolve hundreds of repeats derived from both coding and non-coding tandem sequence expansions.
For well-known host-targeting effector loci like rhoptry protein 5 (ROP5) and ROP38, we were also able to accurately determine the precise gene count, order and orientation using established assembly approaches, and the most likely primary sequence of each using our own assembly correction scripts tailored to correcting homopolymeric run errors in tandem sequence arrays. Finally, when we compared the T. gondii and N. caninum assemblies, we found that while the 13-chromosome karyotype was conserved, previously unidentified large-scale translocation events have occurred in T. gondii and N. caninum since their most recent common ancestor.


2019 ◽  
Vol 36 (7) ◽  
pp. 2293-2294
Author(s):  
Xiao Tan ◽  
Andrew Su ◽  
Minh Tran ◽  
Quan Nguyen

Abstract Motivation: Spatial transcriptomics (ST) technology is increasingly being applied because it enables the measurement of spatial gene expression in an intact tissue along with imaging morphology of the same tissue. However, current analysis methods for ST data do not use image pixel information, thus missing the quantitative links between gene expression and tissue morphology. Results: We developed a user-friendly deep learning software, SpaCell, to integrate millions of pixel intensity values with thousands of gene expression measurements from spatially barcoded spots in a tissue. We show the integration approach outperforms the use of gene-count data alone or imaging data alone to build deep learning models to identify cell types or predict labels of tissue images with high resolution and accuracy. Availability and implementation: The SpaCell package is open source under an MIT licence and it is available at https://github.com/BiomedicalMachineLearning/SpaCell. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Hatice Akarsu ◽  
Lisandra Aguilar-Bultet ◽  
Laurent Falquet

Abstract Background: Comparative genomics has seen the development of many software tools that perform clustering, polymorphism and gene content analysis of genomes at different phylogenetic levels (isolates, species). These tools rely on de novo assembly and/or multiple alignments that can be computationally intensive for large datasets. With a large number of similar genomes in particular, e.g., in surveillance and outbreak detection, assembling each genome can become a redundant and expensive step in the identification of genes potentially involved in a given clinical feature. Results: We have developed deltaRpkm, an R package that performs a rapid differential gene presence evaluation between two large groups of closely related genomes. Starting from a standard gene count table, deltaRpkm computes the RPKM per gene per sample, then the inter-group δRPKM values, the corresponding median δRPKM (m) for each gene, and the global standard deviation of all the m values (s_m). Genes with m ≥ 2 × s_m are considered “differentially present” in the reference genome group. Our simple yet effective method of differential RPKM has been successfully applied in a recent study published by our group (N = 225 genomes of Listeria monocytogenes) (Aguilar-Bultet et al. Front Cell Infect Microbiol 8:20, 2018). Conclusions: To our knowledge, deltaRpkm is the first tool to propose a straightforward inter-group differential gene presence analysis with large datasets of related genomes, including non-coding genes, and to directly output a list of genes potentially involved in a phenotype.
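The steps named in this abstract (RPKM per gene per sample, inter-group δRPKM, per-gene median m, global standard deviation s_m, threshold m ≥ 2 × s_m) can be sketched in a few lines. This is an illustrative reading of the abstract, not the deltaRpkm R package's code or API. Several details are assumptions made here: δRPKM is taken as the reference-sample RPKM minus the other-group RPKM over all sample pairs, library size is taken as the sum of counts in the table, the sample standard deviation is used, and all names (`rpkm`, `differentially_present`, the dictionary layout) are hypothetical.

```python
from itertools import product
from statistics import median, stdev


def rpkm(count, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_reads)


def differentially_present(counts, lengths, group_ref, group_other, k=2.0):
    """Flag genes whose median delta-RPKM m satisfies m >= k * s_m.

    counts:  {gene: {sample: read count}}   (hypothetical layout)
    lengths: {gene: gene length in bp}
    """
    samples = list(group_ref) + list(group_other)
    # Assumption: library size is the column sum of the count table.
    totals = {s: sum(counts[g][s] for g in counts) for s in samples}

    m = {}
    for gene, per_sample in counts.items():
        values = {s: rpkm(per_sample[s], lengths[gene], totals[s]) for s in samples}
        # Assumption: delta-RPKM is reference minus other, over all sample pairs.
        deltas = [values[a] - values[b] for a, b in product(group_ref, group_other)]
        m[gene] = median(deltas)

    s_m = stdev(m.values())  # needs at least two genes
    selected = sorted(g for g, v in m.items() if v >= k * s_m)
    return selected, m, s_m
```

Because s_m is computed across all genes, the threshold is only meaningful when the table contains many genes, most of which are present in both groups.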

