average gene
Recently Published Documents


TOTAL DOCUMENTS

69
(FIVE YEARS 22)

H-INDEX

13
(FIVE YEARS 3)

2021 ◽  
Author(s):  
Diego Nahuel Cortez ◽  
Gonzalo Andres Neira ◽  
Carolina Mabel Gonzalez ◽  
Eva Marilyn Vergara ◽  
David S Holmes

Genome streamlining theory suggests that reduction of microbial genome size optimizes energy utilization in stressful environments. Although this hypothesis has been explored in several cases of low-nutrient (oligotrophic) and high-temperature environments, little work has been carried out on microorganisms from low-pH environments, and what has been reported is inconclusive. In this study, we performed a large-scale comparative genomics investigation of more than 260 high-quality bacterial genome sequences of acidophiles, together with genomes of their closest phylogenetic relatives that live at circum-neutral pH. We report a statistically supported correlation between genome size reduction and decreasing pH, which we show is due to gene loss and reduced gene sizes. This trend is independent of other genome size constraints such as temperature and G+C content. Genome streamlining in the evolution of acidophilic Bacteria is thus supported by our results. Analyses of predicted COG categories and subcellular location predictions indicate that acidophiles have a lower representation of genes encoding extracellular proteins, signal transduction mechanisms and proteins with unknown function, but are enriched in inner membrane proteins, chaperones, basic metabolism, and core cellular functions. Contrary to other reports of genome streamlining, there was no significant change in paralog frequencies across pH. However, a detailed analysis of COG categories revealed a higher proportion of genes in acidophiles in the following categories: 'Replication and repair', 'Amino acid transport' and 'Intracellular trafficking'. This study clarifies the genomic adaptations of acidophiles to life at low pH and brings elements such as the reduction of average gene size into the scope of streamlining theory.
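The abstract reports a statistically supported correlation between genome size and pH but does not name the statistic. As a minimal sketch, assuming a Spearman rank correlation (a common choice for monotonic trends, not necessarily the authors' method), the computation can look like this; the pH and genome-size values are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical optimal growth pH and genome sizes (Mb) for five strains
ph = [1.5, 2.0, 3.0, 5.0, 7.0]
genome_mb = [2.1, 2.5, 2.4, 3.6, 4.0]
print(spearman(ph, genome_mb))
```

A positive rho here means smaller genomes at lower pH, the direction described above; a real analysis would also need a significance test and, ideally, a correction for phylogenetic relatedness.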


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Tamara Milivojević ◽  
Shirin Nurshan Rahman ◽  
Débora Raposo ◽  
Michael Siccha ◽  
Michal Kucera ◽  
...  

Metabarcoding has become the workhorse of community ecology. Sequencing a taxonomically informative DNA fragment from environmental samples gives fast access to community composition across taxonomic groups, but it relies on the assumption that the number of sequences for each taxon correlates with its abundance in the sampled community. However, gene copy number varies among and within taxa, and the extent of this variability must therefore be considered when interpreting community composition data derived from environmental sequencing. Here we used single-cell qPCR to measure the SSU rDNA gene copy number of 139 specimens of five species of planktonic foraminifera. We found that the average gene copy number varied from ~4000 to ~50,000 gene copies among species, and individuals of the same species can carry from ~300 to more than 350,000 gene copies. This variability cannot be explained by differences in cell size, and after considering all plausible sources of bias, we conclude that it likely reflects dynamic genomic processes acting during the life cycle. We used the observed variability to model its impact on metabarcoding and found that applying a correction factor at the species level may correct the derived relative abundances, provided sufficiently large populations have been sampled.
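A species-level correction factor can be sketched as follows, assuming the simplest form: divide each species' read count by an average gene copy number for that species, then renormalize. The species names, read counts, and copy numbers below are hypothetical, and the authors' actual correction model may differ.

```python
def correct_relative_abundance(read_counts, copy_numbers):
    """Convert raw read counts into copy-number-corrected relative abundances.

    read_counts: mapping taxon -> number of sequencing reads
    copy_numbers: mapping taxon -> assumed average gene copies per cell
    """
    # Under this assumption, reads divided by copies per cell are proportional to cell numbers
    cell_equivalents = {
        taxon: reads / copy_numbers[taxon] for taxon, reads in read_counts.items()
    }
    total = sum(cell_equivalents.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    # Renormalize so that the corrected abundances sum to 1
    return {taxon: value / total for taxon, value in cell_equivalents.items()}


# Hypothetical example: equal read counts, tenfold difference in copy number
reads = {"species_A": 5000, "species_B": 5000}
copies = {"species_A": 50000, "species_B": 5000}
print(correct_relative_abundance(reads, copies))
```

Equal read counts become roughly 9% versus 91% once the tenfold copy-number difference is taken into account. Because individuals of one species can differ by orders of magnitude, a single per-species factor is only meaningful when many individuals contribute to the sample.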


2021 ◽  
Author(s):  
Audrey Garcia ◽  
Tri Le ◽  
Paul Jankowski ◽  
Kadir Yanac ◽  
Qiuyan Yuan ◽  
...  

We investigated the potential use and quantitation of human enteric viruses in municipal wastewater samples of Winnipeg (Manitoba, Canada) as alternative indicators of contamination and evaluated the processing stages of the wastewater treatment plant. During the fall 2019 and winter 2020 seasons, samples of raw sewage, activated sludge, effluents, and biosolids (sludge cake) were collected from the North End Sewage Treatment Plant (NESTP), which is the largest wastewater treatment plant in the City of Winnipeg. DNA and RNA enteric viruses, as well as the uidA gene found in Escherichia coli, were targeted in the samples collected from the NESTP. Total nucleic acids from each wastewater treatment sample were extracted using a commercial spin-column kit. Enteric viruses were quantitated in the extracted samples via quantitative PCR using TaqMan assays. The average gene copies assessed in the raw sewage were not significantly different (p-values ranged between 0.0547 and 0.7986) from the average gene copies assessed in the effluents for Adenovirus and crAssphage (DNA viruses), Pepper Mild Mottle Virus (RNA virus), and uidA, in terms of both volume and biomass. A significant reduction of these enteric viruses was observed consistently in activated sludge samples compared with raw sewage. Corresponding reductions in gene copies per volume and gene copies per biomass were also seen for uidA but were not statistically significant (p-value = 0.8769 and p-value = 0.6353, respectively). The higher gene copy numbers of enteric viruses and E. coli observed in the effluents may be associated with the 12-hour hydraulic retention time in the facility. Gene copy numbers of enteric viruses were at least one order of magnitude higher than those of the E. coli marker uidA. This indicates that enteric viruses may survive the wastewater treatment process and that viral-like particles are being released into the aquatic environment.
Our results suggest that Adenovirus, crAssphage, and Pepper mild mottle virus can be used as complementary viral indicators of human fecal pollution.


2021 ◽  
Author(s):  
Scott Hotaling ◽  
Joanna L Kelley ◽  
Paul B Frandsen

In less than 25 years, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth's eukaryotic diversity (1). As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline's future. Here, we provide a contemporary, quantitative perspective on animal genome sequencing. We identified the best available genome assemblies on GenBank, the world's most extensive genetic database, for 3,278 unique animals across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity, but, on average, gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for how we can collectively improve genomic resource availability and value while also broadening representation worldwide.


2021 ◽  
Vol 9 (2) ◽  
pp. 427
Author(s):  
Alei Geng ◽  
Meng Jin ◽  
Nana Li ◽  
Daochen Zhu ◽  
Rongrong Xie ◽  
...  

Glycoside hydrolases (GHs) represent a crucial category of enzymes for carbohydrate utilization in most organisms. A series of glycoside hydrolase families (GHFs) have been classified, with relevant information deposited in the CAZy database. Statistical analysis indicated that most GHFs (134 out of 154) were more prevalent in bacteria than in archaea, in terms of both occurrence frequencies and average gene numbers. Co-occurrence analysis suggested the existence of strong or moderately strong correlations among 63 GHFs. Combining network analysis in Gephi with functional classification of these GHFs revealed 12 functional categories (groups A to L), which were subsequently used to label the corresponding microbial collections. Interestingly, a progressive enrichment of particular GHFs was found among several types of microbes, and type-L and type-E microbes were deemed functionally intensified species that emerged during microbial evolution toward efficient decomposition of lignocellulose and pectin, respectively. Overall, by integrating network analysis and enzymatic functional classification, this study provides a new perspective on GHs from known prokaryotic genomes and is likely to guide the selection of GHs and microbes for efficient biomass utilization.


Genes ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 63
Author(s):  
Ibrahim Juma ◽  
Mulatu Geleta ◽  
Helena Persson Hovmalm ◽  
Agnes Nyomora ◽  
Ganapathi Varma Saripella ◽  
...  

Tanzania has been growing avocado for decades. A wide variability of avocado germplasm has been found, and the crop contributes substantially to the earnings of farmers, traders, and the government, but its genetic diversity has scarcely been investigated. To compare the morphological and genetic characteristics of this germplasm and to uncover their correlation with geographical location, 226 adult seedling avocado trees were sampled in southwestern Tanzania. Their morphological characters were recorded, and their genetic diversity was evaluated based on 10 microsatellite loci. Discriminant analysis of principal components showed that the germplasm studied consisted of four genetic clusters that had an overall average gene diversity of 0.59 and 15.9% molecular variation among them. Most phenotypes were shared by at least two clusters. The genetic clusters were also recovered by multivariate analysis and hierarchical clustering of the molecular data but not of the morphological data. Using the Mantel test, a weak but significant correlation was found between the genetic, morphological, and geographical distances, which indicates that the genetic variation present in the material is weakly reflected by the observed phenotypic variation and that both measures of variation varied slightly with the geographical sampling locations.
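Two of the statistics named above can be sketched in a few lines. The gene diversity function uses Nei's expected heterozygosity, 1 minus the sum of squared allele frequencies, averaged over loci without a small-sample correction; the Mantel test is a simple permutation version using the Pearson correlation of the upper-triangle distances. Both are illustrative assumptions: the abstract does not state which estimator, correlation coefficient, or number of permutations the authors used, and the allele sizes and distances below are made up.

```python
from collections import Counter

import numpy as np


def gene_diversity(alleles):
    """Nei's expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def average_gene_diversity(loci):
    """Mean gene diversity over loci; each locus is a list of observed allele sizes."""
    return sum(gene_diversity(alleles) for alleles in loci) / len(loci)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """One-sided permutation Mantel test between two square, symmetric distance matrices."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)  # use each pair of samples exactly once
    observed = np.corrcoef(a[upper], b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        # Permute rows and columns of one matrix together to break the sample pairing
        shuffled = b[np.ix_(order, order)]
        if np.corrcoef(a[upper], shuffled[upper])[0, 1] >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical data: two microsatellite loci and five sampling sites on a line
loci = [[150, 150, 152, 152], [200, 200, 200, 200]]
print(average_gene_diversity(loci))  # 0.25
positions = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
geographic = np.abs(positions[:, None] - positions[None, :])
genetic = 0.5 * geographic
r, p = mantel_test(geographic, genetic)
print(r, p)
```

With only five sites the permutation distribution is coarse, so the p-value in this toy case is not informative; the sketch only shows the mechanics of comparing two distance matrices.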


2020 ◽  
Author(s):  
Ivan Croydon Veleslavov ◽  
Michael P.H. Stumpf

Single cell transcriptomics has laid bare the heterogeneity of apparently identical cells at the level of gene expression. For many cell types we now know that there is variability in the abundance of many transcripts, and that average transcript abundance or average gene expression can be an unhelpful concept. A range of clustering and other classification methods have been proposed which use the signal in single cell data to classify cells, that is, to assign cell types to cells based on their transcriptomic states. In many cases, however, we would like to have not just a classifier, but also a set of interpretable rules by which this classification occurs. Here we develop and demonstrate the interpretive power of one such approach, which sets out to establish a biologically interpretable classification scheme. In particular, we are interested in capturing the chain of regulatory events that drive cell-fate decision making across a lineage tree or lineage sequence. We find that suitably defined decision trees can help to resolve gene regulatory programs involved in shaping lineage trees. Our approach combines predictive power with interpretability and can extract logical rules from single cell data.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243360
Author(s):  
Johan Gustafsson ◽  
Jonathan Robinson ◽  
Juan S. Inda-Díaz ◽  
Elias Björnson ◽  
Rebecka Jörnsten ◽  
...  

Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets.
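The down-sampling step that removes differences in sampling noise can be illustrated with a minimal sketch. It assumes each cell's count vector is sub-sampled without replacement to a common total, here via NumPy's `Generator.multivariate_hypergeometric`; this is not the published DSAVE implementation, and the function names and toy counts are hypothetical.

```python
import numpy as np


def downsample_cell(counts, target_total, rng):
    """Draw target_total molecules without replacement from one cell's gene counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < target_total:
        raise ValueError("cell has fewer counts than the target total")
    return rng.multivariate_hypergeometric(counts, target_total)


def downsample_matrix(matrix, target_total, seed=0):
    """Down-sample every cell (row) so that all cells share the same total count."""
    rng = np.random.default_rng(seed)
    return np.vstack([downsample_cell(row, target_total, rng) for row in matrix])


# Hypothetical cells x genes count matrix with very different sequencing depths
counts = [[10, 5, 1, 4], [100, 50, 25, 25]]
print(downsample_matrix(counts, target_total=10))
```

After this step, the two cells have identical total counts, so differences in their per-gene variation are no longer driven by sequencing depth alone, which is the precondition for comparing variation between clusters.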


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Thanutra Zhang ◽  
Robert Foreman ◽  
Roy Wollman

Gene expression variability, the difference in the number of mRNAs per cell across a population of cells, is ubiquitous across diverse organisms and has broad impacts on cellular phenotypes. The role of chromatin in regulating average gene expression has been extensively studied. However, which aspects of chromatin contribute to gene expression variability is still underexplored. Here we addressed this problem by leveraging chromatin diversity and systematically investigating randomly integrated expression reporters to identify which aspects of the chromatin microenvironment contribute to gene expression variability. Using DNA barcoding and split-pool decoding, we created a large library of isogenic reporter clones and identified reporter integration sites in a massively parallel manner. By mapping our measurements of reporter expression at different genomic loci with multiple epigenetic profiles, including the enrichment of transcription factors and the distance to different chromatin states, we identified new factors that impact the regulation of gene expression distributions.


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1218
Author(s):  
Sarita Mahtani-Williams ◽  
William Fulton ◽  
Amelie Desvars-Larrive ◽  
Sara Lado ◽  
Jean Pierre Elbers ◽  
...  

Across the distribution of the Caspian whipsnake (Dolichophis caspius), populations have become increasingly disconnected due to habitat alteration. To understand population dynamics and this widespread but locally endangered snake’s adaptive potential, we investigated population structure, admixture, and effective migration patterns. We took a landscape-genomic approach to identify selected genotypes associated with environmental variables relevant to D. caspius. With double-digest restriction-site associated DNA (ddRAD) sequencing of 53 samples resulting in 17,518 single nucleotide polymorphisms (SNPs), we identified 8 clusters within D. caspius reflecting complex evolutionary patterns of the species. Estimated Effective Migration Surfaces (EEMS) revealed higher-than-average gene flow in most of the Balkan Peninsula and lower-than-average gene flow along the middle section of the Danube River. Landscape genomic analysis identified 751 selected genotypes correlated with 7 climatic variables. Isothermality correlated with the highest number of selected genotypes (478) located in 41 genes, followed by annual range (127) and annual mean temperature (87). We conclude that environmental variables, especially the day-to-night temperature oscillation in comparison to the summer-to-winter oscillation, may have an important role in the distribution and adaptation of D. caspius.

