Geneshot: search engine for ranking genes from arbitrary text queries

2019 ◽  
Vol 47 (W1) ◽  
pp. W571-W577 ◽  
Author(s):  
Alexander Lachmann ◽  
Brian M Schilder ◽  
Megan L Wojciechowicz ◽  
Denis Torre ◽  
Maxim V Kuleshov ◽  
...  

Abstract The frequency with which genes are studied correlates with the prior knowledge accumulated about them. This leads to an imbalance in research attention, where some genes are highly investigated while others are ignored. Geneshot is a search engine developed to illuminate this gap and to promote attention to the under-studied genome. Through a simple web interface, Geneshot enables researchers to enter arbitrary search terms and receive ranked lists of genes relevant to those terms. The returned ranked gene lists contain genes that were previously published in association with the search terms, as well as genes predicted to be associated with the terms based on data integration from multiple sources. The search results are presented with interactive visualizations. To predict gene function, Geneshot uses gene–gene similarity matrices derived from processed RNA-seq data or from gene–gene co-occurrence data obtained from multiple sources. In addition, Geneshot can be used to analyze the novelty of gene sets and to augment gene sets with additional relevant genes. The Geneshot web server and API are freely and openly available from https://amp.pharm.mssm.edu/geneshot.
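Geneshot's prediction step ranks genes by their similarity to genes already associated with a term. A minimal sketch of that idea follows. It assumes a precomputed gene–gene similarity matrix stored as nested dicts. The function name, gene names, and values are hypothetical. Scoring by mean similarity to the seed genes is one simple choice, not Geneshot's documented method. The same routine also illustrates gene set augmentation: the seed set is the user's gene set, and the top-ranked candidates are the suggested additions.

```python
# Minimal sketch (assumption): score each candidate gene by its mean similarity
# to a "seed" set of genes already linked to a search term. The gene names and
# similarity values below are hypothetical toy data, not Geneshot's matrices.

def predict_related_genes(seed_genes, similarity, top_k=None):
    """Rank genes outside the seed set by mean similarity to the seed genes.

    similarity maps each gene to a dict {other_gene: similarity}; a missing
    pair is treated as similarity 0.0. Returns (gene, score) pairs sorted by
    descending score, with ties broken alphabetically.
    """
    seeds = set(seed_genes)
    if not seeds:
        raise ValueError("seed_genes must not be empty")
    scores = {}
    for gene, row in similarity.items():
        if gene in seeds:
            continue  # only predict genes not already associated with the term
        scores[gene] = sum(row.get(s, 0.0) for s in seeds) / len(seeds)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical symmetric similarity matrix (e.g. from co-expression).
TOY_SIMILARITY = {
    "GENE1": {"GENE2": 0.9, "GENE3": 0.1, "GENE4": 0.8},
    "GENE2": {"GENE1": 0.9, "GENE3": 0.2, "GENE4": 0.7},
    "GENE3": {"GENE1": 0.1, "GENE2": 0.2, "GENE4": 0.0},
    "GENE4": {"GENE1": 0.8, "GENE2": 0.7, "GENE3": 0.0},
}

if __name__ == "__main__":
    print(predict_related_genes(["GENE1", "GENE2"], TOY_SIMILARITY))
```

With GENE1 and GENE2 as seeds, GENE4 ranks above GENE3 because it is more similar to both seeds. Averaging rather than summing keeps scores comparable across seed sets of different sizes.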

2019 ◽  
Vol 47 (W1) ◽  
pp. W212-W224 ◽  
Author(s):  
Alexandra B Keenan ◽  
Denis Torre ◽  
Alexander Lachmann ◽  
Ariel K Leong ◽  
Megan L Wojciechowicz ◽  
...  

Abstract Identifying the transcription factors (TFs) responsible for observed changes in gene expression is an important step in understanding gene regulatory networks. ChIP-X Enrichment Analysis 3 (ChEA3) is a transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources, including TF–gene co-expression from RNA-seq studies, TF–target associations from ChIP-seq experiments, and TF–gene co-occurrence computed from crowd-submitted gene lists. Enrichment results from these distinct sources are integrated to generate a composite rank that improves prediction of the correct upstream TF compared with the ranks produced by individual libraries. We compare ChEA3 with existing TF prediction tools and show that ChEA3 performs better. By integrating the ChEA3 libraries, we illuminate general transcription factor properties, such as whether a TF behaves as an activator or a repressor. The ChEA3 web server is available from https://amp.pharm.mssm.edu/ChEA3.
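The enrichment-then-integration idea can be sketched as follows. Everything here is an assumption made for illustration:

- Libraries are dicts mapping a TF to its target genes.
- Per-library enrichment uses a one-sided hypergeometric (Fisher's exact) test against a fixed background size.
- The composite is a plain mean of per-library ranks.

This is not ChEA3's code.

```python
from math import comb

# Sketch (assumptions): each library maps a TF name to its set of target genes;
# enrichment is a one-sided hypergeometric (Fisher's exact) test, and libraries
# are combined by averaging each TF's rank. These are illustrative choices, not
# a reproduction of ChEA3's implementation.

def overlap_pvalue(query, targets, background_size):
    """P(overlap >= observed) when drawing len(query) genes from the background."""
    q, t = set(query), set(targets)
    k = len(q & t)
    tail = sum(
        comb(len(t), i) * comb(background_size - len(t), len(q) - i)
        for i in range(k, min(len(q), len(t)) + 1)
    )
    return tail / comb(background_size, len(q))


def rank_library(query, library, background_size):
    """Rank TFs in one library by p-value (rank 1 = most significant)."""
    pvals = {tf: overlap_pvalue(query, targets, background_size)
             for tf, targets in library.items()}
    ordered = sorted(pvals, key=lambda tf: (pvals[tf], tf))
    return {tf: rank for rank, tf in enumerate(ordered, start=1)}


def mean_rank(rank_tables):
    """Composite ranking: average each TF's rank over the libraries it appears in."""
    totals, counts = {}, {}
    for table in rank_tables:
        for tf, rank in table.items():
            totals[tf] = totals.get(tf, 0) + rank
            counts[tf] = counts.get(tf, 0) + 1
    scores = {tf: totals[tf] / counts[tf] for tf in totals}
    return sorted(scores, key=lambda tf: (scores[tf], tf))
```

Averaging only over the libraries that contain a TF is a simplification: a TF present in a single library is not penalized for missing evidence elsewhere.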


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Vanitha Arumugam ◽  
Joy C. MacDermid ◽  
Dave Walton ◽  
Ruby Grewal

Abstract Introduction PAIN+ and PubMed are two electronic databases with different mechanisms of evidence retrieval. PubMed is used to “Pull” evidence: clinicians enter search terms to find answers. PAIN+ is a newly developed evidence repository that, along with a “Pull” service, offers a “Push” service that alerts users about new research and its associated quality ratings, based on individual preferences for content and alerting criteria. Purpose The primary purpose of the study was to compare the yield and usefulness of PubMed and PAIN+ in retrieving evidence to address clinical research questions on pain management. The secondary purpose was to identify the search terms and methods clinicians used to target pain research. Study design Two-phase, double-blinded, randomized crossover trial. Methods Clinicians (n = 76) who had been exposed to PAIN+ for at least 1 year took part in this study. Participants were required to search independently for evidence on two clinical question scenarios. The first clinical question was provided to all participants and was therefore multidisciplinary. Participants were randomly assigned to search for evidence on their clinical question using either PAIN+ or PubMed through the electronic interface. Upon completing the search with one search engine, they were crossed over to the other. A similar process was followed for a second, discipline-specific scenario. Yield was calculated as the number of retrieved articles presented to participants, and usefulness was evaluated using a series of Likert-scale questions embedded in the testing. Results Multidisciplinary scenario: the participants had an overall one-page yield of 715 articles for PAIN+ and 1135 articles for PubMed. The topmost article retrieved by PAIN+ was rated as more useful (p = 0.001), whereas the topmost article retrieved by PubMed was rated as consistent with current clinical practice (p = 0.02). PubMed (48%) was preferred over PAIN+ (39%) for the multidisciplinary search (p = 0.02). Discipline-specific scenario: the participants had an overall one-page yield of 1046 articles for PAIN+ and 1398 articles for PubMed. The topmost article retrieved by PAIN+ was rated as more useful (p = 0.001) and more consistent with current clinical practice (p = 0.02) than the articles retrieved by PubMed. PAIN+ (52%) was preferred over PubMed (29%) for the discipline-specific search. Conclusion Clinicians from different disciplines find both PAIN+ and PubMed useful for retrieving research studies to address clinical questions about pain management. Greater preference and perceived usefulness of the top 3 retrieved papers were observed for PAIN+, but other dimensions of usefulness did not consistently favor either search engine. Trial registration Registered with ClinicalTrials.gov Identifier: NCT01348802, Date: May 5, 2011.
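The crossover allocation described in the Methods can be sketched as follows. The abstract does not state how the two search orders were allocated. This sketch assumes a simple shuffled allocation that alternates between the two sequences, which keeps their sizes balanced. The function name and seed are hypothetical.

```python
import random

# Sketch (assumption): the abstract does not say how the two search orders
# were allocated, so this uses a simple shuffled, balanced allocation.

def assign_crossover(participant_ids, seed=None):
    """Map each participant to an ordered pair (first engine, second engine).

    After shuffling, participants alternate between the two sequences, so
    the sequence sizes differ by at most one. Every participant uses both
    engines, which is what makes the design a crossover.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    sequences = (("PAIN+", "PubMed"), ("PubMed", "PAIN+"))
    return {pid: sequences[i % 2] for i, pid in enumerate(ids)}


if __name__ == "__main__":
    allocation = assign_crossover(range(76), seed=2021)
    first_pain = sum(1 for order in allocation.values() if order[0] == "PAIN+")
    print(first_pain, len(allocation) - first_pain)  # prints: 38 38
```

The same step would be repeated independently for the discipline-specific scenario.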


2021 ◽  
Author(s):  
Saket Choudhary ◽  
Rahul Satija

Heterogeneity in single-cell RNA-seq (scRNA-seq) data is driven by multiple sources, including biological variation in cellular state as well as technical variation introduced during experimental processing. Deconvolving these effects is a key challenge for preprocessing workflows. Recent work has demonstrated the importance and utility of count models for scRNA-seq analysis, but there is a lack of consensus on which statistical distributions and parameter settings are appropriate. Here, we analyze 58 scRNA-seq datasets that span a wide range of technologies, systems, and sequencing depths in order to evaluate the performance of different error models. We find that while a Poisson error model appears appropriate for sparse datasets, there is clear evidence of overdispersion for genes with sufficient sequencing depth in all biological systems, necessitating the use of a negative binomial model. Moreover, the degree of overdispersion varies widely across datasets, systems, and gene abundances, which argues for a data-driven approach to parameter estimation. Based on these analyses, we provide a set of recommendations for modeling variation in scRNA-seq data, particularly when using generalized linear models or likelihood-based approaches for preprocessing and downstream analysis.
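The Poisson versus negative binomial distinction comes down to the variance. A Poisson model forces Var = μ, while a negative binomial adds a term μ²/θ. The sketch below estimates θ for one gene and returns infinity when there is no excess variance. It assumes a crude method-of-moments estimator, not whatever estimator the authors used, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

# Sketch (assumption): a method-of-moments estimate of the negative binomial
# dispersion for one gene. Under NB, Var = mu + mu**2 / theta; Poisson is the
# limit theta -> infinity (Var = mu). The paper's own estimation procedure is
# not described in the abstract and is likely more robust than this.

def nb_theta_moments(counts):
    """Estimate theta from one gene's counts across cells.

    Returns math.inf when the sample variance does not exceed the mean,
    i.e. when the data show no overdispersion relative to Poisson.
    """
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    mu = mean(counts)
    excess = variance(counts) - mu  # variance beyond the Poisson expectation
    if excess <= 0:
        return math.inf
    return mu * mu / excess


def nb_variance(mu, theta):
    """Variance implied by a negative binomial with mean mu and dispersion theta."""
    return mu if math.isinf(theta) else mu + mu * mu / theta
```

An infinite estimate is consistent with Poisson noise. Small finite θ values indicate strong overdispersion, and the spread of θ across genes and datasets motivates estimating it from the data rather than fixing it.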


2016 ◽  
Vol 3 (3) ◽  
pp. 263-290 ◽  
Author(s):  
Richard Joseph Waddington ◽  
SungJin Nam ◽  
Steven Lonn ◽  
Stephanie D. Teasley

Early Warning Systems (EWSs) aggregate multiple sources of data to provide timely information to stakeholders about students in need of academic support. There is an increasing need to incorporate relevant data about student behaviors into the algorithms underlying EWSs to improve predictions of students’ success or failure. Many EWSs currently incorporate counts of course resource use, although these measures provide no information about which resources students are using. We use seven years of data from seven core STEM courses at a large university to investigate the associations between students’ use of categorized course resources (e.g., lecture or exam preparation resources) and their final course grade. Using logistic regression, we find that students who use exam preparation resources to a greater degree than their peers are more likely to receive a final grade of B or higher. In contrast, students who use more lecture-related resources than their peers are less likely to receive a final grade of B or higher. We discuss the implications of our results for developers deciding how to incorporate categories of course resource usage data into EWSs, for academic advisors using this information with students, and for instructors deciding which resources to include on their LMS site.
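A logistic regression of this kind models the log-odds of earning a B or higher as a linear function of resource-use predictors. The sketch below fits one with plain gradient descent on two hypothetical predictors: exam-preparation use and lecture resource use, each relative to peers. The predictor names, toy data, learning rate, and epoch count are assumptions. The data were constructed to illustrate the reported directions and are not drawn from the study.

```python
import math

# Sketch (assumption): plain gradient-descent logistic regression on two
# hypothetical predictors, each expressed relative to peers (z-score-like):
#   x[0] = exam-preparation resource use, x[1] = lecture resource use.
# y = 1 means a final grade of B or higher. The toy data are constructed for
# illustration only; they are not the study's data.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by minimizing mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b


TOY_X = [[1.0, -1.0], [0.5, 0.0], [0.0, -0.5], [-0.5, 0.5],
         [-1.0, 1.0], [-0.5, 0.0], [0.0, 0.5], [0.5, -0.5]]
TOY_Y = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    w, b = fit_logistic(TOY_X, TOY_Y)
    # exp(coefficient) is the odds ratio per one-unit increase in a predictor.
    print([round(math.exp(wj), 2) for wj in w], round(b, 3))
```

A positive coefficient on exam-preparation use and a negative one on lecture use mirror the directions reported in the abstract. With real data, the predictors and any covariates would come from the EWS's own usage logs.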


2019 ◽  
Vol 17 (05) ◽  
pp. 1940010 ◽  
Author(s):  
Farhad Maleki ◽  
Katie L. Ovens ◽  
Daniel J. Hogan ◽  
Elham Rezaei ◽  
Alan M. Rosenberg ◽  
...  

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, of both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and that some methods were biased toward reporting large gene sets. Furthermore, there was substantial disagreement among the 20 most statistically significant gene sets reported by the methods, and this disagreement persisted when expanding to the 100 most statistically significant gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement, even when using the same method. GAGE, PAGE, and ORA were the only methods that achieved relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in the relevance of the top 20 and top 100 most significant gene sets to the known biology of the disease, with GAGE predicting the most relevant gene sets, followed by GSEA, ORA, and PAGE.
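Agreement between the top 20 or top 100 gene sets reported by two methods can be quantified with a set-overlap measure. The abstract does not say which measure was used, so this sketch assumes the Jaccard index. The method names and gene set lists in the test are placeholders.

```python
from itertools import combinations

# Sketch (assumption): the abstract does not name its agreement measure, so
# this uses the Jaccard index between the top-k gene sets of two result lists.

def top_k_jaccard(ranked_a, ranked_b, k=20):
    """Jaccard index of the first k entries of two ranked lists."""
    a, b = set(ranked_a[:k]), set(ranked_b[:k])
    if not a and not b:
        return 1.0  # two empty lists agree trivially
    return len(a & b) / len(a | b)


def pairwise_agreement(results, k=20):
    """results maps method name -> gene set names ranked by significance."""
    return {
        (m1, m2): top_k_jaccard(results[m1], results[m2], k)
        for m1, m2 in combinations(sorted(results), 2)
    }
```

The same function gives the reproducibility comparison when applied to two datasets of the same phenotype analyzed with one method. A value near 0 corresponds to "little to no agreement".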


Author(s):  
Dietmar Wolfram

Unique queries submitted to the Excite search engine were analyzed for empirical regularities in the co-occurrence of search terms. The frequency distribution of term-pair occurrences was fitted to three models used in informetric studies to determine whether the pattern of term usage followed a Zipfian distribution. Relatively poor fits were obtained for two of the models tested. . .
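The analysis has two steps: counting term pairs that co-occur within queries, then checking the rank-frequency distribution against Zipf's law. The sketch below covers both, under stated assumptions. Queries are lowercased and whitespace-tokenized, and each unordered pair of distinct terms counts once per query. The Zipf check is an ordinary least-squares slope on log-log axes. The abstract does not name the three informetric models the study fitted, so this rough diagnostic should not be read as any of them.

```python
import math
from collections import Counter
from itertools import combinations

# Sketch (assumptions): queries are whitespace-tokenized and lowercased, each
# unordered pair of distinct terms in a query counts once, and the Zipf check is
# an ordinary least-squares fit of log(frequency) on log(rank). This is a rough
# diagnostic, not necessarily one of the models used in the study.

def term_pair_counts(queries):
    """Count co-occurring term pairs across queries (pairs stored in sorted order)."""
    counts = Counter()
    for query in queries:
        terms = sorted(set(query.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def fit_zipf_exponent(frequencies):
    """Slope of log(frequency) against log(rank); classic Zipf predicts about -1."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Passing `term_pair_counts(queries).values()` to `fit_zipf_exponent` gives the rank-frequency slope for a query log. Strong curvature on the log-log plot, rather than the slope alone, is what a poor Zipfian fit looks like. Log-log least squares is also known to be a biased estimator compared with maximum-likelihood fitting.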


1997 ◽  
Vol 29 (6) ◽  
pp. 989-1002 ◽  
Author(s):  
G R Crampton

The application of labour market matching theory to the context of urban spatial variations in vacancies, unemployment, and job search has recently begun to receive research attention. Empirical analysis is very difficult because of the virtual unobservability of job search. Various forms of theoretical study of spatial labour markets are summarised in this paper, together with macroeconomic empirical evidence on labour matching technology. The Cobb–Douglas form of the matching function is applied to a simple linear city model, and theoretical relationships are derived which would be necessary for a static urban labour market equilibrium. A start is made on the theoretical implications of calculating an optimal job search area for individual workers, and a complex integral form of a present value function is obtained.
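For reference, the standard Cobb–Douglas matching function relates the number of matches M to unemployment U and vacancies V. The symbols A, α, β, and θ below are the usual textbook ones and are assumed here. The abstract does not give the paper's exact specification for the linear city model. Under constant returns to scale, the matching rate per unemployed worker depends only on labour market tightness θ = V/U:

```latex
M = A\,U^{\alpha}V^{\beta}, \qquad A > 0,\quad 0 < \alpha,\ \beta < 1

\text{If } \alpha + \beta = 1:\qquad
\frac{M}{U} = A\,U^{\alpha - 1}V^{1-\alpha}
            = A\left(\frac{V}{U}\right)^{1-\alpha}
            = A\,\theta^{1-\alpha},
\qquad \theta \equiv \frac{V}{U}
```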


2019 ◽  
Author(s):  
Mitchell Kluesner ◽  
Annette Arnold ◽  
Taga Lerner ◽  
Rafail Nikolaos Tasakis ◽  
Sandra Wüst ◽  
...  

ABSTRACT RNA editing is the base change that results from RNA deamination by two predominant classes of deaminases: the APOBEC family and the ADAR family. Deamination of nucleobases by these enzymes is responsible for endogenous editing of cytosine to uracil (C-to-U) and adenosine to inosine (A-to-I), respectively. RNA editing is known to play an essential role both in maintaining normal cellular function and in the altered cellular physiology seen during oncogenesis and tumour progression. Analysis of RNA editing in these important processes largely relies on RNA-seq technology for the detection and quantification of RNA editing sites. Despite the power of these technologies, multiple sources of error in detecting and measuring base editing still exist; therefore, additional validation and quantification of editing through Sanger sequencing is still required for confirmation. Depending on the number of RNA editing sites of interest, this validation step can be both expensive and time-consuming. To address this need, we developed the tool MultiEditR, which provides a simple and cost-effective method of detecting and quantifying RNA editing from Sanger sequencing. We expect that MultiEditR will foster further discoveries in this rapidly expanding field.
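Sanger-based quantification rests on the fact that reverse transcription reads inosine as guanosine. A-to-I editing therefore shows up as a G peak under an A in the reference, and C-to-U editing as a T peak under a C. The sketch below is a deliberately simplified ratio estimate. It assumes an already aligned trace given as per-position peak heights. The function names and the 10% default threshold are hypothetical, and MultiEditR's own noise model and statistics are not reproduced.

```python
# Sketch (assumption): a simplified per-site editing estimate from Sanger
# peak heights. Inosine is read as guanosine after reverse transcription, so
# A-to-I editing appears as a G signal at an A position (C-to-U appears as T at
# a C position). This ignores trace alignment, background noise modeling, and
# statistical testing, and is not MultiEditR's actual algorithm.

EDIT_SIGNAL = {"A": "G", "C": "T"}  # reference base -> base that reports editing


def percent_editing(peaks, ref_base):
    """Percent editing at one site from a dict of peak heights per base."""
    edited_base = EDIT_SIGNAL[ref_base]
    total = peaks[ref_base] + peaks[edited_base]
    if total <= 0:
        raise ValueError("no signal for the reference or edited base")
    return 100.0 * peaks[edited_base] / total


def candidate_sites(reference, trace, ref_base="A", min_percent=10.0):
    """(position, percent) for reference positions with ref_base edited >= min_percent.

    trace[i] is the dict of peak heights at reference position i (already aligned).
    """
    sites = []
    for i, base in enumerate(reference):
        if base != ref_base:
            continue
        pct = percent_editing(trace[i], ref_base)
        if pct >= min_percent:
            sites.append((i, pct))
    return sites
```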


2020 ◽  
Author(s):  
Ajay Patil ◽  
Ashwini Patil

Abstract Single-cell RNA-seq is widely used to study transcriptional patterns of genes in individual cells. Despite current advances in technology, assigning cell types in single-cell datasets remains a bottleneck due to the lack of a comprehensive reference database and a fast search method in a single tool. CellKb Immune is a knowledgebase of manually collected, curated, and annotated marker gene sets from cell types in the mammalian immune response. Given a list of genes, it finds matching cell types in the literature using a novel rank-based algorithm optimized for rapid searching across marker gene lists of differing lengths. We evaluated the contents and search algorithm of CellKb Immune using a leave-one-out approach. We further used CellKb Immune to annotate previously defined marker gene sets from Immgen to confirm its accuracy and coverage. CellKb Immune provides an easy-to-use database with a fast and reliable method to find matching cell types and annotate cells in single-cell experiments in a single tool. It is available at https://www.cellkb.com/immune.
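The abstract does not describe CellKb Immune's rank-based algorithm. The sketch below only illustrates the general problem it addresses: scoring a query gene list against marker lists of differing lengths so that rank matters and list length does not dominate. The scoring rule, function names, and marker lists are hypothetical. Each marker list is assumed to be ordered from most to least characteristic. A hit at relative position p earns weight 1 − p, and the total is averaged over the query genes.

```python
# Sketch (assumption): a hypothetical rank-weighted score for matching a query
# gene list against curated marker lists of different lengths. Each marker
# list is ordered from most to least specific; a hit at relative position p
# (0 = first) earns weight 1 - p, so list length does not dominate the score.
# This is an illustration, not CellKb Immune's published algorithm.

def rank_score(query_genes, marker_list):
    query = set(query_genes)
    if not query:
        raise ValueError("query_genes must not be empty")
    n = len(marker_list)
    if n == 0:
        return 0.0
    position = {}
    for i, gene in enumerate(marker_list):
        position.setdefault(gene, i)  # keep the first (best) rank of duplicates
    hits = sum(1.0 - position[g] / n for g in query if g in position)
    return hits / len(query)


def match_cell_types(query_genes, knowledgebase):
    """Return (cell_type, score) pairs, best match first."""
    scored = [(name, rank_score(query_genes, markers))
              for name, markers in knowledgebase.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

A leave-one-out evaluation of such a scorer would remove one curated marker list at a time, query with its genes, and check whether a cell type with the same label ranks near the top.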

