Large-scale labeling and assessment of sex bias in publicly available expression data

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Emily Flynn ◽  
Annie Chang ◽  
Russ B. Altman

Background: Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio.
Results: Overall, we find a slight female bias (52.1%) in human samples and a male bias (62.5%) in mouse samples; this corresponds to a majority of mixed sex studies in humans and single sex studies in mice, split between female-only and male-only (25.8% vs. 18.9% in human and 21.6% vs. 31.1% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. We leverage our expression-based sex labels to further examine the complexity of cell line sex and assess the frequency of metadata sex label misannotations (2–5%).
Conclusions: Our results demonstrate limited overall sex bias, while highlighting high bias in specific subfields and underscoring the importance of including sex labels to better understand the underlying biology. We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.
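The idea of inferring sample sex from expression and flagging metadata disagreements can be illustrated with a toy sketch. This is not the authors' trained model: the two marker genes (XIST, typically high in female samples, and the Y-linked RPS4Y1), the margin, and all expression values are assumptions chosen only for illustration.

```python
# Toy sketch: infer sample sex from two marker genes and flag metadata mismatches.
# Assumptions: XIST and RPS4Y1 are illustrative markers picked for this example;
# the margin and the log-scale expression values below are made up.

def infer_sex(expr, margin=1.0):
    """Return 'female', 'male', or 'unknown' from log-scale expression values."""
    score = expr["XIST"] - expr["RPS4Y1"]
    if score > margin:
        return "female"
    if score < -margin:
        return "male"
    return "unknown"  # ambiguous signal, left unlabeled


def flag_mismatches(samples):
    """List sample IDs whose metadata label disagrees with the inferred label."""
    flagged = []
    for sample_id, (expr, metadata_sex) in samples.items():
        inferred = infer_sex(expr)
        if metadata_sex is not None and inferred not in ("unknown", metadata_sex):
            flagged.append(sample_id)
    return flagged


samples = {
    "s1": ({"XIST": 9.2, "RPS4Y1": 1.1}, "female"),
    "s2": ({"XIST": 0.8, "RPS4Y1": 8.7}, "male"),
    "s3": ({"XIST": 8.9, "RPS4Y1": 0.9}, "male"),  # label disagrees with expression
    "s4": ({"XIST": 4.0, "RPS4Y1": 4.3}, None),    # no metadata label, ambiguous signal
}
inferred = {sid: infer_sex(expr) for sid, (expr, _) in samples.items()}
flagged = flag_mismatches(samples)
```

In this toy data, only sample `s3` is flagged, because its metadata label conflicts with a clear expression signal; `s4` stays unlabeled rather than being forced into a class.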

2020 ◽  
Author(s):  
Emily Flynn ◽  
Annie Chang ◽  
Russ B. Altman

Abstract: Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we infer sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio. Overall, we find a slight female bias (52.1%) in human samples and a male bias (62.5%) in mouse samples; this corresponds to a majority of single sex studies, split between female-only and male-only (33.3% vs. 18.4% in human and 31.0% vs. 30.4% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. Our expression-based sex labels allow us to further examine the complexity of cell line sex and assess the frequency of metadata sex label misannotations (2-5%). We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 772
Author(s):  
Seonghun Kim ◽  
Seockhun Bae ◽  
Yinhua Piao ◽  
Kyuri Jo

Genomic profiles of cancer patients such as gene expression have become a major source to predict responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model detects the local features such as subnetworks of genes that contribute to the drug response by localized filtering. We demonstrated the effectiveness of DrugGCN using biological data showing its high prediction accuracy among the competing methods.
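To make the notion of localized filtering on a gene graph concrete, here is a minimal sketch of a single graph-convolution step. It uses the common symmetric-normalization rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), as a simplified stand-in and is not the DrugGCN implementation; the three-gene graph, feature values, and weights are made up.

```python
import math

# Sketch of one graph-convolution step on a tiny gene graph.
# Assumption: the symmetric-normalization propagation rule is used here as a
# simplified stand-in; it is not taken from the DrugGCN code.

def normalize_adjacency(adj):
    """Add self-loops and scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """Mix each gene's features with its graph neighbours, then apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Three hypothetical genes connected in a path: 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]  # one expression value per gene (toy numbers)
weights = [[1.0]]                 # identity weight, so only the graph mixing is visible
out = gcn_layer(adj, features, weights)
```

After one layer the signal from gene 0 reaches its direct neighbour (gene 1) but not gene 2, which is what "localized" means here: each layer only mixes information across one hop of the network.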


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 1623-1623 ◽  
Author(s):  
Karen Dybkær ◽  
Hanne Due ◽  
Rasmus Froberg Brøndum ◽  
Ken H. Young ◽  
Martin Bøgsted

Background: Approximately 40% of patients with diffuse large B-cell lymphoma (DLBCL) suffer from primary refractory disease and treatment-induced immuno-chemotherapy resistance, demonstrating that standard treatment regimens are not sufficient to cure all patients. Early detection of resistance is of great importance, and defining microRNA (miRNA) involvement in resistance could be useful to guide treatment selection and help monitor treatment administration while sparing patients from ineffective but still toxic therapy.
Concept and Aims: With information on drug-response specific miRNAs, we hypothesized that multi-miRNA panels can improve robustness of individual clinical markers and serve as a prognostic classifier predicting disease progression in DLBCL patients.
Methods: Fifteen DLBCL cell lines were tested for sensitivity towards rituximab (R), cyclophosphamide (C), doxorubicin (H), and vincristine (O). Cell line-specific seeding concentrations were used to ensure exponential growth; each cell line was subjected to 16 concentrations in serial 2-fold dilutions, and the number of metabolically active cells was evaluated after 48 hours of drug exposure using an MTS assay. For each drug, we ranked the cell lines according to their sensitivity and categorized them as sensitive, intermediate responsive, or resistant. Baseline miRNA expression data were obtained for each cell line in the untreated condition, and differential miRNA expression analysis between sensitive and resistant cell lines identified 43 miRNAs associated with response to compounds of the R-CHOP regimen, by selecting probes with a log fold change larger than 2. Using the Affymetrix HG-U133+2 platform, expression levels of the miRNA precursors were assessed in 701 diagnostic DLBCL biopsies, and miRNA-panel classifiers were built using multiple Cox regression or random survival forest.
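The two numeric steps in the Methods, a 16-point serial 2-fold dilution and probe selection by log fold change, can be sketched as follows. The starting concentration, probe names, and expression values are hypothetical, and treating the threshold as an absolute log2 difference of group means is an assumption, since the abstract does not state the base or direction.

```python
from statistics import mean

# Sketch of a serial dilution series and a log-fold-change filter.
# Assumptions: top concentration, probe names and values are invented;
# the cutoff is applied to |log2 fold change| between group means.

def serial_dilution(top_concentration, steps=16, factor=2.0):
    """Concentrations obtained by repeatedly diluting by `factor`."""
    return [top_concentration / factor ** i for i in range(steps)]


def select_by_log_fold_change(sensitive, resistant, threshold=2.0):
    """Keep probes whose mean log2 expression differs by more than `threshold`."""
    selected = {}
    for probe in sensitive:
        lfc = mean(resistant[probe]) - mean(sensitive[probe])
        if abs(lfc) > threshold:
            selected[probe] = lfc
    return selected


doses = serial_dilution(64.0)  # 64, 32, 16, ... (arbitrary units)

# log2 expression per probe in sensitive vs. resistant cell lines (toy values)
sensitive = {"probe_A": [5.0, 5.5], "probe_B": [6.0, 6.2], "probe_C": [9.0, 9.5]}
resistant = {"probe_A": [8.0, 8.5], "probe_B": [6.5, 6.9], "probe_C": [6.0, 6.5]}
selected = select_by_log_fold_change(sensitive, resistant)
```

With these toy values, `probe_A` (higher in resistant lines) and `probe_C` (lower in resistant lines) pass the filter, while `probe_B` does not.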
Results: The generated prognostic miRNA-panel classifiers were tested for predictive accuracy and subsequently evaluated by Brier scores and time-varying area under the ROC curve (tAUC). Progression-free survival (PFS) was chosen as the outcome, since it is the treatment evaluation parameter as close as possible to the time of drug exposure, and the tested miRNAs were all associated directly with drug-specific response. Furthermore, overall survival (OS) was used for verification of findings. Comparison of analyses conducted for the respective cohorts (all DLBCL, ABC, and GCB patients) showed the lowest prediction errors for all models within the GCB subclass. A multivariate Cox miRNA-panel model including miR-146a, miR-155, miR-21, miR-34a, and the miR-23a~miR-27a~miR-24-2 cluster performed best and successfully stratified GCB-DLBCL patients into high and low risk of disease progression. In addition, combination of the miRNA-panel and the international prognostic index (IPI) substantially increased prognostic performance in GCB-classified patients, indicating a prognostic signal from the response-specific miRNAs independent of IPI.
Conclusion: We found, as proof of concept, that adding gene expression data detecting drug-response specific miRNAs to the clinically established IPI improved the prognostic stratification of GCB-DLBCL patients treated with R-CHOP.
Disclosures: No relevant conflicts of interest to declare.
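As a rough illustration of the Brier score used to evaluate the classifiers, the sketch below computes the score at a single time point. It assumes every patient's progression time is observed; the censoring weights that a real survival analysis needs are deliberately omitted, and all times and predicted probabilities are invented.

```python
# Simplified Brier score at one time point for survival predictions.
# Assumptions: no censoring (every progression time is observed) and
# made-up example values; real analyses weight for censored patients.

def brier_score_at(time_point, event_times, predicted_survival):
    """Mean squared gap between observed status and predicted P(T > time_point)."""
    errors = []
    for event_time, prob in zip(event_times, predicted_survival):
        observed = 1.0 if event_time > time_point else 0.0  # progression-free at time_point?
        errors.append((observed - prob) ** 2)
    return sum(errors) / len(errors)


event_times = [6.0, 30.0, 14.0, 48.0]      # months to progression (toy data)
model_predictions = [0.2, 0.9, 0.3, 0.8]   # hypothetical classifier output at 24 months
baseline_predictions = [0.5, 0.5, 0.5, 0.5]  # uninformative reference

score_model = brier_score_at(24.0, event_times, model_predictions)
score_baseline = brier_score_at(24.0, event_times, baseline_predictions)
```

Lower is better: an uninformative prediction of 0.5 for everyone scores 0.25, so a useful classifier should fall below that reference, as the toy model does here.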


2020 ◽  
Author(s):  
Benedict Hew ◽  
Qiao Wen Tan ◽  
William Goh ◽  
Jonathan Wei Xiong Ng ◽  
Kenny Koh ◽  
...  

Abstract: Bacterial resistance to antibiotics is a growing problem that is projected to cause more deaths than cancer in 2050. Consequently, novel antibiotics are urgently needed. Since more than half of the available antibiotics target the bacterial ribosomes, proteins that are involved in protein synthesis are thus prime targets for the development of novel antibiotics. However, experimental identification of these potential antibiotic target proteins can be labor-intensive and challenging, as these proteins are likely to be poorly characterized and specific to few bacteria. In order to identify these novel proteins, we established a Large-Scale Transcriptomic Analysis Pipeline in Crowd (LSTrAP-Crowd), where 285 individuals processed 26 terabytes of RNA-sequencing data of the 17 most notorious bacterial pathogens. In total, the crowd processed 26,269 RNA-seq experiments and used the data to construct gene co-expression networks, which were used to identify more than a hundred uncharacterized genes that were transcriptionally associated with protein synthesis. We provide the identity of these genes together with the processed gene expression data. The data can be used to identify other vulnerabilities or bacteria, while our approach demonstrates how the processing of gene expression data can be easily crowdsourced.
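The co-expression step can be pictured with a small sketch: correlate gene expression profiles, keep strongly correlated pairs as network edges, and report uncharacterized genes linked to a known protein-synthesis gene. The choice of Pearson correlation, the 0.9 cutoff, the gene names, and the values are all assumptions for illustration and are not taken from LSTrAP-Crowd.

```python
import math

# Sketch of a co-expression network with a "guilt by association" lookup.
# Assumptions: Pearson correlation, a 0.9 cutoff, and hypothetical gene
# names and expression values; none of this is the pipeline's actual code.

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Gene pairs whose profiles correlate at or above the threshold."""
    genes = sorted(expression)
    edges = set()
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            if pearson(expression[gene_a], expression[gene_b]) >= threshold:
                edges.add((gene_a, gene_b))
    return edges


def associated_candidates(edges, known_genes):
    """Genes outside the known set that share an edge with a known gene."""
    found = set()
    for gene_a, gene_b in edges:
        if gene_a in known_genes and gene_b not in known_genes:
            found.add(gene_b)
        if gene_b in known_genes and gene_a not in known_genes:
            found.add(gene_a)
    return found


expression = {  # one value per RNA-seq experiment (toy numbers)
    "ribo_known": [1.0, 2.0, 3.0, 4.0, 5.0],
    "unchar_1": [2.0, 4.0, 6.0, 8.0, 10.5],
    "unchar_2": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expression)
candidates = associated_candidates(edges, known_genes={"ribo_known"})
```

Here `unchar_1` tracks the known gene closely and becomes a candidate, while `unchar_2` does not correlate and is left out.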


2020 ◽  
Author(s):  
Minsheng Hao ◽  
Kui Hua ◽  
Xuegong Zhang

Abstract: Recent developments of spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue micro-environments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task, but their high computational complexity limits their scalability to the latest and future large-scale spatial expression data. We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map (SOM) to cluster neighboring cells into nodes, and then uses a Gaussian Process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ~5 minutes in large datasets of more than 20,000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde.


2004 ◽  
Vol 20 (13) ◽  
pp. 1993-2003 ◽  
Author(s):  
J. Ihmels ◽  
S. Bergmann ◽  
N. Barkai
