The effects of a globin blocker on the resolution of 3’mRNA sequencing data in porcine blood

Abstract Background Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs, but it is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads compared to the NGB (from 56.4 to 10.1%). The second highest concentration (C2) showed globin depletion very similar to that of C1 (to 12%) but a better correlation of the expression of non-globin genes between NGB and GB (r = 0.98). C2 allowed the expression of an additional 1,295 non-globin genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of globin reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB, similar to results from data set 1. Data set 3 (n = 84) revealed that the proportion of globin reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001).
Conclusions The effect of the GB on reducing the proportion of globin reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-globin mRNA, the GB for QuantSeq has the advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.
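
The two quantities reported in this abstract, the percentage of globin reads and the number of detectable non-globin genes, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the globin gene set (HBB is named in the abstract; HBA is an assumption) and the toy read counts are all hypothetical, and the counts were chosen only so that the percentages echo the 56.4% and 12% figures quoted above.

```python
# Hypothetical globin gene set: HBB appears in the abstract, HBA is an assumption.
GLOBIN_GENES = {"HBB", "HBA"}


def globin_read_percentage(counts, globin_genes=GLOBIN_GENES):
    """Return the percentage of all reads that map to globin genes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    globin = sum(n for gene, n in counts.items() if gene in globin_genes)
    return 100.0 * globin / total


def detected_non_globin_genes(counts, globin_genes=GLOBIN_GENES, min_reads=1):
    """Return the set of non-globin genes with at least min_reads reads."""
    return {
        gene
        for gene, n in counts.items()
        if gene not in globin_genes and n >= min_reads
    }


# Toy read counts per gene (made up for illustration, not data from the study).
ngb_counts = {"HBB": 500, "HBA": 64, "GENE_A": 400, "GENE_B": 36, "GENE_C": 0}
gb_counts = {"HBB": 80, "HBA": 40, "GENE_A": 600, "GENE_B": 200, "GENE_C": 80}

ngb_pct = globin_read_percentage(ngb_counts)
gb_pct = globin_read_percentage(gb_counts)
newly_detected = detected_non_globin_genes(gb_counts) - detected_non_globin_genes(ngb_counts)

print(f"NGB globin reads: {ngb_pct:.1f}%")
print(f"GB globin reads: {gb_pct:.1f}%")
print(f"Genes detected only with GB: {sorted(newly_detected)}")
```

In this toy example the sequencing depth is held constant, so the reads freed from globin transcripts are what lift a low-abundance gene above the detection threshold.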


The effects of a globin blocker on the resolution of 3’mRNA sequencing data in porcine blood

10.21203/rs.2.9873/v1 ◽

2019 ◽

Author(s):

Kyu-Sang Lim ◽

Qian Dong ◽

Pamela Renate Moll ◽

Jana Vitkovska ◽

Gregor Wiktorin ◽

...

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Phenotypic Differences ◽

Mrna Sequencing ◽

Porcine Blood ◽

Preparation Step

Abstract Background Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs, but it is expensive and inefficient because of the high abundance of hemoglobin (HB) mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of HB mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of HB reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-HB genes. The second highest concentration (C2) showed globin depletion very similar to that of C1 (from 56.4 to 12%) but a better correlation of the expression of non-HB genes between NGB and GB (r = 0.98). C2 allowed the expression of an additional 1,295 non-HB genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of HB reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing HB reads. Data set 3 (n = 84) revealed that the proportion of HB reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001). Conclusions The effect of the GB on reducing the proportion of HB reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-HB mRNA, the GB for QuantSeq has the advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.


The effects of a globin blocker on the resolution of 3’mRNA sequencing data in porcine blood

10.21203/rs.2.9873/v2 ◽

2019 ◽

Author(s):

Kyu-Sang Lim ◽

Qian Dong ◽

Pamela Renate Moll ◽

Jana Vitkovska ◽

Gregor Wiktorin ◽

...

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Globin Genes ◽

Sequencing Data ◽

Globin Mrna ◽

Additional Advantage ◽

Phenotypic Differences ◽

Mrna Sequencing ◽

Porcine Blood

Abstract Background: Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs, but it is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of 3’mRNA sequencing combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of a novel specific globin blocker (GB) that is included in the library preparation step of 3’mRNA sequencing. Results: Four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads (from 56.4 to 10.1%). The second highest concentration (C2) showed globin depletion very similar to that of C1 (to 12%) but a better correlation of the expression of non-globin genes between GB and non-GB (r = 0.98), and allowed the expression of an additional 1,295 non-globin genes to be detected. Concentration C2 was applied in the rest of the study. The distribution of the percentage of globin reads for non-GB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB. The proportion of globin reads that remained in GB samples was found to be positively correlated with the reticulocyte count of the blood sample (P < 0.001). Conclusions: The GB for 3’mRNA sequencing is a useful tool in the quantification of whole gene expression profiles in porcine blood. The GB reduced the proportion of globin reads, thereby increasing the efficiency of sequencing non-globin mRNA. The evaluated GB method has the further advantage that it does not require an additional step prior to or during library creation.


Artificial Neural Networks for classification of single cell gene expression

10.1101/2021.07.29.454293 ◽

2021 ◽

Author(s):

Jiahui Zhong ◽

Minjie Lyu ◽

Huan Jin ◽

Zhiwei Cao ◽

Lou T Chitkushev ◽

...

Keyword(s):

Gene Expression ◽

Classification Accuracy ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Sets ◽

Sample Processing ◽

Data Set ◽

Cell Gene Expression ◽

Cell Gene

Background: Single-cell transcriptome (SCT) sequencing technology has reached the level of high-throughput technology where gene expression can be measured concurrently from large numbers of cells. The results of gene expression studies are highly reproducible when strict protocols and standard operating procedures (SOP) are followed. However, differences in sample processing conditions result in significant changes in gene expression profiles, making direct comparison of different studies difficult. Unsupervised machine learning (ML) uses clustering algorithms combined with semi-automated cell labeling and manual annotation of individual cells. Such workflows do not scale up well, and a workflow used on a specific dataset will not perform well with other studies. Supervised ML classification shows superior classification accuracy and generalization properties as compared to unsupervised ML methods. We describe a supervised ML method that deploys artificial neural networks (ANN) for 5-class classification of healthy peripheral blood mononuclear cells (PBMC) from multiple diverse studies. Results: We used 58 data sets to train the ANN incrementally over ten cycles of training and testing. The sample processing involved four protocols: separation of PBMC, separation of PBMC + enrichment (by negative selection), separation of PBMC + FACS, and separation of PBMC + MACS. The training data set included between 85 and 110 thousand cells, and the test set had approximately 13 thousand cells. Training and testing were done with various combinations of data sets from four principal data sources. Overall 5-class classification accuracy on independent data sets reached 94%. Classification accuracy for B cells, monocytes, and T cells exceeded 95%. Classification accuracy for natural killer (NK) cells was 75% because of the similarity between NK cells and T cell subsets. Classification accuracy for dendritic cells (DC) was low due to very low numbers of DC in the training sets. Conclusions: The incremental learning ANN model can accurately classify the main types of PBMC. With the inclusion of more DC and the resolution of ambiguities between T cell and NK cell gene expression profiles, we will enable high-accuracy supervised ML classification of PBMC. We assembled a reference data set for healthy PBMC and demonstrated a proof of concept for a supervised ANN method in the classification of previously unseen SCT data. The classification shows high accuracy that is consistent across different studies and sample processing methods.


LncGSEA: a versatile tool to infer lncRNA associated pathways from large-scale cancer transcriptome sequencing data

BMC Genomics ◽

10.1186/s12864-021-07900-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yanan Ren ◽

Ting-You Wang ◽

Leah C. Anderton ◽

Qi Cao ◽

Rendong Yang

Keyword(s):

Gene Expression ◽

Large Scale ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Clinical Samples ◽

Sequencing Data ◽

Multiple Cancer ◽

Regulatory Pathways ◽

Cancer Transcriptome ◽

Versatile Tool

Abstract Background Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.


Improved Feature Selection by Incorporating Gene Similarity into the LASSO

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2012010101 ◽

2012 ◽

Vol 3 (1) ◽

pp. 1-22 ◽

Cited By ~ 1

Author(s):

Christopher E. Gillies ◽

Xiaoli Gao ◽

Nilesh V. Patel ◽

Mohammad-Reza Siadat ◽

George D. Wilson

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Personalized Medicine ◽

Objective Function ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Genetic Profile ◽

Data Set ◽

Coordinate Descent Algorithm ◽

Gene Similarity

Personalized medicine is customizing treatments to a patient’s genetic profile and has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult, because there are usually few patients and thousands of genes, leading to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data where they compared their model to the standard LASSO model and an interaction LASSO model. The authors’ model outperformed both the standard and interaction LASSO models in terms of detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.
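
For readers unfamiliar with the baseline this abstract builds on, the following minimal Python sketch shows coordinate descent for the standard LASSO, which minimizes (1/(2n))·‖y − Xβ‖² + λ·‖β‖₁. It is not the authors' modified LASSO: the gene-similarity and interaction penalties described above are not implemented, and the function names and toy data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 one coordinate at a time.

    X is a list of rows (samples), y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero feature cannot be updated
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Toy data (hypothetical): feature 0 carries a strong signal, feature 1 a weak one.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]

beta_sparse = lasso_coordinate_descent(X, y, lam=0.1)
beta_unpenalized = lasso_coordinate_descent(X, y, lam=0.0)
print(beta_sparse)
print(beta_unpenalized)
```

With the penalty switched on, the weak coefficient is set exactly to zero while the strong one is shrunk; this is the parameter sparsity the abstract refers to.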


Combining Nearest Neighbor Classifiers Versus Cross-Validation Selection

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1054 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-19 ◽

Cited By ~ 8

Author(s):

Minhui Paik ◽

Yuhong Yang

Keyword(s):

Gene Expression ◽

Cross Validation ◽

Nearest Neighbor ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Sets ◽

Weighting Method ◽

Considerable Uncertainty ◽

Combined Classifier ◽

Nearest Neighbor Classifiers

Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN method. However, CV can perform poorly when there is considerable uncertainty in choosing the best candidate classifier. As an alternative to selecting a single “winner,” we propose a weighting method to combine the multiple NN rules. Four gene expression data sets are used to compare its performance with CV methods. The results show that when the CV selection is unstable, the combined classifier performs much better.


PCA-based unsupervised feature extraction for gene expression analysis of COVID-19 patients

Scientific Reports ◽

10.1038/s41598-021-95698-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kota Fujisawa ◽

Mamoru Shimo ◽

Y.-H. Taguchi ◽

Shinya Ikematsu ◽

Ryota Miyata

Keyword(s):

Gene Expression ◽

Feature Extraction ◽

Target Genes ◽

Gene Selection ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Principal Component ◽

Data Set ◽

Immune Related Genes ◽

Unsupervised Feature Extraction

Abstract Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. The results identified 123 genes as critical for COVID-19 progression from 60,683 candidate probes, including immune-related genes. The 123 genes were enriched in binding sites for transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.


Characterization of Whole Blood Gene Expression Profiles as a Sequel to Globin mRNA Reduction in Patients with Sickle Cell Disease

PLoS ONE ◽

10.1371/journal.pone.0006484 ◽

2009 ◽

Vol 4 (8) ◽

pp. e6484 ◽

Cited By ~ 29

Author(s):

Nalini Raghavachari ◽

Xiuli Xu ◽

Peter J. Munson ◽

Mark T. Gladwin

Keyword(s):

Gene Expression ◽

Sickle Cell Disease ◽

Sickle Cell ◽

Whole Blood ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cell Disease ◽

Globin Mrna ◽

Blood Gene Expression


Gene Expression Profiles Involved in γ to β Globin Gene Switching during Erythroid Maturation.

Blood ◽

10.1182/blood.v106.11.820.820 ◽

2005 ◽

Vol 106 (11) ◽

pp. 820-820

Author(s):

Wei Li ◽

Betty S. Pace

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Globin Gene ◽

Gene Expression Profiles ◽

Phase System ◽

Mrna Levels ◽

Erythroid Progenitors ◽

Globin Mrna ◽

Gene Switching ◽

Globin Gene Expression

Abstract The design and evaluation of therapies for sickle cell disease (SCD) rely on our understanding of hemoglobin accumulation during erythropoiesis and sequential globin gene expression (ε → Gγ → Aγ → δ → β) during development. To gain insights into globin gene switching, we completed time course microarray analyses of erythroid progenitors to identify trans-factors involved in γ gene activation. Studies were completed to map the pattern of γ and β globin gene expression in progenitors grown from normal peripheral blood mononuclear cells. We compared cells grown in a 2-phase (phase 1, d0-6: SCF, IL-3, IL-6, and GM-CSF and phase 2, d7-25: SCF and EPO) vs. 1-phase (d0-34: SCF, IL-3, and EPO) liquid culture system. From day 0 to 34 in either system cell viability remained >99%. Total RNA was isolated using Trizol and column cleanup (Qiagen). Globin mRNA levels were measured at 2–3 day intervals by quantitative PCR (qPCR). In the 2-phase system, γ-globin mRNA exceeded β-globin mRNA up to d14, followed by 4 days of approximately equal expression, and β mRNA exceeded γ mRNA by d20. By contrast, in 1-phase studies there was a rapid switch around d20 (see graph). We speculate that this difference may be due to the early addition of EPO on d0; therefore, we continued our detailed analysis in this system. To confirm that our in vitro system recapitulates in vivo gene expression patterns, we completed studies to ascertain Gγ vs. Aγ globin mRNA levels. The normalized Gγ:Aγ ratio decreased from ~3:1 on d7 to ~1:1 by d34; these findings were confirmed using two sets of Gγ and Aγ globin primers. We concluded that the 1-phase system recapitulated normal γ/β globin switching and that gene profiling studies to identify the trans-factors involved in switching mechanisms were feasible. We used Discover oligo chips (ArrayIt, Sunnyvale, CA) containing 380 human genes selected from 30 major functional groups including hematopoiesis. To aid interpretation of chip data, cell populations were rated morphologically using Giemsa-stained cytospin preps. From d16 on we observed an increase in late erythroid progenitors (normoblasts) from 1% to 71% by d31. After verifying RNA quality by gel inspection of ribosomal molecules, we prepared Cy3 and Cy5 probes for early and late time-point RNA samples, respectively. Chip analysis was performed at several time points, but d0/21, d7/21, and d21/28 were most informative. Based on Axon GenePixPro 6.0 and Acuity 4.0 software analysis we found the following genes with >1.5-fold change in expression profile (shown as down-regulated/up-regulated genes): d0/21: 33/73, d7/21: 13/25, and d21/28: 35/26. Principal component analysis (PCA), hierarchical clusters and self-organizing maps were constructed. Gene profiles were correlated with the γ/β switching curve using d7 (γ > β), d21 (γ ~ β), and d28 (γ < β) data. Hematopoietic dataset analysis at d21 revealed 4 candidate γ-globin gene activators including v-myb, upstream binding transfactor -RNApol1 and 2 zinc finger proteins. Analysis of a d28 dataset revealed 12 proteins involved in γ-globin gene silencing including IL-3, SCF, MAPKKK3, v-raf-1, ATF-2, and glucocorticoid receptor DNA binding factor 1 among others. Gene expression profiles will be validated using qPCR and promising candidates will be tested by forced expression in transient and stable reporter systems.


Prediction of Cytogenetic Abnormalities in Multiple Myeloma Based on Gene Expression Profiles

Blood ◽

10.1182/blood.v118.21.629.629 ◽

2011 ◽

Vol 118 (21) ◽

pp. 629-629

Author(s):

Yiming Zhou ◽

Qing Zhang ◽

Christoph Heuck ◽

Owen Stephens ◽

Erming Tian ◽

...

Keyword(s):

Gene Expression ◽

Research Funding ◽

Prediction Accuracy ◽

Plasma Cells ◽

Chromosome Region ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Set ◽

Expression Levels ◽

The Mean

Abstract 629 Background: Cytogenetic abnormalities (CA) are a hallmark of multiple myeloma (MM) and other cancers and are commonly used as clinical parameters for determining disease stage and guiding therapy decisions. Traditional techniques, including fluorescence in situ hybridization (FISH) and karyotyping, and the recently developed array-based comparative genomic hybridization are expensive and time consuming. As gene expression profiling (GEP) is becoming more integrated in the diagnostic workup of MM and is increasingly being used for risk stratification as well as tailoring therapy, we are presented with vast amounts of data that should reflect disease-associated alterations of the genome. We therefore sought to develop a GEP-based virtual CA (vCA) model to predict CA in MM. Methods/Results: We determined genome-wide gene expression profiles and DNA copy numbers (CNs) in purified plasma cell samples obtained from 92 newly diagnosed MM patients, using the Affymetrix GeneChip and the Agilent aCGH platforms, respectively. We identified 1,114 CN-sensitive genes by Pearson's correlation coefficient (PCC) of gene expression levels and the copy numbers of the corresponding DNA loci, keeping the false discovery rate to <5%. On the basis of these CN-sensitive genes, we developed a vCA model for predicting CA in MM patients by means of GEP. The model focuses particularly on chromosomes 3, 5, 7, 9, 11, 13, 15, 19, and 21, as well as the 1p, 1q, and 6q segments, which are the most commonly altered chromosome regions in MM plasma cells. The reference CA (rCA) of a given chromosome region were determined by the mean values of signals of aCGH probes located in that region. The values of rCA could be used to distinguish among amplification, deletion, and normal. The predicted CA (pCA) of a given chromosome region were determined by the following procedure. First, we calculated the mean expression levels of CN-sensitive genes within the region. Then, by training the model in a GEP data set with 92 MM samples, we set the cutoff value of the mean expression levels of CN-sensitive genes for each chromosome region in order to obtain pCA that were most consistent with rCA in terms of the Matthews correlation coefficient, a measure of the quality of binary (two-class) classifications. The mean prediction accuracy was 0.88 (0.59–0.99) when the model was applied to the training data set. To check for overfitting in the vCA model, we applied the model to an independent data set of 23 MM samples for which both GEP and aCGH data were available. The mean prediction accuracy was 0.89 (0.74–1.00), which indicated that overfitting was negligible if present at all. We further validated the model with a FISH data set compiled from 262 independent MM samples for which both FISH records and GEP data were available. The mean prediction accuracy was 0.87. The consistency between vCA-predicted chromosomal alterations and findings of karyotyping dropped to 0.65. However, this underperformance could be due to the fact that karyotyping is limited by the low proliferation rate of terminally differentiated plasma cells in vitro. Conclusion: Our results provide a proof of concept that GEP data alone can reveal all the information provided by conventional cytogenetic techniques. We show that re-purposing gene expression data using our model is a fast and economical way to obtain cytogenetic information that is accurate and can be used for diagnosis and observation in MM and potentially other malignancies. GEP can serve as a one-stop genomic data source for information from the level of specific genes to whole chromosomes.
Disclosures: Barlogie: Celgene: Consultancy, Honoraria, Research Funding; IMF: Consultancy, Honoraria; MMRF: Consultancy; Millennium: Consultancy, Honoraria, Research Funding; Genzyme: Consultancy; Novartis: Research Funding; NCI: Research Funding; Johnson & Johnson: Research Funding; Centocor: Research Funding; Onyx: Research Funding; Icon: Research Funding. Shaughnessy:Myeloma Health, Celgene, Genzyme, Novartis: Consultancy, Employment, Equity Ownership, Honoraria, Patents & Royalties.
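
The cutoff-selection step described in the Methods/Results of this abstract (choosing, for each chromosome region, the cutoff on mean expression that makes predicted calls most consistent with reference calls in terms of the Matthews correlation coefficient) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' vCA implementation: the reference call is reduced to a binary "amplified or not" label, candidate cutoffs are taken from the observed values, and all names and numbers are hypothetical.

```python
import math


def matthews_corrcoef(truth, pred):
    """Matthews correlation coefficient for two equal-length boolean sequences."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal total is 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_cutoff(mean_expression, reference_altered):
    """Scan observed values as candidate cutoffs and keep the one with the highest MCC.

    A sample is predicted as altered when its mean expression is >= the cutoff.
    """
    best_value, best_score = None, -2.0  # MCC is always >= -1
    for cutoff in sorted(set(mean_expression)):
        predicted = [value >= cutoff for value in mean_expression]
        score = matthews_corrcoef(reference_altered, predicted)
        if score > best_score:
            best_value, best_score = cutoff, score
    return best_value, best_score


# Toy training data for one chromosome region (hypothetical values).
# mean_expression: mean expression of CN-sensitive genes in the region, per sample.
# reference_altered: binary reference call per sample (True = amplified).
mean_expression = [0.1, 0.2, 0.3, 0.8, 0.9, 1.1]
reference_altered = [False, False, False, True, True, True]

cutoff, score = best_cutoff(mean_expression, reference_altered)
print(f"chosen cutoff = {cutoff}, MCC = {score}")
```

A separate cutoff for deletions could be chosen the same way with the inequality reversed; the abstract does not say how the three-way amplification/deletion/normal call was handled, so that part is left out.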
