The effects of a globin blocker on the resolution of 3’mRNA sequencing data in porcine blood

2019 ◽  
Author(s):  
Kyu-Sang Lim ◽  
Qian Dong ◽  
Pamela Renate Moll ◽  
Jana Vitkovska ◽  
Gregor Wiktorin ◽  
...  

Abstract Background Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of hemoglobin (HB) mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of HB mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of HB reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-HB genes. The second highest concentration, C2, which showed a globin depletion rate very similar to that of C1 (from 56.4 to 12%) but a better correlation of the expression of non-HB genes between NGB and GB (r = 0.98), allowed the expression of an additional 1,295 non-HB genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of HB reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effect of the GB in reducing HB reads. Data set 3 (n = 84) revealed that the proportion of HB reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001). Conclusions The effect of the GB in reducing the proportion of HB reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-HB mRNA, the GB for QuantSeq has the advantage that it does not require an additional step prior to or during library creation. 
Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.

BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Kyu-Sang Lim ◽  
Qian Dong ◽  
Pamela Moll ◽  
Jana Vitkovska ◽  
Gregor Wiktorin ◽  
...  

Abstract Background Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads compared to the NGB (from 56.4 to 10.1%). The second highest concentration C2, which showed very similar globin depletion rates (12%) as C1 but a better correlation of the expression of non-globin genes between NGB and GB (r = 0.98), allowed the expression of an additional 1295 non-globin genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of globin reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB, similar to results from data set 1. Data set 3 (n = 84) revealed that the proportion of globin reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001). 
Conclusions The effect of the GB in reducing the proportion of globin reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-globin mRNA, the GB for QuantSeq has the advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.
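
The headline metric in the abstract above, the percentage of globin reads in a library, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name, the gene labels `HBA`, `GENE1` and `GENE2`, and the read counts, which were invented so that the two toy libraries reproduce the reported 56.4% (NGB) and 12% (GB at C2). Only `HBB` is named in the abstract, and none of these numbers are data from the study.

```python
def globin_read_percentage(counts, globin_genes):
    """Return the percentage of reads assigned to globin genes.

    counts: dict mapping gene name -> read count for one library.
    globin_genes: set of gene names treated as globin (an assumption here).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    globin = sum(n for gene, n in counts.items() if gene in globin_genes)
    return 100.0 * globin / total


# Hypothetical gene labels and invented counts, chosen only to mirror
# the percentages quoted in the abstract.
GLOBIN_GENES = {"HBA", "HBB"}
ngb_library = {"HBA": 200, "HBB": 364, "GENE1": 236, "GENE2": 200}
gb_library = {"HBA": 50, "HBB": 70, "GENE1": 480, "GENE2": 400}

print(globin_read_percentage(ngb_library, GLOBIN_GENES))
print(globin_read_percentage(gb_library, GLOBIN_GENES))
```

In this toy setting the same sequencing depth yields far more non-globin reads in the blocked library, which is the efficiency gain the abstract describes.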


2019 ◽  
Author(s):  
Kyu-Sang Lim ◽  
Qian Dong ◽  
Pamela Renate Moll ◽  
Jana Vitkovska ◽  
Gregor Wiktorin ◽  
...  

Abstract Background: Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of 3’mRNA sequencing combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of a novel specific globin blocker (GB) that is included in the library preparation step of 3’mRNA sequencing. Results: Four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads (from 56.4 to 10.1%). The second highest concentration, C2, showed a globin depletion rate very similar to that of C1 (12%) but a better correlation of the expression of non-globin genes between GB and non-GB (r = 0.98), and allowed the expression of an additional 1,295 non-globin genes to be detected. Concentration C2 was applied in the rest of the study. The distribution of the percentage of globin reads for non-GB (n = 184) and GB (n = 189) samples clearly showed the effect of the GB in reducing globin reads, in particular for HBB. The proportion of globin reads that remained in GB samples was found to be positively correlated with the reticulocyte count of the blood sample (P < 0.001). Conclusions: The GB for 3’mRNA sequencing is a useful tool in the quantification of whole gene expression profiles in porcine blood. The GB reduced the proportion of globin reads, thereby increasing the efficiency of sequencing non-globin mRNA. The evaluated GB method has the additional advantage that it does not require an extra step prior to or during library creation.


2021 ◽  
Author(s):  
Jiahui Zhong ◽  
Minjie Lyu ◽  
Huan Jin ◽  
Zhiwei Cao ◽  
Lou T Chitkushev ◽  
...  

Background: Single-cell transcriptome (SCT) sequencing technology has reached the level of high-throughput technology where gene expression can be measured concurrently from large numbers of cells. The results of gene expression studies are highly reproducible when strict protocols and standard operating procedures (SOP) are followed. However, differences in sample processing conditions result in significant changes in gene expression profiles, making direct comparison of different studies difficult. Unsupervised machine learning (ML) uses clustering algorithms combined with semi-automated cell labeling and manual annotation of individual cells. Such workflows do not scale up well, and a workflow used on a specific dataset will not perform well with other studies. Supervised ML classification shows superior classification accuracy and generalization properties as compared to unsupervised ML methods. We describe a supervised ML method that deploys artificial neural networks (ANN) for 5-class classification of healthy peripheral blood mononuclear cells (PBMC) from multiple diverse studies. Results: We used 58 data sets to train the ANN incrementally over ten cycles of training and testing. The sample processing involved four protocols: separation of PBMC, separation of PBMC + enrichment (by negative selection), separation of PBMC + FACS, and separation of PBMC + MACS. The training data set included between 85 and 110 thousand cells, and the test set had approximately 13 thousand cells. Training and testing were done with various combinations of data sets from four principal data sources. The overall 5-class classification accuracy on independent data sets reached 94%. Classification accuracy for B cells, monocytes, and T cells exceeded 95%. Classification accuracy of natural killer (NK) cells was 75% because of the similarity between NK cells and T cell subsets. 
The accuracy for dendritic cells (DC) was low due to very low numbers of DC in the training sets. Conclusions: The incremental learning ANN model can accurately classify the main types of PBMC. With the inclusion of more DC and the resolution of ambiguities between T cell and NK cell gene expression profiles, we will enable high-accuracy supervised ML classification of PBMC. We assembled a reference data set for healthy PBMC and demonstrated a proof of concept for a supervised ANN method in classification of previously unseen SCT data. The classification shows high accuracy that is consistent across different studies and sample processing methods.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yanan Ren ◽  
Ting-You Wang ◽  
Leah C. Anderton ◽  
Qi Cao ◽  
Rendong Yang

Abstract Background Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.


Author(s):  
Christopher E. Gillies ◽  
Xiaoli Gao ◽  
Nilesh V. Patel ◽  
Mohammad-Reza Siadat ◽  
George D. Wilson

Personalized medicine is customizing treatments to a patient’s genetic profile and has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult, because there are usually few patients and thousands of genes, leading to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data where they compared their model to the standard LASSO model and an interaction LASSO model. The authors’ model outperformed both the standard and interaction LASSO models in terms of detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.
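
For orientation, the baseline that the abstract above compares against can be sketched in a few lines. This is coordinate descent for the standard LASSO, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁ with the soft-thresholding update. It is not the authors' modified penalty on gene interaction terms, whose details are not given in the abstract. The function names and the toy data are assumptions for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Standard LASSO via cyclic coordinate descent (plain-Python sketch).

    X: list of rows (each a list of floats), y: list of floats,
    lam: L1 penalty weight. Returns the coefficient list.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the partial residual, i.e. the
            # residual with feature j's current contribution added back.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                beta[j] = new_beta
    return beta


# Toy data (invented): feature 0 carries signal, feature 1 is nearly noise.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
print(lasso_coordinate_descent(X, y, lam=0.1))
```

With the penalty switched on, the weak second coefficient is set exactly to zero while the strong one is only shrunk, which is the sparsity-inducing behavior the abstract refers to.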


2004 ◽  
Vol 3 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Minhui Paik ◽  
Yuhong Yang

Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN method. However, CV can perform poorly when there is considerable uncertainty in choosing the best candidate classifier. As an alternative to selecting a single “winner,” we propose a weighting method to combine the multiple NN rules. Four gene expression data sets are used to compare its performance with CV methods. The results show that when the CV selection is unstable, the combined classifier performs much better.
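
The general idea of combining several NN rules instead of picking one by CV can be sketched as follows. The weighting used here, each rule's leave-one-out accuracy, is an illustrative assumption and not the weighting scheme proposed in the paper, which the abstract does not specify. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of the k-NN rule on the data."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def combined_predict(data, query, ks):
    """Weighted vote over several k-NN rules.

    Assumption: weights are leave-one-out accuracies, chosen only
    for illustration.
    """
    score = Counter()
    for k in ks:
        score[knn_predict(data, query, k)] += loo_accuracy(data, k)
    return score.most_common(1)[0][0]


# Invented one-dimensional toy data with two well-separated classes.
data = [((0.0,), "A"), ((1.0,), "A"), ((2.0,), "A"),
        ((10.0,), "B"), ((11.0,), "B"), ((12.0,), "B")]
print(combined_predict(data, (1.5,), ks=[1, 3]))
print(combined_predict(data, (10.5,), ks=[1, 3]))
```

When the candidate rules disagree on a query, the combined vote lets the more reliable rules outweigh the others rather than committing to a single selected neighbor size.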


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kota Fujisawa ◽  
Mamoru Shimo ◽  
Y.-H. Taguchi ◽  
Shinya Ikematsu ◽  
Ryota Miyata

Abstract Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. The results identified 123 genes as critical for COVID-19 progression from 60,683 candidate probes, including immune-related genes. The 123 genes were enriched in binding sites for transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 629-629
Author(s):  
Yiming Zhou ◽  
Qing Zhang ◽  
Christoph Heuck ◽  
Owen Stephens ◽  
Erming Tian ◽  
...  

Abstract 629 Background: Cytogenetic abnormalities (CA) are a hallmark of multiple myeloma (MM) and other cancers and are commonly used as clinical parameters for determining disease stage and guiding therapy decisions. Traditional techniques, including fluorescence in situ hybridization (FISH) and karyotyping, and the recently developed array-based comparative genomic hybridization are expensive and time-consuming. As gene expression profiling (GEP) is becoming more integrated in the diagnostic workup of MM and is increasingly being used for risk stratification as well as tailoring therapy, we are presented with vast amounts of data that should reflect disease-associated alterations of the genome. We therefore sought to develop a GEP-based virtual CA (vCA) model to predict CA in MM. Methods/Results: We determined genome-wide gene expression profiles and DNA copy numbers (CNs) in purified plasma cell samples obtained from 92 newly diagnosed MM patients, using the Affymetrix GeneChip and the Agilent aCGH platforms, respectively. We identified 1,114 CN-sensitive genes by Pearson's correlation coefficient (PCC) of gene expression levels and the copy numbers of the corresponding DNA loci, keeping the false discovery rate to <5%. On the basis of these CN-sensitive genes, we developed a vCA model for predicting CA in MM patients by means of GEP. The model focuses particularly on chromosomes 3, 5, 7, 9, 11, 13, 15, 19, and 21, as well as the 1p, 1q, and 6q segments, which are the most commonly altered chromosome regions in MM plasma cells. The reference CA (rCA) of a given chromosome region were determined by the mean values of signals of aCGH probes located in that region. The values of rCA could be used to distinguish among amplification, deletion, and normal. The predicted CA (pCA) of a given chromosome region were determined by the following procedures. First, we calculated the mean expression levels of CN-sensitive genes within the region. 
Then, by training the model in a GEP data set with 92 MM samples, we set the cutoff value of the mean expression levels of CN-sensitive genes for each chromosome region in order to obtain pCA that were most consistent with rCA in terms of the Matthews correlation coefficient, a measure of the quality of binary (two-class) classifications. The mean prediction accuracy was 0.88 (0.59–0.99) when the model was applied to the training data set. To check for overfitting in the vCA model, we applied the model to an independent data set of 23 MM samples for which both GEP and aCGH data were available. The mean prediction accuracy was 0.89 (0.74–1.00), which indicated that overfitting was negligible if present at all. We further validated the model with a FISH data set compiled from 262 independent MM samples for which both FISH records and GEP data were available. The mean prediction accuracy was 0.87. The consistency between vCA-predicted chromosomal alterations and findings of karyotyping dropped to 0.65. However, this underperformance could be due to the fact that karyotyping is limited by the low proliferation rate of terminally differentiated plasma cells in vitro. Conclusion: Our results provide a proof of concept that GEP data alone can reveal all the information provided by conventional cytogenetic techniques. We show that re-purposing gene expression data using our model is a fast and economical way to obtain cytogenetic information that is accurate and can be used for diagnosis and observation in MM and potentially other malignancies. GEP can serve as a one-stop genomic data source for information from the level of specific genes to whole chromosomes. 
Disclosures: Barlogie: Celgene: Consultancy, Honoraria, Research Funding; IMF: Consultancy, Honoraria; MMRF: Consultancy; Millennium: Consultancy, Honoraria, Research Funding; Genzyme: Consultancy; Novartis: Research Funding; NCI: Research Funding; Johnson & Johnson: Research Funding; Centocor: Research Funding; Onyx: Research Funding; Icon: Research Funding. Shaughnessy: Myeloma Health, Celgene, Genzyme, Novartis: Consultancy, Employment, Equity Ownership, Honoraria, Patents & Royalties.
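
The cutoff-selection step described in the abstract above, choosing a threshold on mean expression so that predicted calls agree best with reference calls under the Matthews correlation coefficient (MCC), can be sketched as follows. This is a simplified binary version (altered versus not altered) for a single chromosome region. The function names, the "greater than or equal to cutoff means altered" convention, and the toy values are assumptions and are not taken from the study.

```python
import math


def matthews_corrcoef(truth, pred):
    """MCC for two equal-length lists of booleans; 0.0 if undefined."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def best_cutoff(mean_expression, reference_altered):
    """Scan observed values as candidate cutoffs and keep the best MCC.

    A sample is called altered when its mean expression is >= cutoff
    (an assumed convention for this sketch).
    """
    best_value, best_score = None, -2.0
    for cutoff in sorted(set(mean_expression)):
        pred = [value >= cutoff for value in mean_expression]
        score = matthews_corrcoef(reference_altered, pred)
        if score > best_score:
            best_value, best_score = cutoff, score
    return best_value, best_score


# Invented values: mean expression of CN-sensitive genes in one region
# for six samples, with reference calls of the kind derived from aCGH.
mean_expression = [1.0, 1.2, 0.9, 2.1, 2.4, 1.9]
reference_altered = [False, False, False, True, True, True]
print(best_cutoff(mean_expression, reference_altered))
```

Because the cutoff is tuned on the same samples it is scored on, an independent data set is needed to check for overfitting, which is the role of the 23-sample validation set in the abstract.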


2010 ◽  
Vol 299 (5) ◽  
pp. C930-C938 ◽  
Author(s):  
Zihua Hu ◽  
Sukru Gulec ◽  
James F. Collins

Molecular mechanisms mediating the induction of metal ion homeostasis-related genes in the mammalian intestine during iron deficiency remain unknown. To elucidate relevant regulatory pathways, genomewide gene expression profiles were determined in fully differentiated human intestinal epithelial (Caco-2) cells. Cells were deprived of iron (or not) for 6 or 18 h, and Gene Chip analyses were subsequently performed (Affymetrix). More than 2,000 genes were differentially expressed; genes related to monosaccharide metabolism, regulation of gene expression, hypoxia, and cell death were upregulated, while those related to mitotic cell cycle were downregulated. A large proportion of induced genes are hypoxia responsive, and promoter enrichment analyses revealed a statistical overrepresentation of hypoxia response elements (HREs). Immunoblot experiments demonstrated a >60-fold increase in HIF2α protein abundance in iron-deprived cells; HIF1α levels were unchanged. Furthermore, comparison of the Caco-2 cell data set with a Gene Chip data set from iron-deficient rat intestine revealed 29 common upregulated genes; the majority are hypoxia responsive, and their promoters are enriched for HREs. We conclude that the compensatory response of the intestinal epithelium to iron deprivation relates to hypoxia and that stabilization of HIF2α may be the primary event mediating metabolic and morphological changes observed during iron deficiency.


2020 ◽  
Author(s):  
Weimiao Wu ◽  
Qile Dai ◽  
Yunqing Liu ◽  
Xiting Yan ◽  
Zuoheng Wang

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.

