Data Matrix Normalization and Merging Strategies Minimize Batch-specific Systemic Variation in scRNA-Seq Data

2021
Author(s):
Benjamin R Babcock
Astrid Kosters
Junkai Yang
Mackenzie L White
Eliver Ghosn

Single-cell RNA sequencing (scRNA-seq) can measure RNA abundance accurately and sensitively within a single sample, but robust integration of multiple samples remains challenging. Large-scale scRNA-seq data generated by different workflows or laboratories can contain batch-specific systemic variation, which complicates data integration by confounding sample-specific biology with undesirable batch-specific systemic effects. There is therefore a need for guidance in selecting computational and experimental approaches that minimize batch-specific effects on data interpretation, and a need to evaluate empirically the sources of systemic variation in a given dataset. To uncover the contributions of experimental variables to systemic variation, we intentionally perturb four potential sources of batch effects in five human peripheral blood samples: sequencing replicate, sequencing depth, sample replicate, and the pooling of libraries for concurrent sequencing. To quantify the downstream effects of these variables on data interpretation, we introduce a new scoring metric, the Cell Misclassification Statistic (CMS), which identifies losses of cell type fidelity that occur when datasets from different batches are merged. CMS reveals undesirable overcorrection by popular batch-effect correction and data integration methods. We show that optimizing gene expression matrix normalization and merging can reduce the need for batch-effect correction and minimize the risk of overcorrecting true biological differences between samples.
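The abstract does not spell out which normalization and merging choices the authors recommend, so the following is only a minimal sketch of one common pair of steps, under stated assumptions: per-cell library-size scaling to 10,000 counts followed by log1p, then stacking two batches on their shared genes. The function names, the 1e4 scale factor, and the toy gene lists are hypothetical and are not the authors' pipeline or the CMS metric.

```python
import numpy as np

def normalize_log_cp10k(counts, scale=1e4):
    """Scale each cell (row) to `scale` total counts, then apply log1p.

    counts: cells x genes array of raw counts (hypothetical helper).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale)

def merge_on_shared_genes(mat_a, genes_a, mat_b, genes_b):
    """Stack two cells x genes matrices, keeping only genes present in both."""
    pos_b = {g: i for i, g in enumerate(genes_b)}
    shared = [g for g in genes_a if g in pos_b]  # order follows batch A
    idx_a = [i for i, g in enumerate(genes_a) if g in pos_b]
    idx_b = [pos_b[g] for g in shared]
    merged = np.vstack([mat_a[:, idx_a], mat_b[:, idx_b]])
    return merged, shared

# Toy batches with made-up counts; batch B lists its genes in a different order.
a = np.array([[10, 0, 5], [2, 3, 0]])
b = np.array([[1, 4], [0, 6]])
na = normalize_log_cp10k(a)
nb = normalize_log_cp10k(b)
merged, shared = merge_on_shared_genes(na, ["CD3E", "MS4A1", "LYZ"], nb, ["LYZ", "CD3E"])
```

Here each batch is normalized before subsetting to shared genes, so every cell keeps its full library size as the denominator; normalizing after subsetting would rescale by the shared-gene total instead and yield a different matrix, which is one reason the order of normalization and merging is a genuine design choice.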

2021
Author(s):
Michael F. Adamer
Sarah C. Brueningk
Alejandro Tejada-Arranz
Fabienne Estermann
Marek Basler
...

With the steadily increasing amount of omics data produced worldwide, sometimes decades apart and under vastly different experimental conditions, and deposited in public databases, data integration is a crucial step in many data-driven bioinformatics applications. The challenge of batch-effect removal for entire databases lies in the large number of batches and their coincidence with the desired biological variation, which results in a singular design matrix. No common batch correction algorithm can currently solve this problem. In this study, we present reComBat, a regularised version of the empirical Bayes method, to overcome this limitation. We demonstrate our approach by harmonising public gene expression data of the human opportunistic pathogen Pseudomonas aeruginosa and use several metrics to demonstrate empirically that batch effects are successfully mitigated while biologically meaningful gene expression variation is retained. reComBat fills the gap in batch correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study.
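Why coinciding batches and biological covariates break ordinary least squares can be shown on a toy design. This is a minimal sketch, assuming a plain ridge (L2) penalty as one illustrative form of regularisation; it is not reComBat's implementation and omits ComBat's empirical Bayes shrinkage of batch parameters. The design matrix, expression values and `lam` are made up.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Solve (X^T X + lam*I) beta = X^T y; any lam > 0 makes the system invertible."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical toy design: intercept, batch indicator, biological-condition indicator.
# Batch and condition coincide sample for sample, so their columns are collinear.
batch = np.array([0, 0, 1, 1])
condition = np.array([0, 0, 1, 1])
X = np.column_stack([np.ones(4), batch, condition])
y = np.array([1.0, 1.2, 3.1, 2.9])  # made-up expression values for one gene

gram_rank = np.linalg.matrix_rank(X.T @ X)  # 2 < 3 columns: X^T X is singular
beta = ridge_coefficients(X, y, lam=0.1)
```

Without the penalty, the normal equations have no unique solution; with it, the solver returns a unique estimate. Because the batch and condition columns are identical here, the penalty splits the shared effect equally between them, which illustrates that regularisation makes the fit well defined but cannot by itself tell batch from biology when the two are perfectly confounded.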


Author(s):
Balaje T. Thumati
Halasya Siva Subramania
Rajeev Shastri
Karthik Kalyana Kumar
Nicole Hessner
...

2021
Vol 11 (1)
Author(s):
Nishith Kumar
Md. Aminul Hoque
Masahiro Sugimoto

Mass spectrometry is a modern, sophisticated, high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a high-dimensional, large-scale matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers, which arise for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, conventional techniques address only missing values; they do not handle outliers, and outliers in the dataset therefore decrease imputation accuracy. We developed a new missing data imputation technique based on a kernel weight function that resolves the problems of both missing values and outliers. We evaluated the performance of the proposed method against other conventional and recently developed imputation techniques using both artificially generated and experimentally measured data, in both the absence and presence of outliers at different rates. Results on both artificial data and real metabolomics data indicate that the proposed kernel weight-based imputation technique outperforms the existing alternatives. For user convenience, we developed an R package implementing the proposed technique, available at https://github.com/NishithPaul/tWLSA.
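The abstract does not give the kernel weight function, so the following is only a generic sketch of how kernel weights can make imputation robust to outliers; it is not the tWLSA algorithm from the linked R package. The assumptions are: each metabolite column is imputed independently, weights come from a Gaussian kernel on each observed value's distance to the column median, and the bandwidth is the scaled median absolute deviation (MAD). The function name and toy matrix are hypothetical.

```python
import numpy as np

def kernel_weighted_impute(data):
    """Fill NaNs column by column with a Gaussian-kernel-weighted mean.

    data: samples x metabolites array with np.nan marking missing cells.
    Observed values far from the column median (relative to the MAD)
    receive near-zero weight, so outliers barely affect the imputed value.
    """
    out = np.array(data, dtype=float)
    for j in range(out.shape[1]):
        col = out[:, j]  # a view, so assignments write back into `out`
        missing = np.isnan(col)
        obs = col[~missing]
        if obs.size == 0 or not missing.any():
            continue  # nothing to impute from, or nothing to impute
        med = np.median(obs)
        h = 1.4826 * np.median(np.abs(obs - med))  # robust scale estimate
        if h == 0:
            col[missing] = med  # degenerate spread: fall back to the median
            continue
        w = np.exp(-0.5 * ((obs - med) / h) ** 2)
        col[missing] = np.sum(w * obs) / np.sum(w)
    return out

# Toy matrix (made-up values): column 0 contains an outlier (50.0).
X = np.array([[1.0, 5.0], [1.1, np.nan], [np.nan, 5.2], [0.9, 4.8], [50.0, 5.1]])
imputed = kernel_weighted_impute(X)
```

For column 0 the plain mean of the observed values is about 13.25 because of the outlier, whereas the kernel-weighted estimate stays close to 1.0, which is the kind of robustness the abstract attributes to kernel weighting.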


Author(s):
Henning Weinrich
Yasin Emre Durmus
Hermann Tempel
Hans Kungl
Rüdiger-A. Eichel

Abstract: Metal-air batteries are among the most promising battery technologies owing to their outstanding potential energy densities, which are desirable for both stationary and mobile applications in a ‘beyond lithium-ion’ battery market. Silicon- and iron-air batteries have received less research and development than lithium- and zinc-air batteries. Nevertheless, in the recent past, these two less-studied systems have made considerable progress and attracted rising research interest because of the excellent resource efficiency of silicon and iron. Silicon and iron are among the five most abundant elements in the earth’s crust, which ensures an almost unlimited supply of anode material, even for large-scale applications. Furthermore, primary silicon-air batteries are expected to provide one of the highest energy densities among all batteries, while iron-air batteries are frequently considered a highly rechargeable system with decent performance characteristics. Focusing on fundamental aspects of the anode materials, i.e., the metal electrodes, this review first outlines the challenges that specifically apply to silicon- and iron-air batteries and have so far prevented their broad implementation. We then provide an extensive literature survey of state-of-the-art experimental approaches that aim to resolve these challenges and might enable the introduction of silicon- and iron-air batteries into the battery market in the future.


2016
Vol 119 (suppl_1)
Author(s):
Jun Zou
Diana Tran
Angelo Pelonero
Rahul C Deo

Background: We recently discovered a conserved internal promoter in the Titin gene, which explains why truncating mutations in the C-terminal two thirds of the zebrafish ttna protein result in more severe disease, recapitulating a puzzling observation in human dilated cardiomyopathy (DCM) patients. Here we focus on the contribution of alternative splicing to the DCM phenotype, both in zebrafish Titin truncation mutants and in the context of an integrative model for interpreting Titin mutations. Methods and Results: Using CRISPR/Cas9, we disrupted an alternatively spliced exon in the I-band of Titin, normally present in zebrafish heart but absent in skeletal muscle. The resulting mutants had, on average, a milder cardiac phenotype than those with mutations in constitutive exons but also showed striking inter-sibling variability in disease expression, ranging from intact cardiac blood flow to severe early demise. The mutant exon demonstrated nonsense-altered splicing, and disease severity paralleled selective deficiency in Titin transcript level, implying that variability in inclusion of the mutated exon, coupled with nonsense-mediated decay (NMD), modulated the phenotype. We next amassed Titin mutation information from 1785 human DCM cases and >68,000 controls to model mutation distribution and found three variance components: (1) splicing; (2) internal isoform disruption; and (3) targeting of the C-terminal 2000 amino acids. An integrated model demonstrated strong predictive performance, with an area under the receiver operating characteristic curve of 0.79, and correctly identified the highest-risk individuals. Conclusions: We conclude that genetically targeted models and large-scale human data can be complementary in overcoming the challenges of genetic data interpretation.
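The reported area under the ROC curve (0.79) has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher model score than a randomly chosen control. A minimal sketch of that Mann-Whitney formulation follows; the scores are made-up toy values, not data from the study, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(case_scores, control_scores):
    """AUC as P(random case scores higher than random control); ties count 1/2."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk scores for 3 cases and 4 controls.
cases = [0.9, 0.7, 0.4]
controls = [0.8, 0.3, 0.2, 0.1]
auc = roc_auc(cases, controls)  # 10 of 12 case-control pairs are ordered correctly
```

An AUC of 0.79 therefore means that, in roughly 79% of case-control pairs, the model ranks the DCM case above the control.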


Blood
1994
Vol 84 (9)
pp. 2898-2903
Author(s):
R Henschler
W Brugger
T Luft
T Frey
R Mertelsmann
...

Abstract CD34(+)-selected hematopoietic progenitor cells are increasingly being used for autotransplantation, and recent evidence indicates that these cells can be expanded ex vivo. In 15 patients with solid tumors enrolled in a phase I/II clinical trial of CD34(+)-selected peripheral blood progenitor cells (PBPCs) given after high-dose chemotherapy, we analyzed the frequency of long-term culture-initiating cells (LTCIC) as a measure of transplantation potential before and after ex vivo expansion of CD34+ cells. PBPCs were mobilized by combination chemotherapy and granulocyte colony-stimulating factor (G-CSF). The original unseparated leukapheresis preparations, the CD34(+)-enriched transplants, and the nonabsorbed fractions eluting from the CD34 immunoaffinity columns (Ceprate; CellPro, Bothell, WA) were monitored for their capacity to repopulate irradiated allogeneic stroma in human long-term bone marrow cultures. More than three quarters of fully functional LTCIC were preserved in the CD34(+)-selected fractions. Quantitation of LTCIC by limiting dilution analysis showed a 53-fold enrichment of LTCIC, from an incidence of 1/9,075 in the unseparated cells to 1/169 in the CD34+ fractions. Thus, a median of 1.65 × 10^4 LTCIC per kg body weight (range, 0.71 to 3.72) could be harvested in a single apheresis. In addition, large-scale ex vivo expansions were performed for six patients using a five-factor cytokine combination of stem cell factor (SCF), interleukin-1 (IL-1), IL-3, IL-6, and erythropoietin (EPO), previously shown to expand committed progenitor cells. LTCIC were preserved, but not expanded, during the culture period.
Optimization of the growth factor requirements for ex vivo expansion, using limiting dilution assays to estimate LTCIC, indicated that the five-factor combination of SCF, IL-1, IL-3, IL-6, and EPO together with autologous plasma was the most reliable combination, securing both high progenitor yield and optimal preservation of LTCIC. Our data suggest that ex vivo-expanded CD34+ PBPCs may be able to support long-term reconstitution of hematopoiesis.
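The LTCIC frequencies above come from limiting dilution analysis, but the abstract does not state which estimator was used. A minimal sketch of the classical single-hit Poisson approach follows, assuming that a well seeded with N cells is negative with probability exp(-f·N), so that -ln(F0) = f·N and f is the least-squares slope of that line through the origin (maximum likelihood is a common alternative). The doses and negative-well fractions are synthetic, noise-free values generated with f = 1/169 purely for illustration.

```python
import math

def limiting_dilution_frequency(cells_per_well, fraction_negative):
    """Estimate responding-cell frequency f under the single-hit Poisson model.

    Model: P(well negative) = exp(-f * N), hence -ln(F0) = f * N.
    f is the least-squares slope of -ln(F0) against N, forced through the origin.
    """
    y = [-math.log(f0) for f0 in fraction_negative]
    num = sum(n * yi for n, yi in zip(cells_per_well, y))
    den = sum(n * n for n in cells_per_well)
    return num / den

# Hypothetical, noise-free input generated with f = 1/169 (illustration only).
doses = [50, 100, 200, 400]
f0 = [math.exp(-n / 169) for n in doses]
f_hat = limiting_dilution_frequency(doses, f0)

# Fold enrichment implied by the reported incidences (1/169 versus 1/9,075).
enrichment = (1 / 169) / (1 / 9075)  # about 53.7
```

The enrichment arithmetic, 9,075 / 169 ≈ 53.7, is consistent with the roughly 53-fold enrichment reported in the abstract.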


Genes
2019
Vol 10 (3)
pp. 238
Author(s):
Evangelina López de Maturana
Lola Alonso
Pablo Alarcón
Isabel Adoración Martín-Antoniano
Silvia Pineda
...

Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented in clinical or public health settings. Clinical/epidemiological data tend to explain most of the variation in health-related traits, and their joint modeling with omics data is crucial to increase an algorithm’s predictive ability. Only a small number of published studies have performed a “real” integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration concern the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution, we discuss different attempts at OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensional fashion. The integrative strategies in the identified papers adopted three modeling approaches: independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies for OnO data integration.


Metabolites
2020
Vol 10 (10)
pp. 381
Author(s):
Lisa Eisenbeiss
Tina M. Binz
Markus R. Baumgartner
Thomas Kraemer
Andrea E. Steuer

Untargeted metabolomic studies are used for large-scale analysis of endogenous compounds. Because substances incorporated into hair have exceptionally long detection windows, analysis of hair samples for retrospective monitoring of metabolome changes has recently been introduced. However, information on the general behavior of metabolites in hair samples is scarce, which has so far hampered correct data interpretation. The present study aimed to investigate how endogenous metabolites depend on hair color and on position along the hair strand, and to propose recommendations for best practice in hair metabolomic studies. A selection of metabolites was analyzed using untargeted data acquisition in genuine hair samples of different hair colors and after segmentation into 3 cm segments. Significant differences in metabolites were found among hair colors and segments. In conclusion, hair color and hair segment must be considered in hair metabolomic studies, and recommendations for best practice in such studies were proposed accordingly.


Blood
1992
Vol 80 (6)
pp. 1418-1422
Author(s):
M Bregni
M Magni
S Siena
M Di Nicola
G Bonadonna
...

Abstract Hematopoietic progenitor cells circulate in the peripheral blood (PB) of cancer patients during the recovery phase after treatment with high-dose cyclophosphamide followed by hematopoietic growth factor infusion. We report that when PB progenitors were exposed in vitro to filtered supernatant from the cell line PA317-N2, which produces amphotropic helper-free N2 vector at conventional titers, successful retroviral-mediated transfer of the neomycin resistance gene was documented by polymerase chain reaction in 93% of day 14 myelomonocytic colonies. Under the same conditions, gene transfer was achieved in 22% of myelomonocytic colonies derived from steady-state bone marrow. Neo-resistance gene transfer was also documented in a CD34+/cyclophosphamide-resistant precursor of granulocyte-macrophage colonies, an undifferentiated progenitor close to the hematopoietic stem cell. Neither cocultivation with vector-producing cells nor high vector titers were strict requirements for efficient gene transfer. The large-scale availability of PB hematopoietic progenitors in cancer patients, together with the high gene transfer rate achieved under safe and clinically feasible conditions, suggests that PB progenitors offer an optimal route for gene transfer procedures into the human hematopoietic system.

