normalization methods
Recently Published Documents


TOTAL DOCUMENTS: 366 (FIVE YEARS: 120)
H-INDEX: 36 (FIVE YEARS: 6)

2021 ◽  
Author(s):  
Huaxu Yu ◽  
Tao Huan

Sample normalization is a critical step in metabolomics that removes differences in total sample amount or metabolite concentration between biological samples. Here, we present MAFFIN, an accurate and robust post-acquisition sample normalization workflow that works universally for metabolomics data collected on mass spectrometry (MS)-based platforms. The core design of MAFFIN is the calculation of the normalization factor from the maximal density fold change (MDFC) value, computed by a kernel density-based approach. MDFC is more accurate than traditional median-FC-based normalization, especially when the numbers of up- and down-regulated metabolic features differ. In addition, we showcase two essential steps that are overlooked by conventional normalization methods and incorporate them into MAFFIN. First, instead of using all detected metabolic features, MAFFIN automatically extracts and uses only high-quality features to calculate FCs and determine the normalization factor; multiple orthogonal criteria are proposed to select these features. Second, to guarantee the accuracy of the FCs, the MS signal intensities of the high-quality features are corrected using serial quality control (QC) samples. Using simulated data and urine metabolomics datasets, we demonstrate the critical need for high-quality feature selection, MS signal correction, and MDFC, and we show the superior performance of MAFFIN over other commonly used post-acquisition sample normalization methods. Finally, application to a human saliva metabolomics study shows that MAFFIN provides robust sample normalization, leading to better data separation in principal component analysis (PCA) and the identification of more significantly altered metabolic features.
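
The abstract describes MDFC only at a high level; the sketch below (illustrative Python, not the MAFFIN implementation; the function names, grid settings, and simulated data are assumptions) shows the core idea: estimate the density of per-feature log fold changes with a kernel density estimator and take the fold change at the density peak as the normalization factor.

```python
import numpy as np
from scipy.stats import gaussian_kde

def mdfc_factor(sample, reference):
    """Normalization factor for `sample` against `reference`.

    Both arrays hold intensities of the same high-quality features;
    zero or missing values are assumed to be filtered out beforehand.
    """
    log_fc = np.log2(sample / reference)      # per-feature log2 fold changes
    kde = gaussian_kde(log_fc)                # kernel density estimate
    grid = np.linspace(log_fc.min(), log_fc.max(), 2048)
    mode = grid[np.argmax(kde(grid))]         # fold change of maximal density
    return 2.0 ** mode                        # divide the sample by this factor

# Simulated check: 40% of features are truly up-regulated, yet the
# density mode still recovers the underlying 2x dilution factor.
rng = np.random.default_rng(0)
ref = rng.lognormal(10, 1, size=500)
smp = 2.0 * ref * rng.lognormal(0, 0.05, size=500)
smp[:200] *= rng.uniform(2, 6, size=200)
print(round(mdfc_factor(smp, ref), 2))        # ~2.0
```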


2021 ◽  
pp. 1-18
Author(s):  
Satish Kumar ◽  
Sunanda Gupta ◽  
Sakshi Arora

Network intrusion detection systems (NIDS) detect malicious and intrusive information in computer networks. Commercial NIDS now rely on machine learning approaches whose complex algorithms improve intrusion detection efficiency and efficacy. These machine learning-based NIDS operate on high-dimensional network traffic data, which must be preprocessed and normalized to make it suitable for machine learning tools; a machine learning approach with appropriate normalization and preprocessing performs measurably better. This paper presents an empirical study of various normalization methods applied to a benchmark network traffic dataset, KDD Cup'99, used to evaluate the NIDS model. The study shows that decimal normalization yields better prediction performance than non-normalized traffic data when classifying traffic into 'normal' and 'intrusive' classes.
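
For readers unfamiliar with the best-performing method, decimal normalization divides each attribute by a power of ten so that every scaled value has absolute magnitude below one. A minimal sketch, assuming column-wise application to a numeric feature matrix (the paper's exact preprocessing is not given):

```python
import numpy as np

def decimal_normalize(X):
    """Scale each column of X into (-1, 1) by the smallest power of ten."""
    X = np.asarray(X, dtype=float)
    max_abs = np.abs(X).max(axis=0)
    j = np.floor(np.log10(np.maximum(max_abs, 1e-12))) + 1  # digits per column
    return X / (10.0 ** j)

X = np.array([[491.0, 0.12],
              [1024.0, 0.98],
              [58.0, 0.05]])
print(decimal_normalize(X))   # columns scaled by 10^4 and 10^0
```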


2021 ◽  
Author(s):  
Jeroen Langeveld ◽  
Remy Schilperoort ◽  
Leo Heijnen ◽  
Goffe Elsinga ◽  
Claudia Schapendonk ◽  
...  

Over the course of the COVID-19 pandemic in 2020-2021, monitoring of SARS-CoV-2 RNA in wastewater has rapidly evolved into a supplementary surveillance instrument for public health. Short-term trends (two weeks) are used as a basis for policy and decision making on measures for dealing with the pandemic. Normalization is required to account for the varying dilution of the domestic wastewater that contains the shed viral RNA; the dilution rate varies with runoff, industrial discharges, and extraneous waters. Three normalization methods, based on flow, conductivity, and CrAssphage, were investigated at nine monitoring locations between September 2020 and August 2021, yielding 1071 24-hour flow-proportional samples. In addition, 221 stool samples were analyzed to determine the daily CrAssphage load per person. The results show that flow normalization, supported by a quality check using conductivity monitoring, is the advocated normalization method wherever flow monitoring is available or can be made available. Although CrAssphage shedding rates per person vary greatly, CrAssphage loads were very consistent over time and space, and direct CrAssphage-based normalization can be applied reliably for populations of 5600 and above.
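
The flow-normalization idea can be sketched as follows: multiply the measured RNA concentration by the daily flow to obtain a load, then divide by the catchment population so that locations of different size are comparable. The variable names, units, and the conductivity plausibility check below are illustrative assumptions, not the authors' procedure.

```python
def normalized_load(conc_gc_per_ml, flow_m3_per_day, population,
                    conductivity_mS_cm=None, expected_range=(0.5, 2.5)):
    """Viral load in gene copies per person per day."""
    gc_per_m3 = conc_gc_per_ml * 1e6       # 1 m3 = 1e6 mL
    load = gc_per_m3 * flow_m3_per_day     # gene copies per day
    if conductivity_mS_cm is not None:     # quality check on the flow record
        lo, hi = expected_range
        if not lo <= conductivity_mS_cm <= hi:
            print("warning: conductivity outside expected range; "
                  "flow measurement may be unreliable")
    return load / population

# hypothetical sample: 1.2e3 gc/mL, 40,000 m3/day, 150,000 inhabitants
print(f"{normalized_load(1.2e3, 40_000, 150_000, conductivity_mS_cm=1.1):.2e}")
```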


Energies ◽  
2021 ◽  
Vol 14 (22) ◽  
pp. 7470
Author(s):  
Gabriel Zsembinszki ◽  
Boniface Dominick Mselle ◽  
David Vérez ◽  
Emiliano Borri ◽  
Andreas Strehlow ◽  
...  

A clear gap was identified in the literature regarding the in-depth evaluation of scaling up thermal energy storage components. To cover this gap, a new methodological approach was developed and applied to a novel latent thermal energy storage module. The purpose of this paper is to identify key aspects to consider when scaling the module up from lab-scale to full-scale, using different performance indicators calculated for both charge and discharge. Different normalization methods were applied to allow an appropriate comparison of the results at both scales. As a result of the scaling up, and using mass and volume normalization respectively, the theoretical energy storage capacity increases by 52% and 145%, the average charging power increases by 21% and 94%, and the average discharging power decreases by 16% but increases by 36%. When normalization by the heat transfer surface area is used, all of the above performance indicators decrease, especially the average discharging power, which drops by 49%. Moreover, the energy performance in charge and discharge decreases by 17% and 15%, respectively. However, the charging, discharging, and round-trip efficiencies are practically unaffected by the scaling up.
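
As a rough illustration of how such comparisons work, each raw indicator is divided by the chosen basis quantity (mass, volume, or heat transfer area) before computing the lab-to-full-scale change. The masses and powers below are placeholders chosen only to reproduce the reported +21% mass-normalized charging power change.

```python
def pct_change(lab, full):
    """Percent change from lab-scale to full-scale."""
    return 100.0 * (full - lab) / lab

def normalized_change(indicator_lab, indicator_full, basis_lab, basis_full):
    """Percent change of an indicator normalized by a chosen basis
    (e.g., mass in kg, volume in m3, heat transfer area in m2)."""
    return pct_change(indicator_lab / basis_lab, indicator_full / basis_full)

# hypothetical average charging powers (kW) and module masses (kg)
print(f"{normalized_change(1.0, 9.7, 8.0, 64.0):+.0f}% per kg")   # +21%
```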


2021 ◽  
Author(s):  
Jennifer L. Mamrosh ◽  
Jing Li ◽  
David J. Sherman ◽  
Annie Moradian ◽  
Michael J. Sweredoski ◽  
...  

Protein degradation products are constitutively presented as peptide antigens by MHC Class I. While hypervariability of Class I genes is known to tremendously impact antigen presentation, whether differential function of protein degradation pathways (comprising >1000 genes) could alter antigen generation remains poorly understood apart from a few model substrates. In this study, we introduce normalization methods for quantitative antigen mass spectrometry and confirm that most Class I antigens are dependent on ubiquitination and proteasomal degradation. Remarkably, many antigens derived from mitochondrial inner membrane proteins are not. Additionally, we find that atypical antigens can arise from compensatory protein degradation pathways, such as an increase in mitochondrial and membrane protein antigen presentation upon proteasome inhibition. Notably, incomplete inhibition of protein degradation pathways may have clinical utility in cancer immunotherapy, as evidenced by appearance of novel antigens upon partial proteasome inhibition.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12233
Author(s):  
Diem-Trang Tran ◽  
Matthew Might

Normalization of RNA-seq data has been an active area of research since the problem was first recognized a decade ago. Despite the active development of new normalizers, their performance measures have received little attention. To evaluate normalizers, researchers have relied on ad hoc measures, most of which are qualitative, potentially biased, or easily confounded by the parametric choices made in downstream analysis. We propose a metric called condition-number based deviation, or cdev, to quantify normalization success. cdev measures how much one expression matrix differs from another; if a ground-truth normalization is given, cdev can then be used to evaluate the performance of normalizers. To establish an experimental ground truth, we compiled an extensive set of public RNA-seq assays with external spike-ins. This data collection, together with cdev, provides a valuable toolset for benchmarking new and existing normalization methods.
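
The abstract defines cdev only informally, so the sketch below is an illustrative assumption rather than the paper's formula: fit the linear map that best sends one expression matrix onto another and report its condition number, which is 1 for a pure rescaling and grows as the two matrices diverge.

```python
import numpy as np

def cdev_sketch(A, B):
    """Condition number of the least-squares map T with A @ T ~= B.

    NOTE: an assumed stand-in for cdev, not the published definition.
    """
    T, *_ = np.linalg.lstsq(A, B, rcond=None)
    return np.linalg.cond(T)

rng = np.random.default_rng(1)
X = rng.lognormal(5, 1, size=(200, 10))                 # genes x samples
print(round(cdev_sketch(X, 3.0 * X), 2))                # pure rescaling -> 1.0
print(round(cdev_sketch(X, X * rng.uniform(0.5, 2, size=10)), 2))  # > 1.0
```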


Author(s):  
John Graf ◽  
Sanghee Cho ◽  
Elizabeth McDonough ◽  
Alex Corwin ◽  
Anup Sood ◽  
...  

Motivation: Multiplexed immunofluorescence bioimaging of single cells and their spatial organization in tissue holds great promise for the development of future precision diagnostics and therapeutics. Current multiplexing pipelines typically involve multiple rounds of immunofluorescence staining across multiple tissue slides, which introduces experimental batch effects that can hide the underlying biological signal. Robust algorithms are needed that correct for batch effects without introducing biases into the data. Because the performance of data normalization methods can vary among assay pipelines, evaluating them requires a ground truth dataset representative of the assay.

Results: A new immunoFLuorescence Image NOrmalization (FLINO) method is presented and evaluated against alternative methods and workflows. Multi-round staining of the same tissue with the nuclear dye DAPI was used to provide virtual slides and a ground truth: DAPI was re-stained on a given tissue slide, producing multiple images of the same underlying structure that had each undergone representative tissue-handling steps. This ground truth dataset was used to evaluate and compare normalization methods including median, quantile, smooth quantile, median ratio normalization (MRN), and trimmed mean of M-values (TMM), applied in both an unbiased grid-object and a segmented cell-object workflow to 24 multiplexed biomarkers. Upper-quartile normalization of grid objects in log space achieved nearly the same performance as directly normalizing segmented cell objects by the middle quantile. The grid-based technique was then evaluated with on-slide controls: five or fewer controls per slide can introduce biases into the data, whereas ten or more robustly corrected for batch effects.

Supplementary information: Supplementary data are available at Bioinformatics online.
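
A minimal sketch of the best-performing grid-object workflow named above (the tile size, simulated images, and use of NumPy are assumptions; the FLINO implementation itself is not shown in the abstract):

```python
import numpy as np

def grid_object_means(img, tile=64):
    """Mean intensity of each tile-by-tile grid object."""
    h, w = (img.shape[0] // tile) * tile, (img.shape[1] // tile) * tile
    tiles = img[:h, :w].reshape(h // tile, tile, w // tile, tile)
    return tiles.mean(axis=(1, 3)).ravel()

def upper_quartile_normalize(img, tile=64):
    """Divide a slide image by the upper quartile of its grid-object
    means, computed in log space as the abstract describes (for a pure
    quantile this matches the linear-space value, since log is monotone)."""
    log_means = np.log1p(grid_object_means(img, tile))
    uq = np.expm1(np.quantile(log_means, 0.75))
    return img / uq

rng = np.random.default_rng(2)
slide_a = rng.gamma(2.0, 50.0, size=(512, 512))
slide_b = 1.8 * slide_a                      # simulated staining batch effect
a = upper_quartile_normalize(slide_a)
b = upper_quartile_normalize(slide_b)
print(round(b.mean() / a.mean(), 2))         # ~1.0 after normalization
```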


2021 ◽  
Author(s):  
Alvin Subakti ◽  
Hendri Murfi ◽  
Nora Hariadi

Text clustering is the task of grouping a set of texts so that texts in the same group are more similar to one another than to those in other groups. Grouping texts manually requires significant time and labor, so automation using machine learning is necessary. The standard method for representing textual data is Term Frequency-Inverse Document Frequency (TFIDF); however, TFIDF cannot consider the position and context of a word in a sentence. The Bidirectional Encoder Representations from Transformers (BERT) model produces text representations that incorporate both. This research analyzes the performance of the BERT model as a data representation for text, with various feature extraction and normalization methods applied on top of it. To examine the performance of BERT, we use four clustering algorithms: k-means clustering, eigenspace-based fuzzy c-means, deep embedded clustering, and improved deep embedded clustering. Our simulations show that BERT outperforms the standard TFIDF method in 28 out of 36 metrics. Furthermore, different feature extraction and normalization methods produce varied performance, so their choice should be adapted to the text clustering algorithm used.
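
A minimal sketch of such a pipeline, assuming the Hugging Face transformers library and scikit-learn (the paper does not specify its tooling); mean pooling and L2 normalization stand in for the feature extraction and normalization variants the study compares.

```python
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

texts = ["the cat sat on the mat", "dogs are loyal pets",
         "stocks fell sharply today", "markets rallied after the report"]

with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (n, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding tokens
    emb = (hidden * mask).sum(1) / mask.sum(1)       # mean pooling
emb = emb / emb.norm(dim=1, keepdim=True)            # L2 normalization

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb.numpy())
print(labels)   # likely separates the animal texts from the finance texts
```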

