An Effective Data Normalization Strategy for Academic Datasets using Log Values

Author(s):  
V. Sathya Durga ◽  
Thangakumar Jeyaprakash
2015 ◽  
Vol 61 (11) ◽  
pp. 1333-1342 ◽  
Author(s):  
Heidi Schwarzenbach ◽  
Andreia Machado da Silva ◽  
George Calin ◽  
Klaus Pantel

Abstract BACKGROUND: Different technologies, such as quantitative real-time PCR and microarrays, have been developed to measure microRNA (miRNA) expression levels. Quantification of miRNA transcripts involves data normalization using endogenous and exogenous reference genes for data correction. However, there is no consensus on an optimal normalization strategy. The choice of a reference gene remains problematic and can have a serious impact on the measured transcript levels and, consequently, on the biological interpretation of the data. CONTENT: In this review article we discuss the reliability of small RNAs commonly reported in the literature as miRNA expression normalizers, and compare different strategies used for data normalization. SUMMARY: A workflow strategy is proposed for normalization of miRNA expression data in an attempt to provide a basis for establishing a global standard procedure that will allow comparison across studies.
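As a rough illustration of what reference-gene normalization of quantitative PCR data can look like, the sketch below uses the widely used delta-Ct calculation. This is not the workflow proposed in the review: the function names, the example Ct values, and the assumption of roughly 100% amplification efficiency (a doubling per cycle) are all hypothetical choices made for this example.

```python
def mean_reference_ct(reference_cts):
    """Average the Ct values of one or more reference genes.

    Averaging Ct values is equivalent to taking the geometric mean
    of the underlying quantities.
    """
    return sum(reference_cts) / len(reference_cts)


def relative_expression(ct_target, ct_reference):
    """Return 2^-(Ct_target - Ct_reference).

    Assumes ~100% PCR efficiency, i.e. the product doubles every cycle.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for one sample (not taken from the review).
target_ct = 27.0
reference_cts = [24.0, 26.0]

ref_ct = mean_reference_ct(reference_cts)
print(relative_expression(target_ct, ref_ct))
```

The result depends directly on which reference genes are averaged, which is why the choice of normalizer affects the biological interpretation.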


2017 ◽  
Vol 9 (35) ◽  
pp. 5219-5225
Author(s):  
Joseph L. Cantone ◽  
Zeyu Lin ◽  
Ira B. Dicker ◽  
Dieter M. Drexler

The LC-MS bioanalysis of protein kinetics assays is simplified by a data normalization strategy that uses an internal proteolytic analyte as a control standard.


2020 ◽  
Author(s):  
Ramachandro Majji

BACKGROUND: Cancer is one of the deadliest diseases worldwide, and patients can be saved only when it is detected at a very early stage. Early detection is essential because, in the final stage, the chance of survival is limited. The symptoms of cancer are severe, so all symptoms should be studied carefully before diagnosis. OBJECTIVE: To propose an automatic prediction system for classifying cancer as malignant or benign. METHODS: This paper introduces a novel strategy for cancer classification based on the JayaAnt lion optimization-based Deep recurrent neural network (JayaALO-based DeepRNN). The developed model follows four steps: data normalization, data transformation, feature dimension detection, and classification. The first step is data normalization, whose stated goal is to eliminate data redundancy and to avoid storing objects in a relational database that holds the same information in several places. Data transformation is then carried out using a log transformation, which makes patterns more interpretable, helps satisfy the underlying assumptions, and reduces skew. Non-negative matrix factorization is employed to reduce the feature dimension. Finally, the proposed JayaALO-based DeepRNN method classifies cancer based on the reduced-dimension features. RESULTS: The proposed JayaALO-based DeepRNN showed improved results, with a maximal accuracy of 95.97%, a maximal sensitivity of 95.95%, and a maximal specificity of 96.96%. CONCLUSIONS: The output of the proposed JayaALO-based DeepRNN is used for cancer classification.
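The log-transformation step mentioned in the abstract above can be sketched as follows. This is only a minimal illustration of how a log transform reduces right skew, not the paper's implementation: the use of log(1 + x), the helper names, and the sample values are assumptions made for this example.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_transform(values):
    """Apply log(1 + x) so that zeros stay defined.

    Assumes non-negative input; negative values are rejected.
    """
    if any(v < 0 for v in values):
        raise ValueError("log1p transform assumes non-negative values")
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature values (not taken from the paper).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
transformed = log_transform(raw)

print(f"skew before: {skewness(raw):.2f}")
print(f"skew after:  {skewness(transformed):.2f}")
```

On this sample the skewness drops after the transform because the logarithm compresses the few very large values far more than the small ones.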


Author(s):  
Qian Zhu ◽  
Dac-Trung Nguyen ◽  
Eric Sid ◽  
Anne Pariser

Abstract Objective: In this study, we aimed to evaluate the capability of the Unified Medical Language System (UMLS) as a data standard to support normalization and harmonization of datasets that have been developed for rare diseases. Through analysis of data mappings between multiple rare disease resources and the UMLS, we propose extensions of the UMLS that will enable its adoption as a global standard in rare disease. Methods: We analyzed data mappings between the UMLS and existing datasets on over 7,000 rare diseases that were retrieved from four publicly accessible resources: the Genetic and Rare Diseases Information Center (GARD), Orphanet, Online Mendelian Inheritance in Man (OMIM), and the Monarch Disease Ontology (MONDO). Two types of disease mappings were assessed: (1) curated mappings extracted from those four resources; and (2) established mappings generated by querying the rare disease-based integrative knowledge graph developed in the previous study. Results: We found that 100% of OMIM concepts, and over 50% of concepts from GARD, MONDO, and Orphanet, were normalized by the UMLS and accurately categorized into the appropriate UMLS semantic groups. We analyzed 58,636 UMLS mappings, which resulted in 3,876 UMLS concepts across these resources. Manual evaluation of a random set of 500 UMLS mappings demonstrated a high level of accuracy (99%) for those mappings, which consisted of 414 synonyms (82.8%), 76 subtypes (15.2%), and five siblings (1%). Conclusion: The mapping results in this study illustrated that the UMLS was able to accurately represent rare disease concepts and their associated information, such as genes and phenotypes, and can effectively be used to support data harmonization across existing resources developed for collecting rare disease data. We recommend the adoption of the UMLS as a data standard for rare disease to enable the existing rare disease datasets to support future applications in clinical and community settings.


PLoS ONE ◽  
2019 ◽  
Vol 14 (1) ◽  
pp. e0210567 ◽  
Author(s):  
René A. J. Crans ◽  
Jana Janssens ◽  
Sofie Daelemans ◽  
Elise Wouters ◽  
Robrecht Raedt ◽  
...  

2017 ◽  
Vol 32 (2) ◽  
pp. 277-288 ◽  
Author(s):  
P. Pořízka ◽  
J. Klus ◽  
A. Hrdlička ◽  
J. Vrábel ◽  
P. Škarková ◽  
...  

Data normalization is important, and the normalization method should be chosen according to the sample matrix under investigation.


2012 ◽  
Vol 13 (1) ◽  
pp. 10 ◽  
Author(s):  
Shannon M Bell ◽  
Lyle D Burgoon ◽  
Robert L Last

2021 ◽  
Author(s):  
Zhiqiang Zhan ◽  
Jianyu Zhao ◽  
Yang Zhang ◽  
Jiangtao Gong ◽  
Qianying Wang ◽  
...  
