Data Normalization

Author(s):  
Joseph S. P. Fong ◽  
Kenneth Wong Ting Yan
2020 ◽  
Author(s):  
Ramachandro Majji

BACKGROUND Cancer is one of the deadliest diseases worldwide, and patients can be saved only when the cancer is detected at a very early stage. Early detection is essential because, in the final stage, the chance of survival is limited. The symptoms of cancer are severe, and therefore all symptoms should be studied carefully before diagnosis. OBJECTIVE To propose an automatic prediction system for classifying cancer as malignant or benign. METHODS This paper introduces a novel strategy based on the JayaAnt lion optimization-based deep recurrent neural network (JayaALO-based DeepRNN) for cancer classification. The steps followed in the developed model are data normalization, data transformation, feature dimension detection, and classification. The first step is data normalization, whose goal is to eliminate data redundancy and to avoid storing objects in a relational database that keeps the same information in several places. Next, data transformation is carried out with a log transformation, which makes the patterns more interpretable, helps meet the underlying assumptions, and reduces skew. Non-negative matrix factorization is then employed to reduce the feature dimension. Finally, the proposed JayaALO-based DeepRNN method classifies cancer based on the reduced-dimension features to produce a satisfactory result. RESULTS The proposed JayaALO-based DeepRNN showed improved results, with a maximal accuracy of 95.97%, a maximal sensitivity of 95.95%, and a maximal specificity of 96.96%. CONCLUSIONS The output of the proposed JayaALO-based DeepRNN is used for cancer classification.
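The first two steps described in the abstract above (removing redundant records, then applying a log transformation to reduce skew) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy records, and the choice of log(1 + x) are assumptions made for illustration, and the non-negative matrix factorization and DeepRNN stages are omitted.

```python
import math


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def log_transform(rows):
    """Apply log(1 + x) to every value; assumes all values are >= 0."""
    return [[math.log1p(x) for x in row] for row in rows]


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    if variance == 0:
        return 0.0
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / variance ** 1.5


# Hypothetical toy records: two features per sample, one exact duplicate,
# and one extreme value in the first feature.
records = [
    [1.0, 10.0],
    [2.0, 12.0],
    [2.0, 12.0],  # redundant copy of the previous record
    [3.0, 11.0],
    [5.0, 13.0],
    [8.0, 14.0],
    [200.0, 15.0],
]

unique_records = drop_duplicate_rows(records)
transformed = log_transform(unique_records)

raw_skew = skewness([row[0] for row in unique_records])
log_skew = skewness([row[0] for row in transformed])
print(f"records kept: {len(unique_records)}")
print(f"skewness before log transform: {raw_skew:.3f}")
print(f"skewness after log transform:  {log_skew:.3f}")
```

On this toy data the first feature is strongly right-skewed because of the single large value, and the log transformation pulls that value closer to the rest, which lowers the skewness.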


Author(s):  
Qian Zhu ◽  
Dac-Trung Nguyen ◽  
Eric Sid ◽  
Anne Pariser

Abstract Objective In this study, we aimed to evaluate the capability of the Unified Medical Language System (UMLS) as one data standard to support data normalization and harmonization of datasets that have been developed for rare diseases. Through analysis of data mappings between multiple rare disease resources and the UMLS, we suggest extensions of the UMLS that will enable its adoption as a global standard in rare disease. Methods We analyzed data mappings between the UMLS and existing datasets on over 7,000 rare diseases that were retrieved from four publicly accessible resources: Genetic and Rare Diseases Information Center (GARD), Orphanet, Online Mendelian Inheritance in Man (OMIM), and the Monarch Disease Ontology (MONDO). Two types of disease mappings were assessed: (1) curated mappings extracted from those four resources and (2) established mappings generated by querying the rare disease-based integrative knowledge graph developed in the previous study. Results We found that 100% of OMIM concepts and over 50% of concepts from GARD, MONDO, and Orphanet were normalized by the UMLS and accurately categorized into the appropriate UMLS semantic groups. We analyzed 58,636 UMLS mappings, which resulted in 3,876 UMLS concepts across these resources. Manual evaluation of a random set of 500 UMLS mappings demonstrated a high level of accuracy (99%) in developing those mappings, which consisted of 414 synonym mappings (82.8%), 76 subtype mappings (15.2%), and five sibling mappings (1%). Conclusion The mapping results in this study illustrated that the UMLS was able to accurately represent rare disease concepts and their associated information, such as genes and phenotypes, and can effectively be used to support data harmonization across existing resources developed for collecting rare disease data.
We recommend the adoption of the UMLS as a data standard for rare disease to enable the existing rare disease datasets to support future applications in clinical and community settings.
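The percentages reported for the manual evaluation can be checked with a few lines of Python. The sketch below assumes that the 99% accuracy figure is the combined share of the three mapping categories in the 500-mapping sample; the abstract does not state this explicitly, and the variable names are illustrative.

```python
# Counts reported in the abstract for the random sample of 500 UMLS mappings.
sample_size = 500
counts = {"synonym": 414, "subtype": 76, "sibling": 5}

# Share of each mapping category, as a percentage of the sample.
shares = {kind: round(100 * n / sample_size, 1) for kind, n in counts.items()}

# Assumption: accuracy is the combined share of the three categories.
accurate = sum(counts.values())
accuracy = round(100 * accurate / sample_size, 1)

print(shares)
print(f"{accurate} of {sample_size} mappings = {accuracy}%")
```

Under that assumption the three categories account for 495 of the 500 sampled mappings, which matches the reported 99%.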


2017 ◽  
Vol 32 (2) ◽  
pp. 277-288 ◽  
Author(s):  
P. Pořízka ◽  
J. Klus ◽  
A. Hrdlička ◽  
J. Vrábel ◽  
P. Škarková ◽  
...  

Normalization of data is significant and should be chosen according to the sample matrix under investigation.


2012 ◽  
Vol 13 (1) ◽  
pp. 10 ◽  
Author(s):  
Shannon M Bell ◽  
Lyle D Burgoon ◽  
Robert L Last

2021 ◽  
Author(s):  
Zhiqiang Zhan ◽  
Jianyu Zhao ◽  
Yang Zhang ◽  
Jiangtao Gong ◽  
Qianying Wang ◽  
...  
