Classification Model on Big Data in Medical Diagnosis Based on Semi-Supervised Learning

Abstract: Big data in medical diagnosis can provide considerable value for clinical diagnosis, decision support, and many other applications, but obtaining a large amount of labeled medical data takes a lot of time and manpower. This paper proposes a classification model based on a semi-supervised learning algorithm that uses both labeled and unlabeled data to process big data in medical diagnosis, which includes structured, semi-structured, and unstructured data. For medical laboratory data, the paper proposes a self-training algorithm based on a repeated labeling strategy to address the problem that mislabeled samples weaken classifier performance. For medical record data, the paper first extracts features that are highly correlated with the classification results, using a domain expert knowledge base. It then selects the unlabeled medical record data with the highest confidence to expand the training set and to optimize the classifiers of the tri-training algorithm, which uses a supervised learning algorithm to train three base classifiers. The experimental results show that the proposed semi-supervised classification model for medical diagnosis data performs well.
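The self-training idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the nearest-centroid base classifier, the softmax-style confidence score, the `threshold` value, and all function names are assumptions chosen for illustration. "Repeated labeling" is read here as re-predicting every unlabeled sample in each round, so that an earlier pseudo-label can be revised or dropped.

```python
import math

def centroids(X, y):
    """Return {label: mean vector} for the training points."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict_with_confidence(model, x):
    """Nearest-centroid prediction; confidence is a softmax over negative distances."""
    dists = {c: math.dist(x, mu) for c, mu in model.items()}
    d_min = min(dists.values())
    # Shift by the smallest distance so the largest weight is exactly 1.
    weights = {c: math.exp(-(d - d_min)) for c, d in dists.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Self-training in which all unlabeled samples are re-labeled every round."""
    model = centroids(X_lab, y_lab)
    pseudo = {}  # index into X_unlab -> current pseudo-label
    for _ in range(max_rounds):
        new_pseudo = {}
        for i, x in enumerate(X_unlab):
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:  # keep only confident predictions
                new_pseudo[i] = label
        if new_pseudo == pseudo:  # pseudo-labels are stable, so stop
            break
        pseudo = new_pseudo
        X_train = X_lab + [X_unlab[i] for i in pseudo]
        y_train = y_lab + [pseudo[i] for i in pseudo]
        model = centroids(X_train, y_train)
    return model, pseudo

X_lab, y_lab = [[0.0, 0.0], [10.0, 10.0]], [0, 1]
X_unlab = [[1.0, 0.5], [9.0, 9.5], [5.0, 5.0]]
model, pseudo = self_train(X_lab, y_lab, X_unlab)
print(pseudo)
```

In this toy run the point midway between the two classes never reaches the confidence threshold, so it stays out of the training set. Excluding low-confidence samples is one way to limit the damage from mislabeled samples that the abstract mentions.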

Veracity in big data: How good is good enough

Health Informatics Journal ◽

10.1177/1460458217744369 ◽

2018 ◽

Vol 25 (4) ◽

pp. 1290-1298 ◽

Cited By ~ 2

Author(s):

Andrew P Reimer ◽

Elizabeth A Madigan

Keyword(s):

Big Data ◽

Clinical Practice ◽

Medical Record ◽

Electronic Medical Record ◽

Administrative Data ◽

Cross Validation ◽

Data Provenance ◽

Medical Record Data ◽

Electronic Medical Record Data ◽

Record Data

Veracity, one of the five V’s used to describe big data, has received attention when it comes to using electronic medical record data for research purposes. In this perspective article, we discuss data veracity and associated concepts as they relate to the use of electronic medical record data and administrative data in research. We discuss the idea that electronic medical record data are “good enough” for clinical practice and, as such, are “good enough” for certain applications. We then propose three primary issues to attend to when establishing data veracity: data provenance, cross validation, and context.

Design of Electronic Medical Record data integration model based on OGSA-DAI

2010 3rd International Conference on Advanced Computer Theory and Engineering(ICACTE) ◽

10.1109/icacte.2010.5579621 ◽

2010 ◽

Author(s):

Zhang Yaowei ◽

Xu Yabin

Keyword(s):

Data Integration ◽

Medical Record ◽

Electronic Medical Record ◽

Medical Record Data ◽

Integration Model ◽

Model Based ◽

Electronic Medical Record Data ◽

Record Data

Influential Factors in The Law Enforcement Process of Sexual Violence Cases in Children in The City of Semarang

SOEPRA ◽

10.24167/shk.v6i2.2912 ◽

2020 ◽

Vol 6 (2) ◽

pp. 4

Author(s):

Liya Suwarni

Keyword(s):

Law Enforcement ◽

Sexual Violence ◽

Medical Record ◽

Child Protection ◽

Influential Factors ◽

Medical Record Data ◽

Violence Against Children ◽

Record Data ◽

The Law ◽

The City

Background. Cases of sexual violence increase every year, with victims ranging from adolescents and children to toddlers. According to data from the Indonesian Child Protection Commission, there were 23 cases of abuse and violence against children in Indonesia in 2013, 53 cases in 2014, 133 cases in 2015, 1,337 cases in 2017, and 424 cases as of July 2018. Purpose. To identify the factors that influence the law enforcement process for sexual violence cases in Semarang City. Method. This study uses a descriptive analytical method for cases of violence against children, based on medical record data from hospitals and documents from Mapolrestabes, the District Attorney's Office, and the Semarang City Court for the period January 2015 to December 2018. Results. The study obtained 213 cases from medical record data in hospitals in the city of Semarang. Most cases of child abuse occurred in 2018, with 72 cases. Most victims were female and in the 12-14 year age group. The most common type of case was intercourse. Most perpetrators were known to the victims and were not working, and most incidents took place in the defendant's house. At the prosecution and trial stages, the number of cases was significantly reduced to only 8. Factors related to this include lack of evidence, difficulty in obtaining information from victims, convoluted statements of coverage, lack of election, and obtaining diversion rates. Conclusion. Cases of sexual violence have increased from year to year. Law enforcement on this problem still faces many difficulties at each stage of the process, and these remain hard to overcome.

Binary Classification Model Based on Machine Learning Algorithm for the Short-Circuit Detection in Power System

Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence ◽

10.1145/3377713.3377753 ◽

2019 ◽

Author(s):

Qiwei Lu ◽

Jinpei Cheng ◽

Dianlin Guo ◽

Mengmeng Su ◽

Xuewei Wu ◽

...

Keyword(s):

Machine Learning ◽

Power System ◽

Learning Algorithm ◽

Binary Classification ◽

Short Circuit ◽

Classification Model ◽

Machine Learning Algorithm ◽

Model Based

Electronic Medical Record Data Sharing Through Authentication and Integrity Management

2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST) ◽

10.1109/icrest51555.2021.9331010 ◽

2021 ◽

Author(s):

Md. Kaim Iftahaj Nirjhor ◽

Mohammad Abu Yousuf ◽

Md. Shariar Mhaboob

Keyword(s):

Medical Record ◽

Data Sharing ◽

Electronic Medical Record ◽

Medical Record Data ◽

Electronic Medical Record Data ◽

Record Data ◽

Integrity Management

Sa252 CHARACTERIZATION OF TREATMENT PRACTICES FOR PATIENTS WITH NEWLY DIAGNOSED H. PYLORI INFECTION: A US POPULATION-BASED STUDY USING CLAIMS AND ELECTRONIC MEDICAL RECORD DATA

Gastroenterology ◽

10.1016/s0016-5085(21)01821-7 ◽

2021 ◽

Vol 160 (6) ◽

pp. S-466

Author(s):

Colin W. Howden ◽

Eckhard Leifke ◽

Rinu Jacob ◽

Victoria Divino ◽

Ronnie Fass

Keyword(s):

Medical Record ◽

Population Based ◽

Newly Diagnosed ◽

Medical Record Data ◽

Population Based Study ◽

Treatment Practices ◽

Electronic Medical Record Data ◽

Record Data ◽

H Pylori

Patient Characteristics of Premix Insulin Users in China: An Analysis of Electronic Medical Record Data

Value in Health ◽

10.1016/j.jval.2017.08.3021 ◽

2017 ◽

Vol 20 (9) ◽

pp. A487-A488

Author(s):

S Han ◽

K Wang ◽

J Hou ◽

J Wang ◽

EQ Wu

Keyword(s):

Medical Record ◽

Electronic Medical Record ◽

Patient Characteristics ◽

Medical Record Data ◽

Premix Insulin ◽

Electronic Medical Record Data ◽

Record Data

An Auto-Adjustable Semi-Supervised Self-Training Algorithm

Algorithms ◽

10.3390/a11090139 ◽

2018 ◽

Vol 11 (9) ◽

pp. 139 ◽

Cited By ~ 5

Author(s):

Ioannis Livieris ◽

Andreas Kanavos ◽

Vassilis Tampakas ◽

Panagiotis Pintelas

Keyword(s):

Supervised Learning ◽

Predictive Models ◽

Learning Algorithm ◽

Learning Algorithms ◽

Classification Problem ◽

Classification Methods ◽

Training Algorithm ◽

Traditional Classification ◽

Supervised Learning Algorithms ◽

Significant Research

Semi-supervised learning algorithms have become a topic of significant research as an alternative to traditional classification methods, which exhibit remarkable performance on labeled data but cannot be applied to large amounts of unlabeled data. In this work, we propose a new semi-supervised learning algorithm that dynamically selects the most promising learner for a classification problem from a pool of classifiers based on a self-training philosophy. Our experimental results illustrate that the proposed algorithm outperforms its component semi-supervised learning algorithms in terms of accuracy, leading to more efficient, stable and robust predictive models.
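The idea of choosing a learner from a pool before self-training can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm from the paper: the two toy learners (nearest centroid and 1-nearest-neighbour), the use of validation accuracy as the selection criterion, the single pseudo-labeling pass, and all function names are hypothetical.

```python
import math

def fit_centroid(X, y):
    """Nearest-centroid learner: returns a predict(x) function."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    means = {c: [sum(col) / len(pts) for col in zip(*pts)]
             for c, pts in groups.items()}
    return lambda x: min(means, key=lambda c: math.dist(x, means[c]))

def fit_1nn(X, y):
    """1-nearest-neighbour learner: returns a predict(x) function."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: math.dist(x, p[0]))[1]

def accuracy(predict, X, y):
    """Fraction of samples whose prediction matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def select_learner(pool, X_lab, y_lab, X_val, y_val):
    """Return the name of the pool member with the best validation accuracy."""
    scores = {name: accuracy(fit(X_lab, y_lab), X_val, y_val)
              for name, fit in pool.items()}
    return max(scores, key=scores.get), scores

def self_train_with_selection(pool, X_lab, y_lab, X_val, y_val, X_unlab):
    """Pick the most promising learner, then do one pseudo-labeling pass with it."""
    name, _ = select_learner(pool, X_lab, y_lab, X_val, y_val)
    predict = pool[name](X_lab, y_lab)
    pseudo = [predict(x) for x in X_unlab]  # pseudo-label the unlabeled data
    final = pool[name](X_lab + X_unlab, y_lab + pseudo)
    return name, final

# Toy 1-D data: class 1 sits between two clusters of class 0.
pool = {"centroid": fit_centroid, "1nn": fit_1nn}
X_lab = [[0.0], [1.0], [9.0], [10.0], [5.0], [6.0]]
y_lab = [0, 0, 0, 0, 1, 1]
X_val, y_val = [[0.5], [9.5], [5.6]], [0, 0, 1]
X_unlab = [[0.2], [9.8], [5.4]]
name, final = self_train_with_selection(pool, X_lab, y_lab, X_val, y_val, X_unlab)
print(name, final([9.4]))
```

On this toy data the centroid learner misclassifies one of the outer class-0 validation points, so the 1-nearest-neighbour learner is selected. A fuller version would repeat the selection across several self-training rounds and only add confident pseudo-labels.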
