Classification Model on Big Data in Medical Diagnosis Based on Semi-Supervised Learning

2020 ◽  
Author(s):  
Lei Wang ◽  
Qing Qian ◽  
Qiang Zhang ◽  
Jishuai Wang ◽  
Wenbo Cheng ◽  
...  

Abstract: Big data in medical diagnosis can provide considerable value for clinical diagnosis, decision support, and many other applications, but obtaining a large amount of labeled medical data takes a lot of time and manpower. This paper proposes a classification model based on a semi-supervised learning algorithm that uses both labeled and unlabeled data to process big data in medical diagnosis, which includes structured, semi-structured, and unstructured data. For medical laboratory data, the paper proposes a self-training algorithm based on a repeated labeling strategy to address the problem that mislabeled samples weaken classifier performance. For medical record data, the paper first extracts features that are highly correlated with the classification results, using a domain expert knowledge base, and then selects the unlabeled medical record data with the highest confidence to expand the training set and optimize the classifiers of the tri-training algorithm, which uses a supervised learning algorithm to train three base classifiers. The experimental results show that the proposed medical diagnosis data classification model based on a semi-supervised learning algorithm performs well.
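The self-training idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the repeated labeling strategy and the tri-training stage are not reproduced here, and the nearest-centroid base classifier, the inverse-distance confidence score, the `threshold` parameter, and all function names are assumptions chosen only to keep the example self-contained.

```python
from math import dist

def centroids(X, y):
    """Compute the mean point of each class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: tuple(sum(c) / len(pts) for c in zip(*pts))
            for label, pts in groups.items()}

def predict_with_confidence(cents, x):
    """Return (label, confidence) for point x.

    Confidence is a hypothetical score: inverse-distance weights
    normalised so that they sum to 1 across classes.
    """
    weights = {label: 1.0 / (dist(x, c) + 1e-9) for label, c in cents.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training loop.

    Each round fits the classifier on the current training set, moves
    unlabeled points predicted with confidence >= threshold into the
    training set with their predicted labels, and stops when no point
    qualifies. Returns the final centroids and the points left unlabeled.
    """
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X, y)
        scored = [(x, *predict_with_confidence(cents, x)) for x in pool]
        confident = [(x, lab) for x, lab, conf in scored if conf >= threshold]
        if not confident:
            break
        for x, lab in confident:
            X.append(x)
            y.append(lab)
        pool = [x for x, _, conf in scored if conf < threshold]
    return centroids(X, y), pool
```

In this sketch a point that sits halfway between two classes never reaches the confidence threshold, so it stays unlabeled instead of being added with a label that may be wrong.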

2018 ◽  
Vol 25 (4) ◽  
pp. 1290-1298 ◽  
Author(s):  
Andrew P Reimer ◽  
Elizabeth A Madigan

Veracity, one of the five V’s used to describe big data, has received attention when it comes to using electronic medical record data for research purposes. In this perspective article, we discuss the idea of data veracity and associated concepts as it relates to the use of electronic medical record data and administrative data in research. We discuss the idea that electronic medical record data are “good enough” for clinical practice and, as such, are “good enough” for certain applications. We then propose three primary issues to attend to when establishing data veracity: data provenance, cross validation, and context.


SOEPRA ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. 4
Author(s):  
Liya Suwarni

Background. Cases of sexual violence increase every year, with victims ranging from adolescents and children to toddlers. Based on data from the Indonesian Child Protection Commission, there were 23 cases of abuse and violence against children in Indonesia in 2013, 53 cases in 2014, 133 cases in 2015, 1,337 cases in 2017, and 424 cases as of July 2018. Purpose. To identify the factors that influence the law enforcement process for sexual violence cases in Semarang City. Method. This study uses descriptive analytical methods for cases of violence against children, based on medical record data in hospitals and documents from the Mapolrestabes, the District Attorney's Office, and the Semarang City Court for the period of January 2015 to December 2018. Results. The research obtained 213 experimental cases section from medical record data in hospitals in the city of Semarang. Most cases of child abuse occurred in 2018, with 72 cases. Most victims are female and in the 12-14 year age group. Most cases involve intercourse. The majority of perpetrators are persons known to the victims and are not working, and most incidents occurred in the defendant's house. At the prosecution and trial stage, the number of cases was significantly reduced to only 8 cases. Factors related to this include lack of evidence, difficulty in obtaining information from victims, convoluted statements of coverage, lack of election, and obtaining diversion rates. Conclusion. Cases of sexual violence have increased from year to year. The process of law enforcement on this problem still has many difficulties in each manufacturing process which is still difficult to overcome.


Algorithms ◽  
2018 ◽  
Vol 11 (9) ◽  
pp. 139 ◽  
Author(s):  
Ioannis Livieris ◽  
Andreas Kanavos ◽  
Vassilis Tampakas ◽  
Panagiotis Pintelas

Semi-supervised learning algorithms have become a topic of significant research as an alternative to traditional classification methods, which exhibit remarkable performance on labeled data but cannot be applied to large amounts of unlabeled data. In this work, we propose a new semi-supervised learning algorithm that dynamically selects the most promising learner for a classification problem from a pool of classifiers, based on a self-training philosophy. Our experimental results illustrate that the proposed algorithm outperforms its component semi-supervised learning algorithms in terms of accuracy, leading to more efficient, stable and robust predictive models.
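One way to picture selecting a learner from a pool inside a self-training loop is sketched below. The abstract does not give the selection rule, so this is an illustration under assumptions and not the published algorithm: choosing by accuracy on a held-out validation set, the two toy learners, the confidence score, and every name in the code are hypothetical.

```python
from math import dist

def _vote(distances):
    """Turn per-class distances into (label, confidence) using
    normalised inverse-distance weights (a hypothetical score)."""
    w = {lab: 1.0 / (d + 1e-9) for lab, d in distances.items()}
    best = max(w, key=w.get)
    return best, w[best] / sum(w.values())

def fit_centroid(X, y):
    """Nearest-centroid learner; returns predict(x) -> (label, confidence)."""
    groups = {}
    for x, lab in zip(X, y):
        groups.setdefault(lab, []).append(x)
    cents = {lab: tuple(sum(c) / len(pts) for c in zip(*pts))
             for lab, pts in groups.items()}
    return lambda x: _vote({lab: dist(x, c) for lab, c in cents.items()})

def fit_1nn(X, y):
    """1-nearest-neighbour learner with the same predict interface."""
    def predict(x):
        nearest = {}
        for xi, lab in zip(X, y):
            d = dist(x, xi)
            if lab not in nearest or d < nearest[lab]:
                nearest[lab] = d
        return _vote(nearest)
    return predict

def accuracy(predict, X_val, y_val):
    """Fraction of validation points whose predicted label is correct."""
    return sum(predict(x)[0] == t for x, t in zip(X_val, y_val)) / len(y_val)

def self_train_with_pool(learners, X_lab, y_lab, X_val, y_val, X_unlab,
                         threshold=0.8, max_rounds=10):
    """Self-training in which each round refits every learner in the pool,
    keeps the one with the best validation accuracy (assumed rule), and
    lets it pseudo-label the confident unlabeled points.
    Returns the final fitted predictor and the learner chosen per round."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    name, chosen = next(iter(learners)), []
    for _ in range(max_rounds):
        fitted = {n: fit(X, y) for n, fit in learners.items()}
        name = max(fitted, key=lambda n: accuracy(fitted[n], X_val, y_val))
        chosen.append(name)
        scored = [(x, *fitted[name](x)) for x in pool]
        confident = [(x, lab) for x, lab, conf in scored if conf >= threshold]
        if not confident:
            break
        for x, lab in confident:
            X.append(x)
            y.append(lab)
        pool = [x for x, _, conf in scored if conf < threshold]
    # Refit the last selected learner on the enlarged training set.
    return learners[name](X, y), chosen
```

The pool is passed as a dictionary such as `{"centroid": fit_centroid, "1nn": fit_1nn}`, so other learners can be added as long as they follow the same fit and predict interface.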

