The Scope, Methods and Applications of Biomedical Data Mining

Author(s):  
Trudie Steyn ◽  
Nico Martins

Most findings in the literature have been drawn from public databases, e.g., NHANES (National Health and Nutrition Examination Survey). These datasets, however, are typically characterized by high dimensionality, heterogeneity, timeliness issues, and irregularity, so their full value is rarely realized. Data Mining (DM) technologies have become a frontier domain in biomedical research, showing strong performance in assessing patient risk and supporting biomedical research and decision-making through disease-forecasting frameworks. DM therefore offers distinct advantages in biomedical Big Data (BD) studies, particularly for large-scale biomedical datasets. In this paper, a description of DM techniques alongside their fundamental practical applications will be provided. The objective of this study is to help biomedical researchers attain a clear and intuitive appreciation of how data-mining technologies apply to biomedical BD, thereby promoting the creation of results that are relevant in a biomedical setting.

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Wen-Tao Wu ◽  
Yuan-Jie Li ◽  
Ao-Zi Feng ◽  
Li Li ◽  
Tao Huang ◽  
...  

Abstract: Many high-quality studies have emerged from public databases, such as Surveillance, Epidemiology, and End Results (SEER), the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, irregularity, and other characteristics, resulting in the value of these data not being fully utilized. Data-mining technology has been a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially in large-scale medical public databases. This article introduces the main medical public databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology on clinical big data in order to promote the production of research results that are beneficial to doctors and patients.


2021 ◽  
Vol 26 (1) ◽  
pp. 67-77
Author(s):  
Siva Sankari Subbiah ◽  
Jayakumar Chinnappan

Nowadays, organizations collect huge volumes of data, often without knowing their usefulness. The rapid development of the Internet allows organizations to capture data in many different formats through the Internet of Things (IoT), social media, and other disparate sources. The dimensionality of datasets increases day by day at an extraordinary rate, resulting in large-scale datasets with high dimensionality. The present paper reviews the opportunities and challenges of feature selection for processing high-dimensional data with reduced complexity and improved accuracy. In the modern big-data world, feature selection plays a significant role in reducing the dimensionality and overfitting of the learning process. Researchers have proposed many feature selection methods for obtaining the most relevant features, especially from big datasets, to support accurate learning results without degradation in performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big-data processing using Hadoop and Spark, and the challenges of feature selection, and it summarizes the related research work done by various researchers. As a result, big-data analysis combined with feature selection improves the accuracy of learning.
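The "basic feature selection approaches" surveyed above include simple filter methods that rank features independently of any learning algorithm. As a minimal illustrative sketch (the function names and toy dataset below are invented for demonstration, not taken from the paper), a filter can score each feature by its absolute Pearson correlation with the target and keep the top k:

```python
# Minimal sketch of filter-based feature selection: rank features by
# absolute Pearson correlation with the target and keep the top k.
# All names and the toy dataset are illustrative assumptions.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def select_top_k(rows, target, k):
    """Return indices of the k features most correlated with the target."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in rows]
        scores.append((abs(pearson(col, target)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

# Toy data: feature 0 tracks the target, feature 1 is noise,
# feature 2 anti-tracks the target (correlation -1, so |r| = 1).
X = [[1, 5, 9], [2, 3, 8], [3, 6, 7], [4, 2, 6]]
y = [1, 2, 3, 4]
print(select_top_k(X, y, 2))  # → [0, 2]
```

In a distributed Hadoop/Spark setting of the kind the paper reviews, the per-feature scores would be computed in parallel across partitions; only the final ranking step needs the aggregated scores.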


Author(s):  
Cheng Meng ◽  
Ye Wang ◽  
Xinlian Zhang ◽  
Abhyuday Mandal ◽  
Wenxuan Zhong ◽  
...  

With advances in technologies over the past decade, the amount of data generated and recorded has grown enormously in virtually all fields of industry and science. This extraordinary amount of data provides unprecedented opportunities for data-driven decision-making and knowledge discovery. However, analyzing such large-scale datasets poses significant challenges and calls for innovative statistical methods specifically designed for faster speed and higher efficiency. In this chapter, we review currently available methods for big data, with a focus on subsampling methods based on statistical leveraging and on divide-and-conquer methods.
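The divide-and-conquer idea mentioned above can be sketched in a few lines: split the sample into blocks, compute an estimate on each block, and combine the block estimates, so no single pass ever has to hold the full dataset. The example below (a hedged sketch with invented data, not the chapter's implementation) averages per-block least-squares slopes:

```python
# Divide-and-conquer sketch: estimate a simple least-squares slope on each
# block of the data, then average the block estimates.

def ols_slope(xs, ys):
    """Closed-form slope of y ~ x (intercept handling kept minimal)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def divide_and_conquer_slope(xs, ys, n_blocks):
    """Average per-block slopes instead of fitting the full dataset once."""
    size = len(xs) // n_blocks
    slopes = []
    for b in range(n_blocks):
        lo, hi = b * size, (b + 1) * size
        slopes.append(ols_slope(xs[lo:hi], ys[lo:hi]))
    return sum(slopes) / len(slopes)

# Exactly linear toy data, y = 2x: every block recovers slope 2.
xs = list(range(1, 101))
ys = [2 * x for x in xs]
print(divide_and_conquer_slope(xs, ys, 4))  # → 2.0
```

Statistical-leveraging subsampling differs in that it draws a single weighted subsample (with probabilities proportional to leverage scores) rather than partitioning the data, but both approaches trade a small loss of efficiency for a large gain in speed.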


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Andreas Friedrich ◽  
Erhan Kenar ◽  
Oliver Kohlbacher ◽  
Sven Nahnsen

Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data are accompanied by accurate metadata annotation. Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions that are studied as well as information that might be of interest for failure analysis or further experiments in the future. In addition to the management of this information, researchers urgently need means for an integrated design and interfaces for structured data annotation. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information, we provide a spreadsheet-based, human-readable format. Subsequently, sample sheets with identifiers and meta-information for data-generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model.
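The core of a factor-based design is that the sample sheet is the Cartesian product of the studied conditions. The sketch below illustrates that idea only; the factor names, levels, and CSV layout are invented assumptions, not the format the paper's web-based system actually uses:

```python
# Illustrative sketch of a factor-based design: one sample-sheet row per
# combination of factor levels, written in a spreadsheet-like (CSV) format.
# Factor names and levels are hypothetical.
import csv
import io
from itertools import product

factors = {
    "genotype": ["wild-type", "mutant"],
    "treatment": ["control", "drug"],
    "timepoint_h": ["0", "24"],
}

def sample_sheet(factors):
    """Yield one row per factor combination, with a generated sample ID."""
    names = list(factors)
    for i, levels in enumerate(product(*factors.values()), start=1):
        yield {"sample_id": f"S{i:03d}", **dict(zip(names, levels))}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sample_id", *factors])
writer.writeheader()
for row in sample_sheet(factors):
    writer.writerow(row)
print(buf.getvalue().splitlines()[0])      # header line
print(len(buf.getvalue().splitlines()) - 1)  # 2 * 2 * 2 = 8 samples
```

A full-factorial sheet like this grows multiplicatively with each added factor, which is why the paper emphasizes tool support for creating and editing such designs rather than maintaining them by hand.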


Author(s):  
Wonjung NOH ◽  
Heakyung MOON

Background: Sleep durations shorter or longer than 7 h are associated with cardiovascular diseases. We aimed to investigate the association among sleep duration, risk factors of hypertension, and cardiovascular disease in South Korea using data from a recent large-scale survey. Methods: Data produced by the Korea National Health and Nutrition Examination Survey (KNHANES) were subjected to multivariate logistic analysis. This cross-sectional, nationally representative survey was conducted from Jan 1 to Dec 31, 2011, by the Korean Center for Disease Control and Prevention. Overall, 6,466 individuals participated. Data were analyzed using STATA version 13.0 (STATA Corp LP). Results: The participants’ socioeconomic, physical, and lifestyle factors differed significantly between the two age groups (<65 yr and ≥65 yr). Shorter sleep durations were associated with hypertension in individuals younger than 65 yr of age. In participants aged ≥65 yr, on the other hand, both shorter and longer sleep durations were associated with hypertension, while shorter sleep durations were associated with cardiovascular diseases. Conclusion: Unusual sleep durations are associated with an increased prevalence of cardiovascular disease among Korean adults. The effect of sleep duration appears to be more significant in individuals with hypertension, suggesting that the management of hypertension should be prioritized in patients older than 65 yr.


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. 2538-2538
Author(s):  
Mayur Sarangdhar ◽  
Bruce Aronow ◽  
Anil Goud Jegga ◽  
Brian Turpin ◽  
Erin Haag Breese ◽  
...  

2538 Background: Targeted anti-cancer small-molecule drugs and immune therapies have had a dramatic impact on improving outcomes and on the approach to clinical trials. Increasingly, regulatory approvals are expedited with small studies designed to identify strong efficacy signals. However, this may limit the extent of safety profiling. Large-scale/big-data meta-analyses can identify novel safety and efficacy signals in "real-world" medical settings. Methods: We used AERSMine, an open-source data-mining platform, to identify drug toxicity signatures in the FDA’s Adverse Event Reporting System of 8.6 million patients. We identified patients (n = 732,198) who received either traditional or targeted cancer therapy and identified therapy-specific toxicity patterns. Patients were classified based on exposures: anthracyclines (n = 83,179), platinum (117,993), antimetabolites (93,062), alkylators (81,507), antimicrotubule agents (97,726), HER2 inhibitors (40,040), VEGFis (79,144), VEGF-TKis (90,734), multi-TKis (34,457), anaplastic lymphoma kinase inhibitors (7,635), PI3K-AKT-mTOR inhibitors (33,864), Bruton TKis (9,247), MEKis (4,018), immunomodulatory agents (174,810), proteasome inhibitors (44,681), and immune checkpoint inhibitors (20,287). Pharmacovigilance metrics [relative risks and safety signals] were used to establish statistical correlations, and toxicity signatures were differentiated using the Kolmogorov–Smirnov test. Results: To validate the use of AERSMine to detect AEs, we focused on cardiotoxicity. It identified classic drug-associated AEs (e.g., ventricular dysfunction with anthracyclines, HER2is, and VEGFis; VEGFi hypertension and vascular toxicity; multi-TKI vascular events). AERSMine also identified recently reported uncommon toxicities of myositis/myocarditis with immune checkpoint inhibitors. It indicated a higher frequency of myositis/myocarditis with combination immune checkpoint therapy, paralleling industry corporate safety databases. These toxicities were reported at higher frequencies in patients > 65 yr. Conclusions: AERSMine "big data" analyses provide a sensitive tool for detecting potential new patterns of AEs simultaneously across multiple clinical trials and in the real-world setting.
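The relative-risk metric used in pharmacovigilance analyses like this one compares the adverse-event reporting rate in an exposed group with the rate in a comparison group. A minimal sketch, with invented counts rather than AERSMine data:

```python
# Sketch of a pharmacovigilance disproportionality metric: the relative
# risk of an adverse event for an exposed vs. a comparison patient group.
# All counts below are hypothetical.

def relative_risk(exposed_events, exposed_total, other_events, other_total):
    """RR = (a / n1) / (c / n2) for a 2x2 exposure-by-event table."""
    rate_exposed = exposed_events / exposed_total
    rate_other = other_events / other_total
    return rate_exposed / rate_other

# Hypothetical counts: 30 myocarditis reports among 1,000 patients on a
# checkpoint inhibitor vs. 10 among 2,000 patients on other therapies.
rr = relative_risk(30, 1000, 10, 2000)
print(round(rr, 1))  # → 6.0
```

In practice a safety signal also requires a confidence interval and a minimum report count before an elevated RR is flagged, which is why the abstract pairs relative risks with dedicated safety-signal criteria.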


Author(s):  
Davide Barbieri ◽  
Nitesh Chawla ◽  
Luciana Zaccagni ◽  
Tonći Grgurinović ◽  
Jelena Šarac ◽  
...  

Cardiovascular diseases are the main cause of death worldwide. The aim of the present study is to verify the performance of a data-mining methodology in the evaluation of cardiovascular risk in athletes, and whether the results may be used to support clinical decision-making. Anthropometric (height and weight), demographic (age and sex), and biomedical (blood pressure and pulse rate) data of 26,002 athletes were collected in 2012 during routine sports medical examinations, which included electrocardiography at rest. Subjects were involved in competitive sport practice, for which medical clearance was needed. Outcomes were negative for the large majority, as expected in an active population. Resampling was applied to balance the positive/negative class ratio. A decision tree and logistic regression were used to classify individuals as either at risk or not. The receiver operating characteristic curve was used to assess classification performance. Data mining and resampling improved cardiovascular risk assessment in terms of increased area under the curve. The proposed methodology can be effectively applied to biomedical data in order to optimize clinical decision-making and, at the same time, minimize the amount of unnecessary examinations.
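The area under the ROC curve used as the evaluation metric above has a direct probabilistic reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting half. A self-contained sketch using invented toy scores, not the athletes' data:

```python
# AUC computed from its pairwise definition: the fraction of
# (positive, negative) score pairs the classifier ranks correctly,
# counting ties as half. Labels and scores below are a toy example.

def roc_auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly, ~0.889
```

Because this pairwise definition compares positives only against negatives, AUC is insensitive to the class ratio, which is one reason it suits the heavily imbalanced screening population described in the abstract.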

