XData: A General-purpose Unified Processing System for Data Analysis and Machine Learning

Abstract

Objective: Normalizing clinical mentions to concepts in standardized medical terminologies is, in general, challenging due to the complexity and variety of terms in narrative medical records. In this article, we introduce our work on a clinical natural language processing (NLP) system that automatically normalizes clinical mentions to concept unique identifiers (CUIs) in the Unified Medical Language System (UMLS). This work was part of the 2019 n2c2 (National NLP Clinical Challenges) Shared-Task and Workshop on Clinical Concept Normalization.

Materials and Methods: We developed a hybrid clinical NLP system that combines a generic multilevel matching framework, customizable matching components, and machine learning ranking systems. We explored 2 machine learning ranking systems, based either on an ensemble of similarity features extracted from pretrained encoders or on a Siamese attention network, targeting efficient and fast semantic searching and ranking. In addition, we evaluated the performance of a general-purpose clinical NLP system based on the Unstructured Information Management Architecture (UIMA).

Results: The systems were evaluated as part of the 2019 n2c2 challenge. Our original best system obtained an accuracy of 0.8101, ranking fifth in the challenge. The improved system, with a newly designed machine learning ranking based on a Siamese attention network, raised the accuracy to 0.8209.

Conclusions: We demonstrate the successful practice of combining multilevel matching and machine learning ranking for clinical concept normalization. Our results indicate the capability and interpretability of the proposed approach, as well as its limitations, and suggest opportunities to achieve better performance by combining general clinical NLP systems.
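The multilevel matching idea above can be illustrated with a minimal Python sketch. It assumes a toy in-memory terminology with placeholder concept IDs (not real UMLS CUIs); the three levels (exact match, normalized match, token-overlap ranking) and the names `TERMINOLOGY`, `normalize`, and `match_mention` are hypothetical. The last level uses plain Jaccard token overlap as a stand-in for the pretrained-encoder or Siamese attention rankers described in the abstract, so this is not the authors' system.

```python
import re

# Hypothetical toy terminology: surface form -> concept ID (placeholders, not real UMLS CUIs).
TERMINOLOGY = {
    "myocardial infarction": "CUI_MI",
    "heart attack": "CUI_MI",
    "type 2 diabetes mellitus": "CUI_T2DM",
    "hypertension": "CUI_HTN",
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def tokens(text: str) -> set:
    """Set of normalized tokens in a string."""
    return set(normalize(text).split())


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_mention(mention: str, min_score: float = 0.3):
    """Return (concept_id, level), trying increasingly lenient matching levels."""
    # Level 1: exact string match against the terminology.
    if mention in TERMINOLOGY:
        return TERMINOLOGY[mention], "exact"
    # Level 2: match after normalizing case, punctuation, and whitespace.
    normalized_index = {normalize(term): cid for term, cid in TERMINOLOGY.items()}
    key = normalize(mention)
    if key in normalized_index:
        return normalized_index[key], "normalized"
    # Level 3 (hypothetical stand-in for a learned ranker): rank every term by token overlap.
    mention_tokens = tokens(mention)
    best_term = max(TERMINOLOGY, key=lambda term: jaccard(mention_tokens, tokens(term)))
    score = jaccard(mention_tokens, tokens(best_term))
    if score >= min_score:
        return TERMINOLOGY[best_term], "ranked"
    return None, "unmatched"
```

For example, `match_mention("Hypertension.")` resolves at the normalized level, while `match_mention("acute heart attack")` falls through to the ranking level. Trying cheap, high-precision levels first and ranking only the leftovers is one plausible way such a cascade stays fast and interpretable, since every prediction carries the level that produced it.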

Bipolar Disorder and Oxidative Stress Injury Mechanism - Clinical Big Data Analysis Based on Machine Learning

Case Medical Research, 2019. DOI: 10.31525/ct1-nct03949218

Author(s):

Keyword(s): Oxidative Stress; Machine Learning; Bipolar Disorder; Big Data; Data Analysis; Big Data Analysis; Injury Mechanism; Stress Injury; Oxidative Stress Injury; And Oxidative Stress

Machine Learning Based Predictive Action on Categorical Non-Sequential Data

Recent Advances in Computer Science and Communications, 2020, Vol 13 (5), pp. 1020-1030. DOI: 10.2174/2213275912666190417150421

Author(s): Pradeep S.; Jagadish S. Kallimani

Keyword(s): Machine Learning; Data Analysis; Categorical Data; Numerical Data; Processing Technique; Machine Learning Algorithms; Sequential Data; Industry Standard; Robust Model; Future Work

Background: With the advent of data analysis and machine learning, there is a growing impetus to analyze historical data and build models from it. Data comes in numerous forms and shapes, with an abundance of challenges. The most readily analyzed form of data is numerical data: with the plethora of available algorithms and tools, such data is quite manageable. Another form of data is categorical, which is subdivided into ordinal (ordered categories) and nominal (unordered categories). Categorical data can be broadly classified as sequential or non-sequential, and sequential data is easier to preprocess using algorithms.

Objective: This paper addresses the challenge of applying machine learning algorithms to categorical data of a non-sequential nature.

Methods: Applying several data analysis algorithms directly to such data produced biased results, which made it impossible to build a reliable predictive model. In this paper, we address this problem by walking through a handful of techniques that, during our research, helped us deal with a large categorical dataset of a non-sequential nature. In subsequent sections, we discuss possible implementable solutions and the shortfalls of these techniques.

Results: The methods were applied to sample datasets available in the public domain, and the results in terms of classification accuracy are satisfactory.

Conclusion: The best preprocessing technique we observed in our research is one-hot encoding, which breaks each categorical feature down into binary features that can be fed into an algorithm to predict the outcome. Our example is not abstract: it is a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
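As a minimal sketch of the one-hot encoding step named in the conclusion, assuming records arrive as Python dicts and that the field names used below (`service`, `region`) are hypothetical examples rather than fields of the authors' production dataset: each distinct value of a nominal column becomes its own 0/1 feature, so no artificial order is imposed on the categories. In practice, library routines such as scikit-learn's `OneHotEncoder` or pandas' `get_dummies` do the same job.

```python
def one_hot_encode(rows, columns):
    """One-hot encode the given categorical columns of a list of dict records.

    Each category value becomes its own binary feature named "<column>=<value>".
    Categories are discovered from the rows and sorted so the output column order
    is deterministic (values within one column must be mutually comparable).
    Returns (feature_names, encoded_rows).
    """
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    feature_names = [f"{col}={value}" for col in columns for value in categories[col]]
    encoded = []
    for row in rows:
        vector = []
        for col in columns:
            # Exactly one indicator per column is 1 for a given row.
            vector.extend(1 if row[col] == value else 0 for value in categories[col])
        encoded.append(vector)
    return feature_names, encoded


# Hypothetical example records.
rows = [
    {"service": "db", "region": "eu"},
    {"service": "web", "region": "us"},
    {"service": "db", "region": "us"},
]
names, vectors = one_hot_encode(rows, ["service", "region"])
```

Here `names` is `["service=db", "service=web", "region=eu", "region=us"]` and the first record becomes `[1, 0, 1, 0]`. Two caveats follow from the design: the feature count grows with the number of distinct values, which matters for large categorical data, and values not seen when the categories were collected would need an explicit policy (for example, an all-zero vector) at prediction time.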

Measuring Engagement Level in Child-Robot Interaction Using Machine Learning Based Data Analysis

2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), 2020. DOI: 10.1109/icdabi51230.2020.9325676

Author(s): George K. Sidiropoulos; George A. Papakostas; Chris Lytridis; Christos Bazinas; Vassilis G. Kaburlasos; ...

Keyword(s): Machine Learning; Data Analysis; Robot Interaction

Machine learning-assisted production data analysis in liquid-rich Duvernay Formation

Journal of Petroleum Science and Engineering, 2021, Vol 200, pp. 108377. DOI: 10.1016/j.petrol.2021.108377

Author(s): Bing Kong; Zhuoheng Chen; Shengnan Chen; Tianjie Qin

Keyword(s): Machine Learning; Data Analysis; Production Data

Machine learning based real-time vehicle data analysis for safe driving modeling

Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing - SAC '19, 2019. DOI: 10.1145/3297280.3297584

Cited by: ~2

Author(s): Pamul Yadav; Sangsu Jung; Dhananjay Singh

Keyword(s): Machine Learning; Data Analysis; Real Time; Safe Driving; Vehicle Data

Microcomputers in Political Science

News for Teachers of Political Science, 1983, Vol 38, pp. 1-9. DOI: 10.1017/s0197901900005079

Author(s): Herbert F. Weisberg

Keyword(s): Data Analysis; Political Science; Large Scale; Turnaround Time; General Purpose; Batch Mode; New Era; Large Scale Data; The Social; Frequency Counts

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.

Topologic Data Analysis and Machine Learning

JACC: Cardiovascular Imaging, 2021. DOI: 10.1016/j.jcmg.2021.04.005

Author(s): Rebecca T. Hahn

Keyword(s): Machine Learning; Data Analysis

Passenger data analysis of Titanic using machine learning approach in the context of chances of surviving the disaster

IOP Conference Series: Materials Science and Engineering, 2021, Vol 1065 (1), pp. 012042. DOI: 10.1088/1757-899x/1065/1/012042

Author(s): Md Arfinul Haque; G Shivaprasad; G Guruprasad

Keyword(s): Machine Learning; Data Analysis; Learning Approach; Machine Learning Approach

Classification of apatite structures via topological data analysis: a framework for a ‘Materials Barcode’ representation of structure maps

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-90070-4

Author(s): Scott Broderick; Ruhil Dongol; Tianmu Zhang; Krishna Rajan

Keyword(s): Machine Learning; Data Analysis; Crystal Chemistry; Persistent Homology; Hierarchical Classification; Topological Data Analysis; Learning Tool; Coordination Polyhedra; Machine Learning Tool; Topological Data

Abstract

This paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using apatite chemistry as a template, we use persistent homology to track the topological connectivity of input crystal chemistry descriptors in defining similarity between different apatite stoichiometries. We show that TDA automatically identifies a hierarchical classification scheme within apatites based on the number of discrete coordination polyhedra that constitute the structural building units shared among the compounds. This information is presented as a visualization scheme, a barcode of homology classifications, in which the persistence of similarity between compounds is tracked. Unlike traditional structure maps, this new "Materials Barcode" schema serves as an automated exploratory machine learning tool that can uncover structural associations from crystal chemistry databases and provide more nuanced insight into what defines similarity among homologous compounds.
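To make the persistent homology idea concrete, here is a minimal Python sketch of its simplest case: 0-dimensional homology (connected components) of a Vietoris-Rips filtration, computed with union-find over pairwise distances. It assumes each compound is represented by a hypothetical numeric descriptor vector with Euclidean distance; the paper's actual descriptors, metric, and homology dimensions are not specified here, and dedicated libraries such as GUDHI or Ripser are the usual choice in practice.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return the 0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Every point starts as its own component, born at scale 0. Edges are added in
    order of increasing Euclidean length; whenever an edge joins two different
    components, one of them dies at that length. Because all births are 0, which
    component dies does not change the bars. The last survivor never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: record a bar (birth, death).
            parent[root_j] = root_i
            bars.append((0.0, length))
    if n:
        # One component persists across all scales.
        bars.append((0.0, math.inf))
    return bars
```

For the four points (0, 0), (1, 0), (5, 0), (6, 0), the barcode contains two short bars dying at 1.0, one bar dying at 4.0, and one infinite bar; the long-lived bar signals two well-separated groups, which is the kind of persistent similarity a barcode visualizes. These death values coincide with the merge heights of single-linkage hierarchical clustering.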
