Coreference Resolution in Vietnamese Electronic Medical Records

Author(s):  
Hung D. Nguyen ◽  
Tru H. Cao

Electronic medical records (EMRs) have emerged as an important source of data for research in medicine and information technology, as they contain much valuable human medical knowledge about healthcare and patient treatment. This paper tackles the problem of coreference resolution in Vietnamese EMRs. Unlike in English EMRs, verbs in Vietnamese clinical texts are often used to describe disease symptoms. We therefore first define rules to annotate verbs as mentions and allow coreference between verbs and other noun or adjective mentions. We then propose a support vector machine classifier on a bag-of-words vector representation of mentions that takes into account the special characteristics of the Vietnamese language to resolve their coreference. The achieved F1 score on our dataset of real Vietnamese EMRs, provided by a hospital in Ho Chi Minh City, is 91.4%. To the best of our knowledge, this is the first research work on coreference resolution in Vietnamese clinical texts.

Keywords: Clinical text, support vector machine, bag-of-words vector, lexical similarity, unrestricted coreference
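The abstract represents mentions as bag-of-words vectors and lists lexical similarity among its keywords. As an illustration only, here is a minimal sketch of how a pairwise lexical-similarity feature could be derived from such vectors, assuming whitespace tokenization and cosine similarity. The function names and the example mentions are hypothetical, and this is not the authors' actual feature set. Note that whitespace splitting yields syllables, not words, in Vietnamese, so a real system would likely segment words first.

```python
import math
from collections import Counter


def bag_of_words(mention: str) -> Counter:
    """Lowercase a mention and count its whitespace-separated tokens.

    Assumption: for Vietnamese these tokens are syllables; proper word
    segmentation is omitted in this sketch.
    """
    return Counter(mention.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_features(m1: str, m2: str) -> list[float]:
    """Hypothetical feature vector for a candidate coreferent mention pair:
    [cosine similarity of the bags of words, exact-match flag]."""
    v1, v2 = bag_of_words(m1), bag_of_words(m2)
    exact = 1.0 if m1.lower() == m2.lower() else 0.0
    return [cosine_similarity(v1, v2), exact]


# "đau đầu" (headache) vs. "đau đầu nhiều" (roughly "bad headache")
print(pair_features("đau đầu", "đau đầu nhiều"))
```

Feature vectors like these, one per candidate mention pair, would then be fed to an SVM that labels each pair as coreferent or not. For the example above the cosine score is 2/√6, roughly 0.816.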

2020 ◽  
pp. 002029402096482
Author(s):  
Sulaiman Khan ◽  
Abdul Hafeez ◽  
Hazrat Ali ◽  
Shah Nazir ◽  
Anwar Hussain

This paper presents an efficient OCR system for the recognition of offline Pashto isolated characters. The lack of an appropriate dataset makes it challenging to match against a reference and perform recognition. This research work addresses the problem by developing a medium-size database of 4488 samples of handwritten Pashto characters that can be further used for experimental purposes. In the proposed OCR system, recognition is performed using a convolutional neural network. The performance of the proposed OCR system is validated by comparing its results with those of an artificial neural network and a support vector machine based on a zoning feature extraction technique. The experiments show an accuracy of 56% for the support vector machine, 78% for the artificial neural network, and 80.7% for the proposed OCR system. This higher recognition rate shows that the OCR system based on a convolutional neural network performs best among the techniques evaluated.
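The ANN and SVM baselines are described as relying on zoning feature extraction. Below is a minimal sketch of one common zoning scheme. It assumes a binary character image split into a uniform grid, with each zone's foreground-pixel density used as one feature. The grid size and function name are illustrative and are not the paper's configuration.

```python
def zoning_features(image: list[list[int]], zones_per_side: int) -> list[float]:
    """Split a binary image (1 = ink, 0 = background) into a
    zones_per_side x zones_per_side grid and return the fraction of ink
    pixels in each zone, row by row.

    Assumes the image height and width are divisible by zones_per_side.
    """
    height, width = len(image), len(image[0])
    if height % zones_per_side or width % zones_per_side:
        raise ValueError("image size must be divisible by zones_per_side")
    zone_h, zone_w = height // zones_per_side, width // zones_per_side
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            ink = sum(
                image[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


# A 4x4 toy "character" with ink only in its top-left quarter
glyph = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zoning_features(glyph, 2))  # [0.75, 0.0, 0.0, 0.0]
```

The resulting fixed-length vector (here 4 values for a 2x2 grid) is the kind of input an ANN or SVM classifier would consume. A CNN, by contrast, learns its features directly from the pixels.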


2021 ◽  
Vol 11 (12) ◽  
pp. 2976-2986
Author(s):  
M. Usha Rani ◽  
N. Saravana Selvam

Health informatics is one of the main branches of engineering, using computational techniques to solve a variety of problems such as delayed, missed, or incorrect diagnoses. With the help of technologies such as bio-computing and health informatics, the impact of disasters on both human health and biological factors can be reduced to a large extent. These computational technologies can also boost the country's economy, which suffers as disease-causing pathogens increase and directly affect the human health system. In this research work, different types of sugarcane disease are detected and classified, because manual identification is difficult and time-consuming. Unable to find a better solution, farmers generally resort to stubble burning, which is an alarming issue for both human and environmental wellness. The burning of bagasse causes bagassosis, an interstitial lung disease that affects the lung tissues through the air sacs. Sugarcane disease detection therefore needs to be done early to avoid various health and environmental issues. The proposed work detects four types of sugarcane leaf disease directly from the field. The sequence of methods is: capturing images with WSN nodes, pre-processing with image enhancement and noise removal (IENR), segmentation with fuzzy membership function and clustering (FMFC), feature extraction using a Gray Level Co-occurrence Matrix Vector (GLCMV), and classification using a Support Vector Machine (SVM). With the proposed method, the highest values of precision, accuracy, sensitivity, and specificity for sugarcane leaf disease were obtained.
Based on the implementation, the reported accuracy and execution time for the four sugarcane diseases are as follows: Smut disease (87.12%, 1.01 s), Rust disease (90.23%, 1.02 s), Grassy Shoot disease (95.34%, 1.047 s), and Red Rot disease (95.51%, 1.04 s).
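The pipeline extracts Gray Level Co-occurrence Matrix features before SVM classification. The sketch below computes a normalized GLCM for a single pixel offset, then derives three standard Haralick-style statistics: contrast, energy, and homogeneity. It assumes the image has already been quantized to a small number of gray levels. The offset and the chosen statistics are illustrative, and the paper's "GLCMV" vector may be built differently.

```python
def glcm(image: list[list[int]], levels: int, dr: int = 0, dc: int = 1) -> list[list[float]]:
    """Normalized gray-level co-occurrence matrix for the offset (dr, dc).

    Entry [i][j] is the fraction of pixel pairs in which the reference pixel
    has level i and its neighbour at (r + dr, c + dc) has level j.
    Assumes dr, dc >= 0 and pixel values in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows - dr):
        for c in range(cols - dc):
            counts[image[r][c]][image[r + dr][c + dc]] += 1
            total += 1
    return [[v / total for v in row] for row in counts]


def glcm_features(p: list[list[float]]) -> dict[str, float]:
    """Contrast, energy and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


# A 4x4 leaf-image patch already quantized to 4 gray levels
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
print(glcm_features(glcm(patch, levels=4)))
```

In practice several offsets (for example 0°, 45°, 90° and 135°) are often combined, and the statistics are concatenated into one feature vector per segmented region.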


This research work is based on diabetes prediction analysis. The prediction analysis technique has three steps: dataset input, feature extraction, and classification. In the previous system, a Support Vector Machine and naïve Bayes were applied for diabetes prediction. In this research work, a voting-based method is applied instead. The voting-based method is an ensemble method that combines three classifiers: Support Vector Machine, naïve Bayes, and a decision tree classifier. The existing and proposed methods are implemented in Python, and the results are analyzed in terms of accuracy, precision, recall, and execution time. The analysis shows that the voting-based method gives higher performance than the other classifiers.
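The proposed method combines an SVM, naïve Bayes, and a decision tree by voting. Below is a minimal sketch of hard (majority) voting over the label predictions of those three classifiers. It assumes each classifier already outputs a 0/1 label per patient. The predictions are made-up placeholders, and the abstract does not say whether hard or soft voting was used.

```python
from collections import Counter


def hard_vote(*predictions: list[int]) -> list[int]:
    """Majority vote across classifiers, sample by sample.

    Each argument is one classifier's list of predicted labels. With three
    binary classifiers a strict majority always exists; in other cases ties
    go to the label encountered first (Counter.most_common ordering).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all classifiers must predict the same number of samples")
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical per-patient outputs (1 = diabetic, 0 = non-diabetic)
svm_pred = [1, 0, 1, 1, 0]
nb_pred = [1, 1, 0, 1, 0]
tree_pred = [0, 1, 1, 1, 0]
print(hard_vote(svm_pred, nb_pred, tree_pred))  # [1, 1, 1, 1, 0]
```

Using an odd number of classifiers for a binary task avoids ties entirely. Soft voting would instead average predicted class probabilities, which requires each classifier to output probabilities.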


2021 ◽  
Author(s):  
Jiaming Zeng ◽  
Michael F. Gensheimer ◽  
Daniel L. Rubin ◽  
Susan Athey ◽  
Ross D. Shachter

In medicine, randomized clinical trials (RCT) are the gold standard for informing treatment decisions. Observational comparative effectiveness research (CER) is often plagued by selection bias, and expert-selected covariates may not be sufficient to adjust for confounding. We explore how the unstructured clinical text in electronic medical records (EMR) can be used to reduce selection bias and improve medical practice. We develop a method based on natural language processing to uncover interpretable potential confounders from the clinical text. We validate our method by comparing the hazard ratio (HR) from survival analysis with and without the confounders against the results from established RCTs. We apply our method to four study cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute Research Database and show that our method adjusts the HR estimate towards the RCT results. We further confirm that the uncovered terms can be interpreted by an oncologist as potential confounders. This research helps enable more credible causal inference using data from EMRs, offers a transparent way to improve the design of observational CER, and could inform high-stakes medical decisions. Our method can also be applied to studies within and beyond medicine to extract important information from observational data to support decisions.


2020 ◽  
Author(s):  
Castro Mayleen Dorcas Bondoc ◽  
Tumibay Gilbert Malawit

Today, many schools, universities, and institutions recognize the necessity and importance of using Learning Management Systems (LMS) as part of their educational services. This research work applied an LMS in the teaching and learning process of the Bulacan State University (BulSU) Graduate School (GS) Program to enhance face-to-face instruction with online components. The researchers use an LMS that gives educators a platform to motivate students and engage them in a new educational environment through managed online classes. The LMS allows educators to distribute information and to manage learning materials, assignments, quizzes, and communications. Beyond these basic LMS functions, the researchers use a Machine Learning (ML) algorithm, the Support Vector Machine (SVM), to classify and identify the most relevant videos for each topic. As described by Maity [1], SVM is a supervised machine learning algorithm that analyzes data for classification and regression analysis. The results of this study show that integrating video tutorials into an LMS can contribute significantly to students' knowledge and skills in the learning process.
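The abstract describes SVM as a supervised algorithm for classification and regression. For the classification case, below is a minimal sketch of a linear soft-margin SVM. It is trained by subgradient descent on the regularized hinge loss, in the style of the Pegasos algorithm. The sketch assumes numeric feature vectors and labels in {-1, +1}. The two-feature "video relevance" data are invented, and the study's actual kernel, features, and training method are not stated in the abstract.

```python
def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)).

    A constant 1 is appended to each x so the last weight acts as a bias
    (this also regularizes the bias, a common simplification). Examples are
    visited in a fixed order so the sketch is deterministic.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the L2 regularizer (shrinks w) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... plus a hinge-loss step when the example violates the margin
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical 2-feature video descriptions: +1 = relevant to the topic, -1 = not
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A production system would more likely use a library SVM with a kernel and tuned regularization. The sketch only shows the margin-maximizing objective behind the method.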


Author(s):  
Ahmed Kharrat ◽  
Karim Gasmi ◽  
Mohamed Ben Messaoud ◽  
Nacéra Benamrane ◽  
Mohamed Abid

A new approach for the automated diagnosis and classification of Magnetic Resonance (MR) human brain images is proposed. The method uses the Wavelet Transform (WT) as the input module to a Genetic Algorithm (GA) and a Support Vector Machine (SVM), and it classifies MR brain images as normal or abnormal. The genetic algorithm is employed for feature selection and imposes a much lighter computational burden than the Sequential Floating Backward Selection (SFBS) and Sequential Floating Forward Selection (SFFS) methods. A reduction rate of 88.63% is achieved, and the support vector machine achieves a classification rate of 100%. These results are significantly better than those reported in a previous research work employing the Wavelet Transform and a Support Vector Machine.
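The approach uses a genetic algorithm to select wavelet features before SVM classification. Below is a compact sketch of GA-based feature selection. Chromosomes are bit masks, and the GA uses tournament selection, one-point crossover, bit-flip mutation, and elitism. The fitness function is a hypothetical stand-in; in practice it would be something like SVM validation accuracy on the selected wavelet coefficients, minus a penalty on subset size. All parameter values are assumptions, not the paper's settings.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Return the best 0/1 feature mask found by a simple genetic algorithm.

    Chromosomes are bit masks (1 = keep the feature). Uses binary tournament
    selection, one-point crossover, bit-flip mutation and elitism.
    """
    if n_features < 2:
        raise ValueError("one-point crossover needs at least two features")
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament():
        # The fitter of two randomly drawn masks becomes a parent
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical fitness: features 0-2 are "informative" and every selected
# feature costs 0.1. A real fitness would evaluate an SVM on the selected
# wavelet features.
INFORMATIVE = {0, 1, 2}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask = ga_feature_selection(10, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because of elitism, the best fitness never decreases from one generation to the next. That makes the search cheap to stop early, which fits the abstract's point about a lighter computational burden than SFBS and SFFS.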


Author(s):  
L. Yang ◽  
L. Shi ◽  
P. Li ◽  
J. Yang ◽  
L. Zhao ◽  
...  

Because of forward scattering and blocking of the radar signal, water, bare soil, and shadow, referred to as low backscattering objects (LBOs), often show low backscattering intensity in polarimetric synthetic aperture radar (PolSAR) images. Because LBOs exhibit similar backscattering intensity and polarimetric responses, spectral-based classifiers such as the Wishart method are ineffective for LBO classification. Although some polarimetric features have been exploited to relieve this confusion, the backscattering features are still found to be unstable when the system noise floor varies in the range direction. This paper introduces a simple but effective scene classification method based on the Bag of Words (BoW) model and a Support Vector Machine (SVM) to discriminate LBOs without relying on any polarimetric features. In the proposed approach, square windows are first opened adaptively around the LBOs to determine the scene images, and then Scale-Invariant Feature Transform (SIFT) points are detected in the training and test scenes. The detected SIFT features are clustered with K-means, the resulting cluster centers serve as the visual word list, and each scene image is represented by its word frequencies. Finally, an SVM is trained to predict the LBO class of new scenes. The proposed method is applied to two AIRSAR data sets at C band and L band, including water, bare soil, and shadow scenes. The experimental results demonstrate the effectiveness of the scene classification method in distinguishing LBOs.
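The scene representation clusters SIFT descriptors with K-means into a visual vocabulary, then describes each scene by word frequencies before SVM training. Below is a pure-Python sketch of those two steps. It assumes descriptors are already extracted; real SIFT descriptors are 128-dimensional, but 2-D toy vectors are used here. It uses plain Lloyd's K-means with a seeded random initialization. Function names and data are illustrative only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's K-means; returns k cluster centres (the visual words)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster empties
                centres[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centres


def bow_histogram(descriptors, vocabulary):
    """Normalized frequency of each visual word among a scene's descriptors
    (assumes the scene has at least one descriptor)."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda i: sq_dist(d, vocabulary[i]))
        counts[word] += 1
    return [c / len(descriptors) for c in counts]


# Toy descriptors pooled from all training scenes (two obvious groups)
training_descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
vocabulary = kmeans(training_descriptors, k=2)
scene = [(0.1, 0.1), (5.0, 5.0), (5.1, 5.2), (4.8, 5.1)]
print(bow_histogram(scene, vocabulary))
```

Each scene thus becomes a fixed-length histogram regardless of how many SIFT points it contains, which is what lets a standard SVM be trained on scenes of varying content. The order of the visual words depends on the K-means initialization.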


Market Basket Analysis is considered one of the most popular and efficient forms of data analysis in the marketing and retail field. Its objective is to determine which products customers purchase together. Its name comes from the idea of customers placing everything they buy into a shopping cart (a "market basket") while shopping at a grocery store. Knowing which products customers buy as a group can be quite useful for a retailer or any other organization. A store could use this information to place products that are often sold together in the same location, whereas a catalog or World Wide Web (WWW) merchant could use it to decide the layout of its catalog and order form. Because many applications, such as market basket analysis, web fraud detection, medical diagnosis, census data, and customer relationship management, make use of association rules, the decision-making process can be improved. Security is also regarded as an important facet of individual transactions and of frequent itemsets in horizontally partitioned databases. To secure the most recently bought frequent itemsets in transactions, this research work introduces a novel classifier-based key security algorithm that uses the RSA cryptographic technique. The classifier uses information about several frequent itemsets and provides a key value to the actual company. For instance, if there are reliance users, only valid users can obtain that market information; the other users belonging to the reliance organization are not allowed to select the data's key value. First, frequent itemsets are mined through association rule mining using Probabilistic Graphical Model techniques.
Then the Enhanced Support Vector Machine (ESVM) classifier checks the key values of the mined frequent itemsets.
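Market basket analysis rests on finding itemsets that are bought together often enough to exceed a minimum support. The paper mines these with Probabilistic Graphical Model techniques. As a plainly swapped-in illustration, below is a level-wise, Apriori-style miner in pure Python. It assumes transactions are sets of item names and that min_support is a fraction of all transactions. The RSA key step and the ESVM check are not sketched.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining of all itemsets whose support,
    the fraction of transactions containing them, is >= min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    items = sorted({item for b in baskets for item in b})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    result = {}
    size = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        size += 1
        # Candidates: unions of two frequent (size-1)-itemsets ...
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # ... kept only if every (size-1)-subset is frequent (Apriori property)
        current = [
            c for c in candidates
            if all(frozenset(s) in result for s in combinations(c, size - 1))
            and support(c) >= min_support
        ]
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
found = frequent_itemsets(baskets, 0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.5, "eggs" (support 0.25) and the pair {butter, milk} (support 0.25) are discarded. The triple {bread, butter, milk} is pruned without a database scan because one of its subsets is infrequent.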


10.2196/29120 ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. e29120
Author(s):  
Bruna Stella Zanotto ◽  
Ana Paula Beck da Silva Etges ◽  
Avner dal Bosco ◽  
Eduardo Gabriel Cortes ◽  
Renata Ruschel ◽  
...  

Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management.
Objective: This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs.
Methods: Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results.
Results: The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future.
Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations.
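The experimental protocol uses 5-fold cross-validation repeated 6 times with subject-wise sampling. All sentences from one patient fall in the same fold, so no patient appears in both training and test data. Below is a minimal sketch of generating such splits, assuming each sentence carries a patient identifier. Reshuffling patients with a different seed in each repetition is an assumption about how the repetitions differ, and the function name is hypothetical.

```python
import random


def subject_wise_splits(patient_ids, n_folds=5, n_repeats=6, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated group k-fold CV.

    patient_ids[i] is the patient who owns sentence i. Patients, not
    sentences, are shuffled and dealt round-robin into folds, so every
    sentence of a patient lands in the same fold.
    """
    patients = sorted(set(patient_ids))
    if len(patients) < n_folds:
        raise ValueError("need at least as many patients as folds")
    for repeat in range(n_repeats):
        order = patients[:]
        random.Random(seed + repeat).shuffle(order)
        fold_of = {p: i % n_folds for i, p in enumerate(order)}
        for fold in range(n_folds):
            test_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] == fold]
            train_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] != fold]
            yield repeat, fold, train_idx, test_idx


# Hypothetical: 12 sentences from 6 patients
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6", "p6", "p6"]
splits = list(subject_wise_splits(ids))
print(len(splits))  # 30 = 6 repeats x 5 folds
```

Splitting by patient rather than by sentence prevents a model from scoring well simply because it has seen other sentences from the same record. Each repetition still tests every sentence exactly once.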

