Logistic discrimination based on G-mean and F-measure for imbalanced problem

2016 ◽  
Vol 31 (3) ◽  
pp. 1155-1166 ◽  
Author(s):  
Huaping Guo ◽  
Hongbing Liu ◽  
Changan Wu ◽  
Weimei Zhi ◽  
Yan Xiao ◽  
...  
1980 ◽  
Vol 19 (04) ◽  
pp. 220-226 ◽  
Author(s):  
P. A. Lachenbruch ◽  
W. R. Clarke

This review article discusses current use of discriminant analysis in epidemiology. Contents include historical review, simple extensions and generalizations, examples, evaluation of rules, logistic discrimination, and robustness.


2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is one of the facilities that provides information and knowledge resources and acts as an academic helper for readers seeking information. The huge number of books a library holds usually makes it difficult for readers to find specific titles. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it still lacks a recommendation feature to help readers find books relevant to a specific book they choose. The application was developed using the Vector Space Model to represent each document as a vector. The recommendation in this application is based on the similarity of the books' descriptions. Based on the testing phase using a single-language sample of relevant books, the F-measure value obtained is 55% using 0.1 as the cosine similarity threshold. The books' descriptions and the variety of languages affect the F-measure value obtained. Index Terms—Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
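The TF-IDF and cosine-similarity pipeline described above can be sketched in a few lines. The tiny corpus, titles, and helper names are illustrative assumptions (the real system also applies Porter stemming and indexes SLiMS records); only the 0.1 threshold mirrors the abstract.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (f / len(doc)) * math.log(n / df[t])
                        for t, f in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(descriptions, chosen, threshold=0.1):
    """Recommend books whose description similarity exceeds the threshold."""
    titles = list(descriptions)
    docs = [d.lower().split() for d in descriptions.values()]
    vecs = tfidf_vectors(docs)
    q = vecs[titles.index(chosen)]
    return [t for t, v in zip(titles, vecs)
            if t != chosen and cosine(q, v) >= threshold]

# Hypothetical catalogue entries for illustration.
books = {
    "Intro to Algorithms": "algorithms data structures sorting graphs",
    "Graph Theory": "graphs algorithms sorting trees",
    "Cooking Basics": "recipes kitchen ingredients cooking",
}
print(recommend(books, "Intro to Algorithms"))
```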


2018 ◽  
Vol 9 (1) ◽  
pp. 9-17
Author(s):  
Marcel Bonar Kristanda ◽  
Seng Hansun ◽  
Albert Albert

A library catalog is a documented list of all of a library's collections. Unfortunately, a problem was identified in the process of searching for a book in the catalog of Universitas Multimedia Nusantara's library information system: the results are often not relevant to the user's query. This research aims to design and build a library catalog application on the Android platform in order to increase the relevancy of search results in a database using the Rocchio Relevance Feedback method, along with a user experience measurement. The user experience analysis showed a good response with a score of 91.18% across all factors, and the relevance evaluation yielded 71.43% precision, 100% recall, and 83.33% F-measure. The difference in relevant results between the Senayan Library Information System (SLiMS) and the new Android application was around 36.11%. Therefore, this Android application was shown to give relevant results based on relevance rank. Index Terms—Rocchio, Relevance Feedback, Search, Book, Application, Android, Library.
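The Rocchio Relevance Feedback update behind the search improvement can be sketched as follows. The standard weights (alpha=1.0, beta=0.75, gamma=0.15) and the toy vectors are assumptions, since the abstract does not give the parameters actually used.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update: move the query vector toward the centroid of
    relevant documents and away from the centroid of non-relevant ones."""
    terms = set(query) \
        | {t for d in relevant for t in d} \
        | {t for d in nonrelevant for t in d}
    new_q = {}
    for t in terms:
        rel = (sum(d.get(t, 0.0) for d in relevant) / len(relevant)
               if relevant else 0.0)
        nonrel = (sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
                  if nonrelevant else 0.0)
        w = alpha * query.get(t, 0.0) + beta * rel - gamma * nonrel
        if w > 0:                      # negative weights are clipped to zero
            new_q[t] = w
    return new_q

# Hypothetical TF-IDF weights for a catalogue query and feedback documents.
q = {"library": 1.0, "catalog": 1.0}
rel = [{"library": 0.8, "android": 0.6}]
nonrel = [{"cooking": 0.9}]
print(rocchio(q, rel, nonrel))
```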


2019 ◽  
Author(s):  
Chin Lin ◽  
Yu-Sheng Lou ◽  
Chia-Cheng Lee ◽  
Chia-Jung Hsu ◽  
Ding-Chung Wu ◽  
...  

BACKGROUND An artificial intelligence-based algorithm has shown a powerful ability to code the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) in discharge notes. However, its performance still requires improvement compared with human experts. The major disadvantage of the previous algorithm is its lack of understanding of medical terminology. OBJECTIVE We propose several methods based on the human learning process and conduct a series of experiments to validate their improvements. METHODS We compared two data sources for training the word-embedding model: English Wikipedia and PubMed journal abstracts. Moreover, fixed, changeable, and double-channel embedding tables were used to test their performance. Some additional tricks were also applied to improve accuracy. We used these methods to identify the three-chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. Subsequently, 94,483 labeled discharge notes from June 1, 2015 to June 30, 2017 from the Tri-Service General Hospital in Taipei, Taiwan were used for training. To evaluate performance, 24,762 discharge notes from July 1, 2017 to December 31, 2017 from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were also tested. The F-measure is the major global measure of effectiveness. RESULTS In understanding medical terminology, the PubMed embedding model (Pearson correlation = 0.60/0.57) shows better performance than the Wikipedia embedding model (Pearson correlation = 0.35/0.31). In the accuracy of ICD-10-CM coding, the changeable model using both the PubMed and Wikipedia embedding models has the highest testing mean F-measure (0.7311 and 0.6639 in the Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, a proposed hybrid sampling method, an augmentation trick to keep the algorithm from identifying negative terms, was found to further improve model performance. CONCLUSIONS The proposed model architecture and training method, named ICD10Net, is the first expert-level model practically applied to daily work. This model can also be applied to unstructured information extraction from free-text medical writing. We have developed a web app to demonstrate our work (https://linchin.ndmctsgh.edu.tw/app/ICD10/).
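The F-measure used above as the global effectiveness measure is commonly micro-averaged over multi-label code assignments; the helper below is a generic sketch, and the example ICD-10-CM codes are illustrative, not drawn from the study's data.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure over per-note code sets: pool true positives,
    false positives, and false negatives across all notes before computing
    precision and recall."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # codes assigned and correct
        fp += len(p - g)   # codes assigned but wrong
        fn += len(g - p)   # codes missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two hypothetical discharge notes with gold and predicted code sets.
gold = [{"A41.9", "J18.9"}, {"I10"}]
pred = [{"A41.9"}, {"I10", "E11.9"}]
print(round(micro_f_measure(gold, pred), 4))
```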


2020 ◽  
Author(s):  
Shintaro Tsuji ◽  
Andrew Wen ◽  
Naoki Takahashi ◽  
Hongjian Zhang ◽  
Katsuhiko Ogasawara ◽  
...  

BACKGROUND Named entity recognition (NER) plays an important role in extracting descriptive features when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of recognized entities depends on dictionary lookup. In particular, the recognition of compound terms is complicated because they occur in a variety of patterns. OBJECTIVE The objective of this study is to develop and evaluate an NER tool for compound terms, using RadLex, for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (CTs) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). Additionally, we created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs) and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the combination with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the pipeline still failed to capture many CTs. The MR showed that 71.9% of stem terms matched those of the ontologies, and RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies. CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.
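Dictionary-based recognition of compound terms can be sketched as a greedy longest-match lookup. This is a deliberate simplification of the cTAKES pipeline, and the dictionary entries below are illustrative, not actual RadLex content.

```python
def find_compound_terms(tokens, dictionary):
    """Greedy longest-match dictionary lookup: at each token position, try
    the longest candidate span first, so compound terms win over their
    single-word stems (e.g. "pleural effusion" over "effusion")."""
    max_len = max(len(term) for term in dictionary)
    i, found = 0, []
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + span])
            if candidate in dictionary:
                found.append(" ".join(candidate))
                i += span
                break
        else:
            i += 1  # no entry starts here; advance one token
    return found

# Illustrative dictionary entries only -- not actual RadLex content.
dictionary = {
    ("pleural", "effusion"),
    ("effusion",),
    ("chest", "tube"),
}
tokens = "small pleural effusion and chest tube in place".split()
print(find_compound_terms(tokens, dictionary))
```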


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 234 ◽  
Author(s):  
Hyun Yoo ◽  
Soyoung Han ◽  
Kyungyong Chung

Recently, massive amounts of bioinformation big data have been collected by sensor-based IoT devices. The collected data are classified into different types of health big data by various techniques. A personalized analysis technique is the basis for judging the risk factors of personal cardiovascular disorders in real time. The objective of this paper is to provide a model for personalized heart condition classification that combines a fast and effective preprocessing technique with a deep neural network in order to process biosensor input data accumulated in real time. The model learns the input data and develops an approximation function, and it can help users recognize risk situations. For the analysis of the pulse frequency, a fast Fourier transform is applied in the preprocessing step. Data reduction is performed using the frequency-by-frequency ratio data of the extracted power spectrum. To analyze the meaning of the preprocessed data, a neural network algorithm is applied. In particular, a deep neural network is used to analyze and evaluate linear data. A deep neural network stacks multiple layers and establishes an operational model of nodes using gradient descent. The completed model was trained by classifying ECG signals collected in advance into normal, control, and noise groups. Thereafter, ECG signals input in real time through the trained deep neural network system were classified into normal, control, and noise. To evaluate the performance of the proposed model, this study used the data operation cost reduction ratio and the F-measure. As a result, with the use of the fast Fourier transform and the cumulative frequency percentage, the size of the ECG data was reduced at a ratio of 1:32. According to the F-measure analysis of the deep neural network, the model had 83.83% accuracy. Given these results, the modified deep neural network technique can reduce the size of big data in terms of computing work, and it is an effective system for reducing operation time.
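The FFT-plus-ratio preprocessing can be sketched as below. A naive DFT stands in for a production FFT library (numpy or scipy would be used in practice), and the reduction rule, keeping the lowest-frequency power ratios until they cover a target share of total power, is an assumed reading of the paper's "cumulative frequency percentage".

```python
import cmath
import math

def power_spectrum(signal):
    """Naive DFT power spectrum over the non-redundant half of the bins
    (an FFT library would replace this O(n^2) loop in practice)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2)]

def reduce_by_cumulative_power(signal, coverage=0.9):
    """Keep only the lowest-frequency power ratios whose cumulative share
    of total power reaches the coverage target; drop the rest."""
    spectrum = power_spectrum(signal)
    total = sum(spectrum) or 1.0
    ratios, cum = [], 0.0
    for p in spectrum:
        ratios.append(p / total)
        cum += p / total
        if cum >= coverage:
            break
    return ratios

# Toy "pulse" signal: one strong slow component plus a weak fast one.
sig = [math.sin(2 * math.pi * 2 * t / 64)
       + 0.05 * math.sin(2 * math.pi * 25 * t / 64)
       for t in range(64)]
reduced = reduce_by_cumulative_power(sig)
print(len(reduced), "of", 64 // 2, "bins kept")
```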


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Huang ◽  
Z Zhang ◽  
K Lin ◽  
Z Zuo ◽  
Q Chen ◽  
...  

Abstract Background Atrial fibrillation (AF) is a major public health problem with significant adverse outcomes, and catheter ablation is a widely adopted treatment. The CABANA trial showed that catheter ablation reduced AF recurrence to a greater extent than medication. However, some patients who undergo this procedure still experience relapse. Here, we present an innovative way to identify this subgroup using an artificial intelligence (AI)-assisted coronary sinus electrogram. Hypothesis Our hypothesis is that credible features in the electrogram can be extracted by AI for prediction, so that rigorous drug administration, close follow-up, or a potential second procedure can be applied to these patients. Methods 67 patients from two independent hospitals (SPH and ZSH) with non-valvular persistent AF undergoing circumferential pulmonary vein isolation were enrolled in this study, 23 of whom experienced recurrence 6 months after the procedure. We collected standard 2.5-second fragments of the coronary sinus electrogram from the ENSITE NAVX (SPH) and Carto (ZSH) systems before the ablation started. A total of 1429 fragments were obtained, and a transfer learning-based ResNet model was employed in our study. Fragments from ZSH were used for training and those from SPH for validation of the deep convolutional neural network (DCNN). The AI model's performance was evaluated by accuracy, recall, precision, F-measure, and AUC. Results The prediction accuracy of the DCNN in a single center reached 96%, while that across different ablation systems reached 74.3%. The algorithm also yielded values for the AUC, recall, precision, and F-measure of 0.76, 86.1%, 95.9%, and 0.78, respectively, which shows satisfactory classification results and extensibility across cardiology centers and brands of electroanatomic mapping instruments. Conclusions Our work has revealed a potential intrinsic correlation between coronary sinus electrical activity and AF recurrence using a DCNN-based model. Moreover, the DCNN model we developed shows great promise for relapse prediction in personalized post-procedural management. Funding Acknowledgement Type of funding source: Foundation. Main funding source(s): The National Natural Science Foundation of China
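The AUC reported alongside recall, precision, and F-measure can be computed directly from classifier scores via the rank-based Mann-Whitney formulation, without any ML library. The patient scores below are made up for illustration, not taken from the study.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen negative,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical recurrence scores for six patients (label 1 = relapse).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc(labels, scores))
```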


2021 ◽  
pp. 1-13
Author(s):  
Richa ◽  
Punam Bedi

A Recommender System (RS) is an information filtering approach that helps the information-overburdened user in the decision-making process and suggests items that might interest him. When presenting recommendations to the user, the accuracy of the presented list has always been a concern for researchers. In recent years, however, the focus has shifted to including unexpected and novel items in the list along with accurate ones. To increase user acceptance, it is important to provide potentially interesting items that are not obvious and that differ from the items the end user has already rated. In this work, we propose a model that generates serendipitous item recommendations while also addressing accuracy and sparsity issues. The literature suggests that various components help to achieve the objective of serendipitous recommendation. In this paper, a fuzzy inference based approach is used for the serendipity computation because the definitions of the components overlap. Moreover, to address the accuracy and sparsity issues in the recommendation process, cross-domain and trust-based approaches are incorporated. A prototype of the system was developed for the tourism domain, and its performance is measured using mean absolute error (MAE), root mean square error (RMSE), unexpectedness, precision, recall, and F-measure.
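A fuzzy inference step of the kind described can be sketched with triangular memberships and two Mamdani-style rules over overlapping components. The membership shapes, rule base, and defuzzification below are illustrative assumptions, not the paper's actual design.

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def serendipity_score(unexpectedness, relevance):
    """Tiny Mamdani-style inference over two overlapping components:
      rule 1: IF unexpectedness high AND relevance high THEN serendipity high
      rule 2: IF unexpectedness low  OR  relevance low  THEN serendipity low
    Defuzzified by a weighted average of the rule outputs (1.0 and 0.0)."""
    high_u = triangular(unexpectedness, 0.4, 1.0, 1.6)
    low_u = triangular(unexpectedness, -0.6, 0.0, 0.6)
    high_r = triangular(relevance, 0.4, 1.0, 1.6)
    low_r = triangular(relevance, -0.6, 0.0, 0.6)
    r1 = min(high_u, high_r)   # fuzzy AND -> min
    r2 = max(low_u, low_r)     # fuzzy OR  -> max
    if r1 + r2 == 0:
        return 0.0
    return (r1 * 1.0 + r2 * 0.0) / (r1 + r2)

# An item that is quite unexpected but only moderately relevant.
print(round(serendipity_score(0.9, 0.5), 3))
```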


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Hakan Gunduz

Abstract In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear, deep-learning (LSTM), and ensemble-learning (LightGBM) models. These models were trained with four different feature sets, and their performance was evaluated in terms of accuracy and F-measure. While the first experiments directly used each stock's own features as model inputs, the second experiments used stock features reduced through Variational AutoEncoders (VAE). In the last experiments, in order to grasp the effects of the other banking stocks on individual stock performance, the features belonging to the other stocks were also given as inputs to our models. Other stocks' features were combined with both the own (named allstock_own) and the VAE-reduced (named allstock_VAE) stock features, and the expanded feature sets were reduced by Recursive Feature Elimination. While the highest success rate reached 0.685 with allstock_own and the LSTM-with-attention model, the combination of allstock_VAE and the LSTM-with-attention model obtained an accuracy rate of 0.675. Although the classification results achieved with both feature types were close, allstock_VAE achieved them using nearly 16.67% fewer features than allstock_own. When all experimental results were examined, it was found that the models trained with allstock_own and allstock_VAE achieved higher accuracy rates than those using individual stock features. It was also concluded that the results obtained with the VAE-reduced stock features were similar to those obtained with the own stock features.
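The Recursive Feature Elimination step can be sketched as an iterative drop of the weakest feature. Real RFE re-fits an estimator each round and ranks features by its weights, so the correlation-based ranking and the toy feature sets below are simplifying assumptions for illustration.

```python
def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def recursive_feature_elimination(features, target, keep):
    """Repeatedly drop the feature with the weakest |correlation| to the
    target until only `keep` features remain (a stand-in ranking; true RFE
    uses a re-fitted estimator's weights)."""
    remaining = dict(features)
    while len(remaining) > keep:
        weakest = min(remaining,
                      key=lambda f: abs(correlation(remaining[f], target)))
        del remaining[weakest]
    return sorted(remaining)

# Hypothetical hourly features; `direction` is the up/down label.
direction = [1, 0, 1, 1, 0, 1, 0, 0]
features = {
    "own_return":   [0.9, 0.1, 0.8, 0.7, 0.2, 0.9, 0.1, 0.2],
    "other_return": [0.8, 0.2, 0.7, 0.9, 0.1, 0.8, 0.3, 0.1],
    "noise":        [0.5, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.4],
}
print(recursive_feature_elimination(features, direction, keep=2))
```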

