On classification of abstracts obtained from medical journals

2019 ◽  
Vol 46 (5) ◽  
pp. 648-663
Author(s):  
Bekir Parlak ◽  
Alper Kürşat Uysal

Classification of medical documents has mostly been carried out on English data sets, and these studies were performed on hospital records rather than academic texts. The main reasons behind this situation are the lack of publicly available data sets and the cost and time that the tasks require. As the first contribution of this study, two data sets including Turkish and English counterparts of the same abstracts published in Turkish medical journals were constructed. Turkish is one of the most widely used agglutinative languages worldwide and English is a good example of a non-agglutinative language. While the English abstracts were obtained automatically from the MEDLINE database with a computer program, the Turkish counterparts of these documents were collected manually from the Internet. As the second contribution of this study, an extensive comparison on classification of abstracts obtained from Turkish medical journals was made by using these two equivalent data sets. Features were extracted from text documents with three different approaches: unigram, bigram and hybrid. The hybrid approach combines unigram and bigram features. In the experiments, three different feature selection methods and seven different classifiers were utilised. According to the results on both data sets, classification performance on the English abstracts was higher than on the Turkish counterparts. Maximum accuracies were obtained from the combination of unigram features, distinguishing feature selector (DFS) and multinomial naïve Bayes (MNB) classifier for both data sets. Unigram features were generally more effective than bigram and hybrid features. However, analysis of the top-10 features indicated that nearly half of the features were translations of each other for the Turkish and English data sets.
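The three feature extraction approaches named in this abstract can be illustrated with a minimal sketch. It assumes plain whitespace tokenisation and lowercasing, which the abstract does not specify; the function names `ngrams` and `extract_features` and the sample sentence are hypothetical and are not taken from the study.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, mode="unigram"):
    """Return unigram, bigram, or hybrid (unigram + bigram) features.

    Assumption: tokens are lowercased and split on whitespace.
    """
    tokens = text.lower().split()
    unigrams = ngrams(tokens, 1)
    bigrams = ngrams(tokens, 2)
    if mode == "unigram":
        return unigrams
    if mode == "bigram":
        return bigrams
    if mode == "hybrid":
        # The hybrid set is simply the union of both feature types.
        return unigrams + bigrams
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical sample text, for illustration only.
sample = "chronic heart failure treatment"
unigram_features = extract_features(sample, "unigram")
bigram_features = extract_features(sample, "bigram")
hybrid_features = extract_features(sample, "hybrid")
```

A text of n tokens yields n unigrams and n - 1 bigrams, so the hybrid feature set is nearly twice as large before any feature selection is applied.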

2017 ◽  
Vol 1 (4) ◽  
pp. 271-277 ◽  
Author(s):  
Abdullah Caliskan ◽  
Mehmet Emin Yuksel

In this study, a deep neural network classifier is proposed for the classification of coronary artery disease medical data sets. The proposed classifier is tested on reference CAD data sets from the literature and also compared with popular representative classification methods regarding its classification performance. Experimental results show that the deep neural network classifier offers much better accuracy, sensitivity and specificity rates when compared with other methods. The proposed method presents itself as an easily accessible and cost-effective alternative to currently existing methods used for the diagnosis of CAD and it can be applied for easily checking whether a given subject under examination has at least one occluded coronary artery or not.


2018 ◽  
Author(s):  
Thomas P. Quinn ◽  
Samuel C. Lee ◽  
Svetha Venkatesh ◽  
Thin Nguyen

Although neuropsychiatric disorders have a well-established genetic background, their specific molecular foundations remain elusive. This has prompted many investigators to design studies that identify explanatory biomarkers, and then use these biomarkers to predict clinical outcomes. One approach involves using machine learning algorithms to classify patients based on blood mRNA expression from high-throughput transcriptomic assays. However, these endeavours typically fail to achieve the high level of performance, stability, and generalizability required for clinical translation. Moreover, these classifiers can lack interpretability because informative genes do not necessarily have relevance to researchers. For this study, we hypothesized that annotation-based classifiers can improve classification performance, stability, generalizability, and interpretability. To this end, we evaluated the performance of four classification algorithms on six neuropsychiatric data sets using four annotation databases. Our results suggest that the Gene Ontology Biological Process database can transform gene expression into an annotation-based feature space that improves the performance and stability of blood-based classifiers for neuropsychiatric conditions. We also show how annotation features can improve the interpretability of classifiers: since annotation databases are often used to assign biological importance to genes, annotation-based classifiers are easy to interpret because the features themselves carry the biological importance. We found that using annotations as features improves the performance and stability of classifiers. We also noted that the top-ranked annotations tend to contain the top-ranked genes, suggesting that the most predictive annotations are a superset of the most predictive genes. Based on this, and the fact that annotations are used routinely to assign biological importance to genetic data, we recommend transforming gene-level expression into annotation-level expression prior to the classification of neuropsychiatric conditions.
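As a rough illustration of what transforming gene-level expression into annotation-level expression could look like, the sketch below averages the expression of the genes that belong to each annotation. The averaging rule is an assumption made here for illustration; the abstract does not state how the authors aggregate genes. The gene names, annotation names, and the function `to_annotation_space` are hypothetical.

```python
from statistics import mean


def to_annotation_space(expression, annotations):
    """Map gene-level expression to one feature per annotation.

    Assumption: an annotation's value is the mean expression of its
    member genes that were actually measured; the source abstract does
    not specify the aggregation rule.
    """
    features = {}
    for name, genes in annotations.items():
        values = [expression[g] for g in genes if g in expression]
        if values:  # skip annotations with no measured member gene
            features[name] = mean(values)
    return features


# Hypothetical toy data: one sample's expression and two annotation gene sets.
expression = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0}
annotations = {
    "process_1": ["GENE_A", "GENE_B"],
    "process_2": ["GENE_C", "GENE_D"],  # GENE_D is not measured
}
annotation_features = to_annotation_space(expression, annotations)
```

The resulting dictionary has one entry per annotation rather than per gene, so a classifier trained on it ranks annotations directly.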


Author(s):  
Ankit Srivastava ◽  
Vijendra Singh ◽  
Gurdeep Singh Drall

Over the past few years, the novel appeal and increasing popularity of social networks as a medium for users to express their opinions and views have led to the accumulation of a massive amount of data. This evolving mountain of data is commonly termed Big Data. Accordingly, one area in which the application of new techniques in data mining research has significant potential to achieve more precise classification of hidden knowledge in Big Data is sentiment analysis (also known as opinion mining). A hybrid approach using Naïve Bayes and Random Forest for mining Twitter data sets is presented here as an extension of previous work. Briefly, relevant data sets are collected from Twitter using the Twitter API; then, use of the hybrid methodology is illustrated and evaluated against one with only a Naïve Bayes classifier. Results show better accuracy and efficiency in sentiment classification for the hybrid approach.


Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 745 ◽  
Author(s):  
Yangjie Wei ◽  
Shiliang Fang ◽  
Xiaoyan Wang

Since digital communication signals are widely used in radio and underwater acoustic systems, the modulation classification of these signals has become increasingly significant in various military and civilian applications. However, due to adverse channel transmission characteristics and low signal-to-noise ratio (SNR), the modulation classification of communication signals is extremely challenging. In this paper, a novel method for automatic modulation classification of digital communication signals using a support vector machine (SVM) based on hybrid features, cyclostationarity, and information entropy is proposed. In the proposed method, by combining cyclostationarity theory and entropy with the existing signal features, we propose three new features to assist the classification of digital communication signals: the maximum value of the normalized cyclic spectrum when the cyclic frequency is not zero, the Shannon entropy of the cyclic spectrum, and the Renyi entropy of the cyclic spectrum. Because these new features do not require any prior information and have a strong anti-noise ability, they are very suitable for the identification of communication signals. Finally, a one-against-one SVM is designed as a classifier. Simulation results show that the proposed method outperforms the existing methods in terms of classification performance and noise tolerance.
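The two entropy features can be sketched using the standard textbook definitions of Shannon and Renyi entropy. The sketch assumes the cyclic spectrum is available as a list of non-negative magnitudes and is normalised to sum to one before the entropy is taken; the paper's exact normalisation, logarithm base, and Renyi order are not given in the abstract, so base 2 and order 2 are placeholders. The function names and toy spectra are hypothetical.

```python
import math


def normalise(magnitudes):
    """Scale non-negative magnitudes so they sum to 1 (a probability-like vector)."""
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum must contain positive energy")
    return [m / total for m in magnitudes]


def shannon_entropy(p):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi_entropy(p, alpha=2.0):
    """Renyi entropy of order alpha in bits: log2(sum(p_i ** alpha)) / (1 - alpha)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


# Hypothetical toy spectra: one flat, one concentrated in a single bin.
flat = normalise([1.0, 1.0, 1.0, 1.0])
peaked = normalise([10.0, 0.0, 0.0, 0.0])

h_shannon_flat = shannon_entropy(flat)
h_renyi_flat = renyi_entropy(flat, alpha=2.0)
h_shannon_peaked = shannon_entropy(peaked)
```

A flat spectrum maximises both entropies, while a spectrum concentrated in one bin drives them toward zero, which is why such values can help separate signal types with different spectral structure.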


2013 ◽  
Vol 448-453 ◽  
pp. 3645-3649 ◽  
Author(s):  
Shuo Ding ◽  
Xiao Heng Chang ◽  
Qing Hui Wu

Traditional pattern classification methods are not always efficient because sample data sets are sometimes incomplete and there are exceptions and counterexamples. In this paper, an SOFM neural network is applied to the pattern classification of two-dimensional vectors after analysis of its structure and algorithm. The method of establishing an SOFM network in MATLAB 7.0 is introduced before the network is applied to classify two-dimensional vectors. The adjustment process of the weight vectors and the classification performance of the SOFM model are also tested under different numbers of training steps. The simulation results show that the classification approach based on the SOFM model is effective because of its fast speed, high accuracy and strong generalization ability.


Geophysics ◽  
2013 ◽  
Vol 78 (1) ◽  
pp. E41-E46 ◽  
Author(s):  
Laurens Beran ◽  
Barry Zelt ◽  
Leonard Pasion ◽  
Stephen Billings ◽  
Kevin Kingdon ◽  
...  

We have developed practical strategies for discriminating between buried unexploded ordnance (UXO) and metallic clutter. These methods are applicable to time-domain electromagnetic data acquired with multistatic, multicomponent sensors designed for UXO classification. Each detected target is characterized by dipole polarizabilities estimated via inversion of the observed sensor data. The polarizabilities are intrinsic target features and so are used to distinguish between UXO and clutter. We tested this processing with four data sets from recent field demonstrations, with each data set characterized by metrics of data and model quality. We then developed techniques for building a representative training data set and determined how the variable quality of estimated features affects overall classification performance. Finally, we devised a technique to optimize classification performance by adapting features during target prioritization.


Author(s):  
Sang-Il Choi ◽  
Sang Tae Choi ◽  
Haanju Yoo

We propose a method that generates input features to effectively classify low-dimensional data. To do this, we first generate high-order terms for the input features of the original low-dimensional data to form a candidate set of new input features. Then, the discrimination power of the candidate input features is quantitatively evaluated by calculating the ‘discrimination distance’ for each candidate feature. As a result, only candidates with a large amount of discriminative information are selected to create a new input feature vector, and the discriminant features that are to be used as input to the classifier are extracted from the new input feature vectors by using a subspace discriminant analysis. Experiments on low-dimensional data sets in the UCI machine learning repository and several kinds of low-resolution facial image data show that the proposed method improves the classification performance of low-dimensional data by generating features.
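The candidate-generation step can be sketched for second-order terms. The abstract does not define the 'discrimination distance', so the sketch substitutes a simple two-class Fisher-style ratio (squared difference of class means over summed class variances) purely as a stand-in for ranking candidates. The function names and the toy data are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, pvariance


def second_order_terms(x):
    """Return the original features followed by all pairwise products x_i * x_j (i <= j)."""
    pairs = combinations_with_replacement(range(len(x)), 2)
    return list(x) + [x[i] * x[j] for i, j in pairs]


def fisher_score(values, labels):
    """Stand-in separability score for one feature and two classes (0 and 1).

    This is NOT the paper's 'discrimination distance', which the
    abstract leaves undefined.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    within = pvariance(a) + pvariance(b)
    if within == 0:
        return float("inf")
    return (mean(a) - mean(b)) ** 2 / within


# Hypothetical two-dimensional samples with binary labels.
samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 4.0]]
labels = [0, 0, 1, 1]

expanded = [second_order_terms(x) for x in samples]
scores = [
    fisher_score([row[k] for row in expanded], labels)
    for k in range(len(expanded[0]))
]
```

With two original features the candidate set holds five features: x1, x2, x1*x1, x1*x2 and x2*x2. In this toy data the product x1*x2 scores higher than x1 alone, which is the kind of gain the candidate selection step is meant to find.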


Author(s):  
Maryam Nuser ◽  
Enas Al-Horani

The number of digital medical documents is increasing continuously; several medical websites share a lot of unclassified articles. These articles have very long texts that must be read to determine the topic of each document. Classifying these documents is important so that researchers can use them easily and so that the time and effort spent reading and searching for a specific topic are reduced. Therefore, an automatic way to extract latent topics from these text documents is needed. Topic modeling is one of the techniques used to deal with this problem. In this paper, a medical collection of documents is used; this collection contains documents on three types of widespread diseases (Heart Diseases, Blood Pressure and Cholesterol). The LDA topic modeling technique is applied to classify these documents into the previously mentioned topics. The algorithm's results are evaluated, and LDA shows a good level of classification accuracy.


Author(s):  
Inzamam Mashood Nasir ◽  
Muhammad Rashid ◽  
Jamal Hussain Shah ◽  
Muhammad Sharif ◽  
Muhammad Yahiya Haider Awan ◽  
...  

Background: Breast cancer is considered the most perilous disease among women worldwide, and the rate of new cases is increasing yearly. Many researchers have proposed efficient algorithms to diagnose breast cancer at early stages, which have increased efficiency and performance by utilizing the learned features of gold-standard histopathological images. Objective: Most of these systems have used either traditional handcrafted features or deep features that contain a lot of noise and redundancy, which ultimately decreases the performance of the system. Methods: A hybrid approach is proposed by fusing and optimizing the properties of handcrafted and deep features to classify breast cancer images. HOG and LBP features are serially fused with the pretrained models VGG19 and InceptionV3. PCR and ICR are used to evaluate the classification performance of the proposed method. Results: The method concentrates on histopathological images to classify breast cancer. The performance is compared with state-of-the-art techniques, where an overall patient-level accuracy of 97.2% and an image-level accuracy of 96.7% are recorded. Conclusion: The proposed hybrid method achieves the best performance compared to previous methods and can be used for intelligent healthcare systems and early breast cancer detection.
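The difference between the patient-level and image-level figures can be made concrete with a small sketch. The abstract does not expand PCR and ICR; the sketch assumes they denote patient-level and image-level classification rates, computed as they commonly are in histopathology benchmarks: the image-level rate is the fraction of correctly classified images, and the patient-level rate is the mean of each patient's own fraction of correct images. The records below are made up for illustration and are not results from the paper.

```python
from collections import defaultdict


def image_level_rate(records):
    """Fraction of all images whose predicted label matches the true label."""
    correct = sum(1 for _, truth, pred in records if truth == pred)
    return correct / len(records)


def patient_level_rate(records):
    """Mean over patients of each patient's fraction of correctly classified images."""
    per_patient = defaultdict(lambda: [0, 0])  # patient -> [correct, total]
    for patient, truth, pred in records:
        per_patient[patient][0] += int(truth == pred)
        per_patient[patient][1] += 1
    return sum(c / n for c, n in per_patient.values()) / len(per_patient)


# Hypothetical records: (patient id, true label, predicted label).
records = [
    ("p1", "malignant", "malignant"),
    ("p1", "malignant", "benign"),
    ("p2", "benign", "benign"),
    ("p2", "benign", "benign"),
    ("p2", "benign", "benign"),
    ("p2", "benign", "malignant"),
]
icr = image_level_rate(records)
pcr = patient_level_rate(records)
```

The two rates differ whenever patients contribute unequal numbers of images, because the patient-level rate weights every patient equally while the image-level rate weights every image equally.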


Author(s):  
Yuejun Liu ◽  
Yifei Xu ◽  
Xiangzheng Meng ◽  
Xuguang Wang ◽  
Tianxu Bai

Background: Medical imaging plays an important role in the diagnosis of thyroid diseases. In the field of machine learning, multiple dimensional deep learning algorithms are widely used in image classification and recognition, and have achieved great success. Objective: A method based on multiple dimensional deep learning is employed for the auxiliary diagnosis of thyroid diseases based on SPECT images. The performances of different deep learning models are evaluated and compared. Methods: Thyroid SPECT images of three types are collected: hyperthyroidism, normal and hypothyroidism. In pre-processing, the thyroid region of interest is segmented and the number of data samples is expanded. Four deep learning models (CNN, Inception, VGG16 and RNN) are used to evaluate deep learning methods. Results: Deep learning based methods have good classification performance, with accuracy of 92.9%-96.2% and AUC of 97.8%-99.6%. The VGG16 model has the best performance, with an accuracy of 96.2% and an AUC of 99.6%. In particular, the VGG16 model with a changing learning rate works best. Conclusion: The four deep learning models (standard CNN, Inception, VGG16 and RNN) are effective for the classification of thyroid diseases with SPECT images. The accuracy of the assisted diagnostic method based on deep learning is higher than that of other methods reported in the literature.

