scholarly journals Classification Problem in Imbalanced Datasets

Author(s):  
Aouatef Mahani ◽  
Ahmed Riad Baba Ali
Author(s):  
Nicksson Ckayo Arrais de Freitas ◽  
Ticiana L. Coelho Da Silva ◽  
José Antônio Fernandes De Macêdo ◽  
Leopoldo Melo Júnioer

Deep learning has gained much popularity in the past years due to GPU advancements, cloud computing improvements, and its supremacy, considering the accuracy results when trained on massive datasets. As with machine learning, deep learning models may experience low performance when handled with imbalanced datasets. In this paper, we focus on the trajectory classification problem, and we examine deep learning techniques for coping with imbalanced class data. We extend a deep learning model, called DeepeST (Deep Learning for Sub-Trajectory classification), to predict the class or label for sub-trajectories from imbalanced datasets. DeepeST is the first deep learning model for trajectory classification that provides approaches for coping with imbalanced dataset problems from the authors' knowledge. In this paper, we perform the experiments with three real datasets from LBSN (Location-Based Social Network) trajectories to identify who is the user of a sub-trajectory (similar to the Trajectory-User Linking problem). We show that DeepeST outperforms other deep learning approaches from state-of-the-art concerning the accuracy, precision, recall, and F1-score.


Author(s):  
Sunitha .T ◽  
Shyamala .J ◽  
Annie Jesus Suganthi Rani.A

Data mining suggest an innovative way of prognostication stereotype of Patients health risks. Large amount of Electronic Health Records (EHRs) collected over the years have provided a rich base for risk analysis and prediction. An EHR contains digitally stored healthcare information about an individual, such as observations, laboratory tests, diagnostic reports, medications, procedures, patient identifying information and allergies. A special type of EHR is the Health Examination Records (HER) from annual general health check-ups. Identifying participants at risk based on their current and past HERs is important for early warning and preventive intervention. By “risk”, we mean unwanted outcomes such as mortality and morbidity. This approach is limited due to the classification problem and consequently it is not informative about the specific disease area in which a personal is at risk. Limited amount of data extracted from the health record is not feasible for providing the accurate risk prediction. The main motive of this project is for risk prediction to classify progressively developing situation with the majority of the data unlabeled.


Vestnik MEI ◽  
2020 ◽  
Vol 5 (5) ◽  
pp. 132-139
Author(s):  
Ivan E. Kurilenko ◽  
◽  
Igor E. Nikonov ◽  

A method for solving the problem of classifying short-text messages in the form of sentences of customers uttered in talking via the telephone line of organizations is considered. To solve this problem, a classifier was developed, which is based on using a combination of two methods: a description of the subject area in the form of a hierarchy of entities and plausible reasoning based on the case-based reasoning approach, which is actively used in artificial intelligence systems. In solving various problems of artificial intelligence-based analysis of data, these methods have shown a high degree of efficiency, scalability, and independence from data structure. As part of using the case-based reasoning approach in the classifier, it is proposed to modify the TF-IDF (Term Frequency - Inverse Document Frequency) measure of assessing the text content taking into account known information about the distribution of documents by topics. The proposed modification makes it possible to improve the classification quality in comparison with classical measures, since it takes into account the information about the distribution of words not only in a separate document or topic, but in the entire database of cases. Experimental results are presented that confirm the effectiveness of the proposed metric and the developed classifier as applied to classification of customer sentences and providing them with the necessary information depending on the classification result. The developed text classification service prototype is used as part of the voice interaction module with the user in the objective of robotizing the telephone call routing system and making a shift from interaction between the user and system by means of buttons to their interaction through voice.


Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1071
Author(s):  
Lucia Billeci ◽  
Asia Badolato ◽  
Lorenzo Bachi ◽  
Alessandro Tonacci

Alzheimer’s disease is notoriously the most common cause of dementia in the elderly, affecting an increasing number of people. Although widespread, its causes and progression modalities are complex and still not fully understood. Through neuroimaging techniques, such as diffusion Magnetic Resonance (MR), more sophisticated and specific studies of the disease can be performed, offering a valuable tool for both its diagnosis and early detection. However, processing large quantities of medical images is not an easy task, and researchers have turned their attention towards machine learning, a set of computer algorithms that automatically adapt their output towards the intended goal. In this paper, a systematic review of recent machine learning applications on diffusion tensor imaging studies of Alzheimer’s disease is presented, highlighting the fundamental aspects of each work and reporting their performance score. A few examined studies also include mild cognitive impairment in the classification problem, while others combine diffusion data with other sources, like structural magnetic resonance imaging (MRI) (multimodal analysis). The findings of the retrieved works suggest a promising role for machine learning in evaluating effective classification features, like fractional anisotropy, and in possibly performing on different image modalities with higher accuracy.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

AbstractThis paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifier we trained is comparable to the deep learning methods on the first dataset, but significantly worse on the second dataset.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1550
Author(s):  
Alexandros Liapis ◽  
Evanthia Faliagka ◽  
Christos P. Antonopoulos ◽  
Georgios Keramidas ◽  
Nikolaos Voros

Physiological measurements have been widely used by researchers and practitioners in order to address the stress detection challenge. So far, various datasets for stress detection have been recorded and are available to the research community for testing and benchmarking. The majority of the stress-related available datasets have been recorded while users were exposed to intense stressors, such as songs, movie clips, major hardware/software failures, image datasets, and gaming scenarios. However, it remains an open research question if such datasets can be used for creating models that will effectively detect stress in different contexts. This paper investigates the performance of the publicly available physiological dataset named WESAD (wearable stress and affect detection) in the context of user experience (UX) evaluation. More specifically, electrodermal activity (EDA) and skin temperature (ST) signals from WESAD were used in order to train three traditional machine learning classifiers and a simple feed forward deep learning artificial neural network combining continues variables and entity embeddings. Regarding the binary classification problem (stress vs. no stress), high accuracy (up to 97.4%), for both training approaches (deep-learning, machine learning), was achieved. Regarding the stress detection effectiveness of the created models in another context, such as user experience (UX) evaluation, the results were quite impressive. More specifically, the deep-learning model achieved a rather high agreement when a user-annotated dataset was used for validation.


Sign in / Sign up

Export Citation Format

Share Document