Impact of Lexical Features on Answer Detection Model in Discussion Forums

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Atif Khan ◽  
Muhammad Adnan Gul ◽  
Abdullah Alharbi ◽  
M. Irfan Uddin ◽  
Shaukat Ali ◽  
...  

Online forums have become a main source of knowledge on the Internet as data constantly flood into them. In most cases, a question in a web forum receives several responses, making it difficult for the question poster to identify the most suitable answer. Thus, an important problem is how to automatically extract the most appropriate, high-quality answers in a thread. Prior studies have used different combinations of lexical and nonlexical features to retrieve the most relevant answers from discussion forums; hence, there is no standard set of features that can be used effectively to classify relevant answer/reply posts. This study proposes an answer detection model that relies exclusively on lexical features and employs a random forest classifier to classify answers in discussion boards. Experimental results showed that the proposed model outperformed the baseline technique and other state-of-the-art machine learning algorithms in terms of classification accuracy on benchmark forum datasets.
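To make the idea of lexical features concrete, here is a minimal sketch of how a question and a reply might be turned into a small feature vector. The abstract does not list the features used in the study, so the three features below (term-frequency cosine similarity, Jaccard vocabulary overlap, reply length) and the helper names `tokenize` and `lexical_features` are illustrative assumptions. In a setup like the one described, vectors of this kind would then be passed to a random forest classifier, which is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_features(question, reply):
    """Return illustrative lexical features for a (question, reply) pair.

    These features are assumptions for this sketch, not the study's feature set.
    """
    q_tokens, r_tokens = tokenize(question), tokenize(reply)
    q_counts, r_counts = Counter(q_tokens), Counter(r_tokens)

    # Cosine similarity between the two term-frequency vectors.
    dot = sum(q_counts[t] * r_counts[t] for t in q_counts)
    norm = math.sqrt(sum(c * c for c in q_counts.values())) * math.sqrt(
        sum(c * c for c in r_counts.values())
    )
    cosine = dot / norm if norm else 0.0

    # Jaccard overlap between the two vocabularies.
    q_set, r_set = set(q_tokens), set(r_tokens)
    union = q_set | r_set
    jaccard = len(q_set & r_set) / len(union) if union else 0.0

    return {"cosine": cosine, "jaccard": jaccard, "reply_length": len(r_tokens)}
```

Features like these depend only on the words in the posts, which is what distinguishes them from nonlexical signals such as author reputation or post position.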

Diagnostics ◽  
2020 ◽  
Vol 10 (3) ◽  
pp. 162 ◽  
Author(s):  
Julieta G. Rodríguez-Ruiz ◽  
Carlos E. Galván-Tejada ◽  
Laura A. Zanella-Calzada ◽  
José M. Celaya-Padilla ◽  
Jorge I. Galván-Tejada ◽  
...  

Major depressive disorder has become more prevalent in the last few years, affecting around 7 percent of the world population, yet current techniques for diagnosing it are outdated and inefficient. Over the last decade, motor activity data have been presented as a better way to diagnose, treat, and monitor patients suffering from this illness, through the use of machine learning algorithms. Disturbances in the circadian rhythm of patients with mental illness increase the effectiveness of the data mining process. In this paper, motor activity data from the night, the day, and the full day are compared through a data mining process that uses the Random Forest classifier to identify depressive and non-depressive episodes. Data from the Depressjon dataset are split into three subsets, and 24 features in the time and frequency domains are extracted to select the best model for classifying depressive episodes. The results showed that the best dataset and model for classifying depressive episodes is the night motor activity data, with a sensitivity of 99.37% and a specificity of 99.91%.
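For readers less familiar with the two reported metrics, the following is a minimal sketch of how sensitivity and specificity are computed from true and predicted labels. The function name and the convention that label `1` marks a depressive episode are assumptions made for illustration; the abstract does not describe the evaluation code.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Compute (sensitivity, specificity) for a binary classification task."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case wrongly flagged
            else:
                tn += 1  # negative case correctly rejected

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Sensitivity is the share of actual positive episodes that the classifier detects, and specificity is the share of actual negative episodes that it correctly rejects.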


Cancers ◽  
2022 ◽  
Vol 14 (2) ◽  
pp. 286
Author(s):  
Clément Acquitter ◽  
Lucie Piram ◽  
Umberto Sabatini ◽  
Julia Gilhodes ◽  
Elizabeth Moyal Cohen-Jonathan ◽  
...  

In this study, a radiomics analysis was conducted to provide insights into the differentiation of radionecrosis and tumor progression in multiparametric MRI in the context of a multicentric clinical trial. First, the sensitivity of radiomic features to the unwanted variability caused by different protocol settings was assessed for each modality. Then, the ability of image normalization and ComBat-based harmonization to reduce the scanner-related variability was evaluated. Finally, the performances of several radiomic models dedicated to the classification of MRI examinations were measured. Our results showed that using radiomic models trained on harmonized data achieved better predictive performance for the investigated clinical outcome (balanced accuracy of 0.61 with the model based on raw data and 0.72 with ComBat harmonization). A comparison of several models based on information extracted from different MR modalities showed that the best classification accuracy was achieved with a model based on MR perfusion features in conjunction with clinical observation (balanced accuracy of 0.76 using LASSO feature selection and a Random Forest classifier). Although multimodality did not provide additional benefit in predictive power, the model based on T1-weighted MRI before injection provided an accuracy close to the performance achieved with perfusion.
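The abstract does not define balanced accuracy. Assuming the standard definition for a two-class task (here, radionecrosis versus tumor progression, with the choice of "positive" class being arbitrary), it is the mean of sensitivity and specificity:

```latex
\[
\text{balanced accuracy}
  = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)
\]
```

Under this definition, a classifier that always predicts the same class scores 0.5, which gives a reference point for reading the reported values of 0.61, 0.72, and 0.76.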


2020 ◽  
Author(s):  
Alisson Hayasi da Costa ◽  
Renato Augusto C. dos Santos ◽  
Ricardo Cerri

PIWI-interacting RNAs (piRNAs) form an important class of non-coding RNAs that play a key role in genome integrity through the silencing of transposable elements. However, despite their importance and the wide application of deep learning to classification tasks in computational biology, there are few studies of deep learning and neural networks for piRNA prediction. Therefore, this paper presents an investigation of deep feedforward network models for the classification of transposon-derived piRNAs. We analyze and compare the results of the neural networks under different hyperparameter choices, such as the number of layers, activation functions, and optimizers, clarifying the advantages and disadvantages of each configuration. From this analysis, we propose a model for human piRNA classification and compare our method with the state-of-the-art deep neural network for piRNA prediction in the literature, as well as with traditional machine learning algorithms such as Support Vector Machines and Random Forests, showing that our model achieves an F-measure of 0.872, outperforming the state-of-the-art method in the literature.


2020 ◽  
Vol 10 (2) ◽  
pp. 469 ◽  
Author(s):  
Athanasios Anagnostis ◽  
Gavriela Asiminari ◽  
Elpiniki Papageorgiou ◽  
Dionysis Bochtis

Anthracnose is a fungal disease that infects a large number of trees worldwide, severely damages the canopy, and spreads easily to neighboring trees, potentially destroying whole crops. Even though it can be treated relatively easily with good sanitation, proper pruning, and copper spraying, the main issue is early detection to prevent it from spreading. Machine learning algorithms can offer tools for the on-site classification of healthy and affected leaves, as an initial step towards managing such diseases. The purpose of this study was to build a robust convolutional neural network (CNN) model that can classify images of leaves according to whether or not they are infected by anthracnose, and therefore determine whether a tree is infected. A set of images was used in both grayscale and RGB mode, a fast Fourier transform was implemented for feature extraction, and a CNN architecture was selected based on its performance. Finally, the best-performing method was compared with state-of-the-art convolutional neural network architectures.


Author(s):  
Weiwei Yang ◽  
Haifeng Song

Recent research has shown that integrating spatial information is a powerful tool for improving the classification accuracy of hyperspectral images (HSIs). However, partitioning an HSI into homogeneous regions remains a challenging task. This paper proposes a novel spectral-spatial classification method inspired by the support vector machine (SVM). The model consists of a spectral-spatial feature extraction channel (SSC), which extracts the spatial-spectral features of the HSI, and an SVM classifier, which classifies the extracted features. The model can therefore extract and classify HSI features automatically. Experiments are conducted on a benchmark HSI dataset (Indian Pines). The proposed method is found to yield more accurate classification results than state-of-the-art techniques.


2020 ◽  
Vol 10 (8) ◽  
pp. 2908 ◽  
Author(s):  
Juan Luján-García ◽  
Cornelio Yáñez-Márquez ◽  
Yenny Villuendas-Rey ◽  
Oscar Camacho-Nieto

Pneumonia is an infectious disease that affects the lungs and is one of the principal causes of death in children under five years old. Chest X-ray imaging is one of the most widely used techniques for diagnosing pneumonia. Several Machine Learning algorithms have been successfully used to provide computer-aided diagnosis through the automatic classification of medical images. Among them, Convolutional Neural Networks (models based on Deep Learning), which are widely used in Computer Vision tasks such as the classification of injuries and brain abnormalities, stand out for their remarkable results. In this paper, we present a transfer learning method that automatically distinguishes between 3883 chest X-ray images characterized as depicting pneumonia and 1349 labeled as normal. The proposed method uses Xception Network weights pre-trained on ImageNet as an initialization. Our model is competitive with state-of-the-art proposals. To make comparisons with other models, we used four well-known performance measures, obtaining the following results: precision (0.84), recall (0.99), F1-score (0.91), and area under the ROC curve (0.97). These positive results allow us to consider our proposal as an alternative that can be useful in countries that lack equipment and specialized radiologists.
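As a quick consistency check on the reported measures, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the two reported values; the function name is illustrative and this is not the authors' evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
reported_precision, reported_recall = 0.84, 0.99
print(round(f1_score(reported_precision, reported_recall), 2))  # prints 0.91
```

The recomputed value matches the reported F1-score of 0.91. The gap between recall (0.99) and precision (0.84) indicates that the model misses very few pneumonia cases but flags some normal images as pneumonia.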


2020 ◽  
Vol 10 (17) ◽  
pp. 5956
Author(s):  
Sławomir K. Zieliński ◽  
Hyunkook Lee ◽  
Paweł Antoniuk ◽  
Oskar Dadan

The purpose of this paper is to compare the performance of human listeners with that of selected machine learning algorithms in classifying spatial audio scenes in binaural recordings of music under practical conditions. Three scenes were subject to classification: (1) a music ensemble (a group of musical sources) located in the front, (2) a music ensemble located at the back, and (3) a music ensemble distributed around a listener. In the listening test, undertaken remotely over the Internet, human listeners reached a classification accuracy of 42.5%. For the listeners who passed the post-screening test, the accuracy was greater, approaching 60%. The same classification task was also undertaken automatically using four machine learning algorithms: a convolutional neural network, support vector machines, the extreme gradient boosting framework, and logistic regression. The machine learning algorithms substantially outperformed human listeners, with the classification accuracy reaching 84% when tested under binaural-room-impulse-response (BRIR) matched conditions. However, when the algorithms were tested under the BRIR mismatched scenario, their accuracy was comparable to that of the listeners who passed the post-screening test, implying that the algorithms' capability to perform in unknown electro-acoustic conditions needs to be further improved.


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8174
Author(s):  
Sandra Śmigiel ◽  
Krzysztof Pałczyński ◽  
Damian Ledziński

Deep Neural Networks (DNNs) are state-of-the-art machine learning algorithms whose application to electrocardiographic signals is gaining importance. So far, few studies or optimizations of DNNs on ECG databases can be found. To explore and achieve effective ECG recognition, this paper presents a convolutional neural network that encodes a single QRS complex with the addition of entropy-based features. This study aims to determine which combination of signal information provides the best classification result. The analyzed information included the raw ECG signal, entropy-based features computed from raw ECG signals, extracted QRS complexes, and entropy-based features computed from extracted QRS complexes. The tests were based on the classification of 2, 5, and 20 classes of heart diseases. The research was carried out on data from the PTB-XL database. An innovative method of extracting QRS complexes, based on aggregating the results of established algorithms for multi-lead signals using the k-means method, is also presented. The results show that adding entropy-based features and extracted QRS complexes to the raw signal is beneficial. Raw signals with entropy-based features but without extracted QRS complexes performed much worse.
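The abstract does not say which entropy measures were used. As one possible illustration, the sketch below computes the Shannon entropy of a signal segment from a histogram of its amplitudes; the choice of Shannon entropy, the `bins` parameter, and the function name are all assumptions rather than details of the study.

```python
import math


def shannon_entropy(samples, bins=16):
    """Shannon entropy (in bits) of a signal's amplitude histogram.

    Assumption for this sketch: amplitudes are binned into `bins`
    equal-width intervals between the minimum and maximum sample.
    """
    if not samples:
        return 0.0
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant signal carries no amplitude uncertainty

    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Clamp so the maximum sample falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1

    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)
```

A scalar like this could be computed either on the raw ECG segment or on an extracted QRS complex and then supplied to the network alongside the signal, which mirrors the combinations compared in the abstract.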


2022 ◽  
Vol 9 ◽  
Author(s):  
Suleman Khan ◽  
Saqib Hakak ◽  
N. Deepa ◽  
B. Prabadevi ◽  
Kapal Dev ◽  
...  

Since the emergence of COVID-19 in December 2019, there have been numerous posts and news items about the pandemic in social media, traditional print, and electronic media. These sources carry information from both trusted and non-trusted medical sources. Furthermore, news from these media spreads rapidly. Spreading a piece of deceptive information may lead to anxiety, unwanted exposure to medical remedies, and tricks for digital marketing, and may even have deadly consequences. Therefore, a model for detecting fake news in the news pool is essential. In this work, a dataset that fuses COVID-19-related news from several social media and news sources is used for classification. In the first step, the dataset is preprocessed to remove unwanted text, and tokenization is then carried out to extract tokens from the raw text data collected from the various sources. Next, feature selection is performed to avoid the computational overhead of processing all the features in the dataset. Linguistic and sentiment features are extracted for further processing. Finally, several state-of-the-art machine learning algorithms are trained to classify the COVID-19-related dataset and are evaluated using various metrics. The results show that the random forest classifier outperforms the other classifiers with an accuracy of 88.50%.


Author(s):  
Syed Ahsin Ali Shah ◽  
Nazneen Habib ◽  
Wajid Aziz ◽  
Ehsan Ullah Khan ◽  
Malik Sajjad Ahmed Nadeem

Background: Medical researchers are developing different non-invasive methods for the early detection of Neurodegenerative Diseases (NDDs), when pharmacological interventions are still possible to prevent further disease progression. NDDs are associated with degradation of the complex gait dynamics and motor activity. Classifying gait data with machine learning techniques can assist physicians in the early diagnosis of a neural disorder when clinical manifestation of the disease is not yet apparent. Aims: The present study was undertaken to classify control and NDD subjects using decision-tree-based classifiers (Random Forest (RF), J48 and REPTree). Methodology: The data used in the study comprise 16 control, 20 Huntington’s Disease (HD), 15 Parkinson’s Disease (PD), and 13 Amyotrophic Lateral Sclerosis (ALS) subjects, taken from a publicly available database on Physionet. The age range was 20-74 for control subjects, 36-70 for HD subjects, 44-80 for PD subjects, and 29-71 for ALS subjects. There were 13 attributes associated with the data. Important features/attributes of the data were selected using the correlation feature selection - subset evaluation (cfs) method. Three tree-based machine learning algorithms (RF, J48 and REPTree) were used to classify the control and NDD subjects. The performance of the classifiers was evaluated using Precision, Recall, F-Measure, MAE and RMSE. Results: To evaluate the performance of the tree-based classifiers, two different settings of the data were used: complete features and selected features. In classifying control vs HD subjects, RF provides robust separation, with a classification accuracy of 84.79% using complete features and 83.94% using selected features. In classifying control vs PD subjects and control vs ALS subjects, RF also provides the best separation, with classification accuracies of 86.51% and 94.95% respectively using complete features and 85.19% and 93.64% respectively using selected features. Conclusion: The variability analysis of physiological signals provides a valuable non-invasive tool for quantifying the system dynamics of healthy subjects and for examining the alterations in the controlling mechanisms of these systems with aging and disease. It is concluded that the selected features encode adequate information about the neural control of gait. Moreover, the selected features, together with tree-based machine learning algorithms, can play a vital role in the early detection of NDDs, when pharmacological interventions are still possible.
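Of the evaluation measures listed, MAE and RMSE are less commonly reported for classifiers, so a minimal sketch may help. It assumes the errors are measured between the true class (encoded as 0 or 1) and the predicted probability of the positive class; the abstract does not state how these two measures were computed, and the function names are illustrative.

```python
import math


def mae(y_true, y_prob):
    """Mean absolute error between 0/1 labels and predicted probabilities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_prob))
    return math.sqrt(squared / len(y_true))
```

Both measures are 0 for a classifier that assigns probability 1 to the correct class every time. RMSE penalizes confident mistakes more heavily than MAE because the errors are squared before averaging.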

