feature sets

Author(s):  
José Antonio García-Díaz ◽  
Rafael Valencia-García

Satirical content on social media is hard to distinguish from real news, misinformation, hoaxes or propaganda when there are no clues as to which medium the news was originally written in. It is important, therefore, to provide Information Retrieval systems with mechanisms to identify which results are legitimate and which ones are misleading. Our contribution to satire identification is twofold. On the one hand, we release the Spanish SatiCorpus 2021, a balanced dataset that contains satirical and non-satirical documents. On the other hand, we conduct an extensive evaluation of this dataset with linguistic features and embedding-based features. All feature sets are evaluated separately and combined using different strategies. Our best result is achieved with a combination of the linguistic features and BERT, with an accuracy of 97.405%. In addition, we compare our proposal with existing Spanish datasets on satire and irony.


2022 ◽  
Vol 15 (1) ◽  
pp. 1-23
Author(s):  
Rizwan Ur Rahman ◽  
Lokesh Yadav ◽  
Deepak Singh Tomar

A phishing attack is a deceitful attempt to steal confidential data such as credit card information and account passwords. In this paper, Phish-Shelter, a novel anti-phishing browser, is developed, which analyzes the URL and the content of a phishing page. Phish-Shelter is based on a combined supervised machine learning model. The Phish-Shelter browser uses two novel feature sets, which are used to determine the web page identity. The proposed feature sets include eight features to evaluate the obfuscation-based rule and eight features to identify the search engine. Further, we have taken eleven features which are used to discover contents and the blacklist-based rule. Phish-Shelter exploits matching identity features, which determine the degree of similarity of a URL with the blacklisted URLs. The proposed features are independent of third-party services such as web browser history or search engine results. The experimental results indicate a significant improvement in detection accuracy using the proposed features over traditional features.
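
The abstract does not say how the degree of similarity between a URL and the blacklisted URLs is computed. As a minimal sketch only, assuming a plain string-similarity ratio over host names (the functions `host_of` and `blacklist_similarity` and the sample URLs are hypothetical, not part of Phish-Shelter), such a matching feature could look like this:

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lowercase host part of a URL (empty string if absent)."""
    return (urlparse(url).hostname or "").lower()


def blacklist_similarity(url: str, blacklist: list[str]) -> float:
    """Highest similarity ratio (0..1) between the URL's host and any blacklisted host.

    Assumption: similarity is a character-level ratio on host names; the
    paper's actual matching rule is not given in the abstract.
    """
    host = host_of(url)
    return max(
        (SequenceMatcher(None, host, host_of(bad)).ratio() for bad in blacklist),
        default=0.0,
    )


# Hypothetical blacklist entry and candidate URLs, for illustration only.
blacklist = ["http://paypa1-login.example.com/verify"]
exact = blacklist_similarity("http://paypa1-login.example.com/other", blacklist)
unrelated = blacklist_similarity("https://docs.python.org/3/", blacklist)
```

A score like this would be one numeric feature among many fed to the classifier, and it needs no third-party service at prediction time.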


2021 ◽  
Author(s):  
Hryhorii Chereda ◽  
Andreas Leha ◽  
Tim Beissbarth

Motivation: High-throughput technologies play a more and more significant role in discovering prognostic molecular signatures and identifying novel drug targets. It is common to apply Machine Learning (ML) methods to classify high-dimensional gene expression data and to determine a subset of features (genes) that is important for decisions of a ML model. One feature subset of important genes corresponds to one dataset and it is essential to sustain the stability of feature sets across different datasets with the same clinical endpoint since the selected genes are candidates for prognostic biomarkers. The stability of feature selection can be improved by including information of molecular networks into ML methods. Gene expression data can be assigned to the vertices of a molecular network's graph and then classified by a Graph Convolutional Neural Network (GCNN). GCNN is a contemporary deep learning approach that can be applied to graph-structured data. Layer-wise Relevance Propagation (LRP) is a technique to explain decisions of deep learning methods. In our recent work we developed Graph Layer-wise Relevance Propagation (GLRP) --- a method that adapts LRP to a graph convolution and explains patient-specific decisions of GCNN. GLRP delivers individual molecular signatures as patient-specific subnetworks that are parts of a molecular network representing background knowledge about biological mechanisms. GLRP gives a possibility to deliver the subset of features corresponding to a dataset as well, so that the stability of feature selection performed by GLRP can be measured and compared to that of other methods. Results: Utilizing two large breast cancer datasets, we analysed properties of feature sets selected by GLRP (GCNN+LRP) such as stability and permutation importance. 
We have implemented a graph convolutional layer of GCNN as a Keras layer so that the SHAP (SHapley Additive exPlanation) explanation method could also be applied to a Keras version of a GCNN model. We compare the stability of feature selection performed by GCNN+LRP to the stability of GCNN+SHAP and to other ML-based feature selection methods. We conclude that GCNN+LRP shows the highest stability among the feature selection methods compared, including GCNN+SHAP. It was established that the permutation importance of features among GLRP subnetworks is lower than among GCNN+SHAP subnetworks, but in the context of the utilized molecular network, a GLRP subnetwork of an individual patient is on average substantially more connected (and interpretable) than a GCNN+SHAP subnetwork, which consists mainly of single vertices.
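
The abstract does not name the stability measure used. One common way to quantify the stability of feature selection is the average pairwise Jaccard index between the feature sets selected on different datasets; the sketch below assumes that measure, and the gene names are placeholders:

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(feature_sets: list[set[str]]) -> float:
    """Mean Jaccard index over all pairs of selected feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Placeholder gene sets, as if selected on two datasets with the same endpoint.
selected = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_B", "GENE_D"},
]
stability = selection_stability(selected)  # 2 shared genes out of 4 distinct
```

A value near 1 means the method picks nearly the same genes on each dataset; a value near 0 means the selections barely overlap.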


Author(s):  
Paul Ntim Yeboah ◽  
Stephen Kweku Amuquandoh ◽  
Haruna Balle Baz Musah

Conventional approaches to tackling malware attacks have proven to be futile at detecting never-before-seen (zero-day) malware. Research, however, has shown that zero-day malicious files are mostly semantic-preserving variants of already existing malware, which are generated via obfuscation methods. In this paper we propose and evaluate a machine learning based malware detection model using an ensemble approach. We employ an ensemble strategy in which multiple feature sets generated from different n-gram sizes of opcode sequences are trained using a single classifier. Model predictions on the multiple trained feature sets are weighted and averaged to make a final verdict on whether a binary file is malicious or benign. To obtain the optimal weight combination for the ensemble feature sets, we applied a grid search on a set of pre-defined weights in the range 0 to 1. With a balanced dataset of 2000 samples, an ensemble of n-gram opcode sequences of n sizes 1 and 2 with respective weight pair 0.3 and 0.7 yielded the best detection accuracy of 98.1% using a random forest (RF) classifier. Ensemble n-gram sizes 2 and 3 obtained 99.7% as best precision using weight 0.5 for both models.
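
The weighting scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two weights are assumed to sum to 1, the grid step is 0.1, the decision threshold is 0.5, and the probabilities are toy values standing in for the outputs of two classifiers trained on different n-gram sizes.

```python
from collections import Counter


def opcode_ngrams(opcodes: list[str], n: int) -> Counter:
    """Count contiguous n-grams in an opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def grid_search_weights(probs_1, probs_2, labels, steps=10):
    """Try weight pairs (w1, 1 - w1) on a grid and keep the most accurate one.

    probs_1 / probs_2 are per-sample malicious-class probabilities from the
    two models; the first best pair found on the grid is returned.
    """
    best_w1, best_acc = 0.0, -1.0
    for k in range(steps + 1):
        w1 = k / steps
        preds = [int(w1 * a + (1 - w1) * b >= 0.5) for a, b in zip(probs_1, probs_2)]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_w1, best_acc = w1, acc
    return best_w1, 1 - best_w1, best_acc


# Feature extraction on a toy opcode sequence.
bigrams = opcode_ngrams(["mov", "push", "mov", "push"], 2)

# Toy probabilities (hypothetical) for four files; 1 = malicious, 0 = benign.
probs_1 = [0.9, 0.2, 0.6, 0.4]
probs_2 = [0.8, 0.1, 0.3, 0.7]
labels = [1, 0, 0, 1]
w1, w2, acc = grid_search_weights(probs_1, probs_2, labels)
```

In practice the weights would be tuned on a validation split rather than on the data used to report accuracy, so that the reported figure is not optimistic.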


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Ashish Sharma ◽  
Dhirendra P. Yadav ◽  
Hitendra Garg ◽  
Mukesh Kumar ◽  
Bhisham Sharma ◽  
...  

Bone cancer is considered a serious health problem, and, in many cases, it causes patient death. X-ray, MRI, or CT-scan images are used by doctors to identify bone cancer. The manual process is time-consuming and requires expertise in that field. Therefore, it is necessary to develop an automated system to classify and identify cancerous and healthy bone. The texture of a cancerous bone differs from that of a healthy bone in the affected region. However, several images of cancerous and healthy bone in the dataset have similar morphological characteristics, which makes it difficult to categorize them. To tackle this problem, we first find the most suitable edge detection algorithm; after that, two feature sets are prepared, one with HOG and another without HOG. To test the efficiency of these feature sets, two machine learning models, support vector machine (SVM) and random forest, are utilized. The feature set with HOG performs considerably better on these models. Also, the SVM model trained with the HOG feature set provides an F1-score of 0.92, better than the random forest F1-score of 0.77.


Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2286
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Recently, identifying speech emotions in a spontaneous database has been a complex and demanding study area. This research presents an entirely new approach for recognizing semi-natural and spontaneous speech emotions with multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotional identification. Therefore, a support vector machine (SVM) algorithm is utilized to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We investigated our approach utilizing the eNTERFACE05 and BAUM-1s benchmark databases and observed a significant identification accuracy of 76% for a speaker-independent experiment with SVM and 59% accuracy with, respectively. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the suggested framework outperformed current state-of-the-art techniques on the semi-natural and spontaneous datasets.


The survival rate of pulmonary patients can be improved if pneumonia is detected in time. Chest X-ray (CXR) imaging is the most common way of finding and identifying pneumonia. Identifying pneumonia from CXR scans poses a severe problem even for a competent radiologist. Maximizing classification precision therefore requires an autonomous computer-aided detection approach. Designing a lightweight autonomous pneumonia detection mechanism for resource-efficient healthcare devices is critical for enhancing healthcare quality while lowering expenses and improving reaction time. In this proposed work, a machine learning-based hybridization approach is implemented for the identification of pneumonia in chest X-ray scans. The proposed methodology is divided into segments: the first segment removes noise from the chest X-ray scans (pre-processing). After pre-processing, the second segment extracts features from the pre-processed scans, using the scale-invariant feature transform (SIFT) method to extract essential features. The CIO-MSVM (Crossbreed Invariant Optimization-MSVM) method then selects the valuable features with the help of a fitness function (FF). This function helps to select the feature matrix, after which the MSVM algorithm is applied. The instance-selected feature set is passed to the training and test models, which classify the feature sets. If the feature sets match, the chest X-ray image is detected or classified, and performance metrics such as accuracy, specificity and sensitivity are evaluated and compared with existing methods.


2021 ◽  
Vol 8 (11) ◽  
pp. 352-369
Author(s):  
Nhung Nguyen ◽  
Dien Dinh

Recently, there has been renewed interest in stylometry, a branch of forensic linguistics evaluating linguistic features which affect an author’s writing style. However, no known empirical research has attempted to explore relationships between word-level features in Vietnamese texts and writing style, both across genders and individual authors. Using two series of correspondence analyses, the current study thus seeks to analyze the most significant linguistic features across genders and individual authors based on a richly annotated specialized Vietnamese corpus. In terms of gender, the most significant associations with writing style were identified for the combination of personal pronouns and negative words, whereas the separate feature sets have less discriminating ability. For individual authors, negation words demonstrate significant associations, and personal pronouns again have insignificant relationships with individual writing style. As a result of these investigations, suggestions were identified for future research.


2021 ◽  
Vol 11 (12) ◽  
pp. 2918-2927
Author(s):  
A. Shankar ◽  
S. Muttan ◽  
D. Vaithiyanathan

Brain Computer Interface (BCI) is a fast-growing area of research to enable communication between our brains and computers. EEG-based motor imagery BCI involves the user imagining movement, the subsequent recording and signal processing of the electroencephalogram signals from the brain, and the translation of those signals into specific commands. Ultimately, motor imagery BCI has the potential to be applied to helping those with special abilities recover motor control. This paper presents an evaluation of performance for EEG-based motor imagery BCI with a classification accuracy of 80.2%, making use of features extracted using the Fast Fourier Transform and the Discrete Wavelet Transform, with classification performed by an Artificial Neural Network. It goes on to conclude how the performance is affected by the particular feature sets and neural network parameters.
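
To illustrate what Fourier-based features of an EEG channel can look like, the sketch below computes band power from a power spectrum. It is not the paper's pipeline: the band limits (8 to 12 Hz and 13 to 30 Hz), the 128 Hz sampling rate and the synthetic 10 Hz sine are all assumptions, and a naive DFT stands in for an FFT routine to keep the example dependency-free.

```python
import cmath
import math


def dft_power(signal: list[float]) -> list[float]:
    """Power spectrum |X[k]|^2 for k = 0 .. N//2, via a naive O(N^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_power(power: list[float], fs: float, n: int, lo: float, hi: float) -> float:
    """Sum the power of all bins whose frequency k * fs / n lies in [lo, hi] Hz."""
    return sum(p for k, p in enumerate(power) if lo <= k * fs / n <= hi)


fs = 128  # assumed sampling rate in Hz
n = 128   # one second of samples, so each bin is 1 Hz wide
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]  # synthetic 10 Hz tone

power = dft_power(signal)
mu = band_power(power, fs, n, 8, 12)     # assumed "mu" band
beta = band_power(power, fs, n, 13, 30)  # assumed "beta" band
features = [mu, beta]                    # a two-element feature vector for one channel
```

For the synthetic tone nearly all power falls in the 8 to 12 Hz band; with real recordings, per-channel band powers like these would form part of the feature set passed to the neural network.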


2021 ◽  
pp. 073346482110538
Author(s):  
Shannon T. Mejía ◽  
Tai-Te Su ◽  
Qingyi Lan ◽  
Ajiang Zou ◽  
Aileen Griffin ◽  
...  

Falls are not only a leading cause of death and disability, but also a strain on the capacity for caregivers to provide care. This study examined how the context of caregiving relates to the importance of caregiver-defined mobile fall prevention feature sets. A sample of 266 family caregivers, recruited from a Chinese social media platform, reported care for an older adult and interest in mobile fall prevention technology features. Factor analysis identified three caregiver-defined feature sets: automatic fall response, digitized fall prevention tools, and social features. Multiple regression showed caregivers’ concern about falling was the most robust predictor of a feature set’s importance. Poisson regression revealed that caregiver concern and assistance with instrumental activities of daily living were associated with rating more features as important. Our findings suggest that caregivers are interested in mobile fall prevention technologies that support older adults’ independence while also alleviating concerns about falling.

