Abstract 4539: Novel feature selection strategies for enhanced predictive modeling and deep learning in the biosciences

Email remains an essential part of our lives and a means of better communication on the internet. The challenge is that spam emails consume a large amount of storage space and bandwidth. A growing concern is the shortcoming of state-of-the-art spam filtering methods, such as the misclassification of genuine emails as spam (false positives). The literature provides various algorithms for email spam classification, depending on the classification technique. This paper aims to develop a novel spam detection model for improved cybersecurity. The proposed model involves several phases: dataset acquisition, feature extraction, optimal feature selection, and detection. Initially, a benchmark email dataset is collected that includes both text and image data. Next, feature extraction is performed using two sets of features: text features and visual features. For the text features, Term Frequency-Inverse Document Frequency (TF-IDF) is extracted. For the visual features, the color correlogram and Gray-Level Co-occurrence Matrix (GLCM) are determined. Since the extracted feature vector is long, optimal feature selection is then carried out using a new meta-heuristic algorithm called the Fitness Oriented Levy Improvement-based Dragonfly Algorithm (FLI-DA). Once the optimal features are selected, detection is performed by a hybrid learning technique composed of two deep learning approaches, a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). To improve the performance of existing deep learning approaches, the number of hidden neurons of the RNN and CNN is optimized by the same FLI-DA. Finally, the optimized hybrid CNN-RNN technique classifies the data into spam and ham. The experimental outcomes show the ability of the proposed method to perform spam email classification based on improved deep learning.
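The TF-IDF text features mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common textbook variant (term frequency as count divided by document length, inverse document frequency as ln(N / df)); the abstract does not say which variant the authors used, and the function name `tf_idf` and the toy emails are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    This is an illustration, not the weighting used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy emails, already lowercased and tokenized.
emails = [
    "win free prize now".split(),
    "meeting agenda for monday".split(),
    "free meeting room now".split(),
]
weights = tf_idf(emails)
```

In this toy corpus, a term that occurs in only one email (such as "win") receives a higher weight than a term shared by two emails (such as "free"), which is the property that makes TF-IDF useful as a text feature.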

Forecasting air pollutant concentration using a novel spatiotemporal deep learning model based on clustering, feature selection and empirical wavelet transform

The Science of The Total Environment, 2021, pp. 149654
DOI: 10.1016/j.scitotenv.2021.149654

Author(s): Jusong Kim, Xiaoli Wang, Chollyong Kang, Jinwon Yu, Penghui Li

Keyword(s): Feature Selection, Deep Learning, Wavelet Transform, Learning Model, Air Pollutant, Pollutant Concentration, Model Based, Empirical Wavelet Transform, Deep Learning Model

Feature Selection and Deep Learning for Deterioration Prediction of the Bridges

Journal of Performance of Constructed Facilities, 2021, Vol 35 (6), pp. 04021078
DOI: 10.1061/(asce)cf.1943-5509.0001653

Author(s): Jinsong Zhu, Yanlei Wang

Keyword(s): Feature Selection, Deep Learning

Comparison of feature selection strategies for hearing impairments diagnostics

Proceedings of 15th IEEE Symposium on Computer-Based Medical Systems (CBMS 2002), 2003
DOI: 10.1109/cbms.2002.1011382
Cited By ~ 1

Author(s): I. Skrypnyk

Keyword(s): Feature Selection, Hearing Impairments, Selection Strategies

A deep learning hybrid predictive modeling (HPM) approach for estimating evapotranspiration and ecosystem respiration

Hydrology and Earth System Sciences, 2021, Vol 25 (11), pp. 6041-6066
DOI: 10.5194/hess-25-6041-2021

Author(s): Jiancong Chen, Baptiste Dafflon, Anh Phuong Tran, Nicola Falco, Susan S. Hubbard

Keyword(s): Deep Learning, Predictive Modeling, Vegetation Index, Short Term Memory, Learning Algorithm, Model Simulation, Ecosystem Respiration, Ecosystem Dynamics, Accurate Estimation, East River

Abstract. Climate change is reshaping vulnerable ecosystems, leading to uncertain effects on ecosystem dynamics, including evapotranspiration (ET) and ecosystem respiration (Reco). However, accurate estimation of ET and Reco remains challenging at sparsely monitored watersheds, where data and field instrumentation are limited. In this study, we developed a hybrid predictive modeling approach (HPM) that integrates eddy covariance measurements, physically based model simulation results, meteorological forcings, and remote-sensing datasets to estimate ET and Reco at high space–time resolution. HPM relies on a deep learning algorithm, long short-term memory (LSTM), and requires only air temperature, precipitation, radiation, normalized difference vegetation index (NDVI), and soil temperature (when available) as input variables. We tested and validated HPM estimation results in different climate regions and developed four use cases to demonstrate the applicability and variability of HPM at various FLUXNET sites and Rocky Mountain SNOTEL sites in Western North America. To test the limitations and performance of the HPM approach in mountainous watersheds, an expanded use case focused on the East River Watershed, Colorado, USA. The results indicate that HPM is capable of identifying complicated interactions among meteorological forcings, ET, and Reco variables, as well as providing reliable estimation of ET and Reco across relevant spatiotemporal scales, even in challenging mountainous systems. The study documents that HPM increases our capability to estimate ET and Reco and enhances process understanding at sparsely monitored watersheds.
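As an illustration of how the five input variables named in this abstract might be arranged for a sequence model such as an LSTM, the sketch below builds fixed-length sliding windows from per-day records. The field names, the window length, the helper `make_sequences`, and all values are hypothetical; the abstract does not describe the authors' preprocessing, and soil temperature is treated as always present here for simplicity.

```python
# Assumed field names for the five inputs listed in the abstract.
FEATURES = ["air_temp", "precip", "radiation", "ndvi", "soil_temp"]


def make_sequences(records, targets, window):
    """Pair each run of `window` consecutive records with the target
    observed at the last time step of that run."""
    X, y = [], []
    for end in range(window, len(records) + 1):
        run = records[end - window:end]
        # One row of feature values per time step in the window.
        X.append([[rec[name] for name in FEATURES] for rec in run])
        y.append(targets[end - 1])
    return X, y


# Hypothetical daily records and ET targets, for illustration only.
records = [
    {"air_temp": 10.0 + day, "precip": 0.0, "radiation": 200.0,
     "ndvi": 0.5, "soil_temp": 8.0}
    for day in range(5)
]
targets = [1.0, 1.1, 1.2, 1.3, 1.4]

X, y = make_sequences(records, targets, window=3)
```

Each element of `X` has shape (window, 5), the layout that recurrent layers in most deep learning libraries expect for a single sample.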

Wave2Vec: Vectorizing Electroencephalography Bio-Signal for Prediction of Brain Disease

International Journal of Environmental Research and Public Health, 2018, Vol 15 (8), pp. 1750
DOI: 10.3390/ijerph15081750
Cited By ~ 4

Author(s): Seonho Kim, Jungjoon Kim, Hong-Woo Chun

Keyword(s): Artificial Intelligence, Time Series, Feature Selection, Deep Learning, Natural Language Processing, Data Analysis, Natural Language, Real Number, Real Time, Language Processing

Interest in research involving health-medical information analysis based on artificial intelligence, especially deep learning techniques, has recently been increasing. Most of the research in this field has focused on searching for new knowledge for predicting and diagnosing disease by revealing the relation between disease and various information features of data. These features are extracted by analyzing various clinical pathology data, such as EHR (electronic health records), and academic literature using the techniques of data analysis, natural language processing, etc. However, more research and interest are still needed in applying the latest advanced artificial intelligence-based data analysis techniques to bio-signal data, which are continuous physiological records, such as EEG (electroencephalography) and ECG (electrocardiogram). Unlike other types of data, applying deep learning to bio-signal data, which take the form of time series of real numbers, raises many issues that need to be resolved in preprocessing, learning, and analysis. Such issues include feature selection and learning steps that remain black boxes, difficulties in recognizing and identifying effective features, high computational complexity, etc. In this paper, to solve these issues, we provide an encoding-based Wave2vec time series classifier model, which combines signal-processing and deep learning-based natural language processing techniques. To demonstrate its advantages, we provide the results of three experiments conducted with EEG data from the University of California Irvine, a real-world benchmark bio-signal dataset. After converting the bio-signals (in the form of waves), which are real-number time series, into a sequence of symbols or a sequence of wavelet patterns that are converted into symbols, through encoding, the proposed model vectorizes the symbols by learning the sequence using deep learning-based natural language processing. The models of each class can be constructed through learning from the vectorized wavelet patterns and training data. The implemented models can be used for prediction and diagnosis of diseases by classifying new data. The proposed method enhances data readability and the intuitiveness of the feature selection and learning processes by converting the time series of real-number data into sequences of symbols. In addition, it facilitates intuitive and easy recognition and identification of influential patterns. Furthermore, real-time large-capacity data analysis, which is essential in the development of real-time analysis diagnosis systems, is facilitated by drastically reducing the complexity of calculation through data simplification in the encoding process, without deterioration of analysis performance.
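The idea of turning a real-number time series into a sequence of symbols can be sketched as follows. This is not the encoding scheme of the Wave2Vec paper, which the abstract does not specify; it is a generic equal-width binning into letters, followed by a sliding window that forms overlapping "words" of the kind a word-embedding model could consume. The function names, alphabet, and sample signal are all hypothetical.

```python
def encode_to_symbols(signal, alphabet="abcd"):
    """Map each sample to a letter by equal-width binning over the
    signal's own value range (an assumed, generic scheme)."""
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        # A constant signal falls entirely into the first bin.
        return alphabet[0] * len(signal)
    n_bins = len(alphabet)
    symbols = []
    for x in signal:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((x - lo) / span * n_bins), n_bins - 1)
        symbols.append(alphabet[idx])
    return "".join(symbols)


def to_words(symbols, size):
    """Slide a window over the symbol string to form overlapping words."""
    return [symbols[i:i + size] for i in range(len(symbols) - size + 1)]


# Hypothetical signal values, for illustration only.
signal = [0.0, 0.2, 0.9, 1.0, 0.5, 0.1, 0.0, 0.8]
symbols = encode_to_symbols(signal)
words = to_words(symbols, 3)
```

Once the signal is a sequence of such words, standard natural language processing tooling for learning token embeddings can be applied to it, which is the general direction the abstract describes.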

Drug and Nondrug Classification Based on Deep Learning with Various Feature Selection Strategies

Feature Selection Strategies and Perceptual Expertise in Configuration Search Tasks

A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection

Physics-guided deep learning framework for predictive modeling of bridge vortex-induced vibrations from field monitoring

Enhancement of email spam detection using improved deep learning algorithms for cyber security

Forecasting air pollutant concentration using a novel spatiotemporal deep learning model based on clustering, feature selection and empirical wavelet transform

Feature Selection and Deep Learning for Deterioration Prediction of the Bridges

Comparison of feature selection strategies for hearing impairments diagnostics

A deep learning hybrid predictive modeling (HPM) approach for estimating evapotranspiration and ecosystem respiration

Wave2Vec: Vectorizing Electroencephalography Bio-Signal for Prediction of Brain Disease
