Abstract 4539: Novel feature selection strategies for enhanced predictive modeling and deep learning in the biosciences

Author(s):  
Pengwei Yang ◽  
Ryan Abo ◽  
Chang Liu ◽  
Zehua Chen ◽  
Haiguo Wu ◽  
...  
2018 ◽  
Vol 13 (3) ◽  
pp. 253-259 ◽  
Author(s):  
Long Yu ◽  
Xia Sun ◽  
Shengwei Tian ◽  
Xinyu Shi ◽  
Yilin Yan

Author(s):  
Lindsey M. Kitchell ◽  
Francisco J. Parada ◽  
Brandi L. Emerick ◽  
Tom A. Busey

2021 ◽  
pp. 1-34
Author(s):  
Kadam Vikas Samarthrao ◽  
Vandana M. Rohokale

Email has remained an essential part of our lives and a key means of communication on the internet. One challenge is that spam emails occupy a large amount of storage space and bandwidth. Another growing challenge for the internet world is a shortcoming of state-of-the-art spam filtering methods: the misclassification of genuine emails as spam (false positives). The literature provides various algorithms for email spam classification, depending on the underlying classification technique. This paper aims to develop a novel spam detection model for improved cybersecurity. The proposed model involves several phases: dataset acquisition, feature extraction, optimal feature selection, and detection. First, a benchmark email dataset containing both text and image data is collected. Next, feature extraction is performed using two sets of features, text features and visual features. For the text features, Term Frequency-Inverse Document Frequency (TF-IDF) is extracted. For the visual features, the color correlogram and the Gray-Level Co-occurrence Matrix (GLCM) are computed. Since the extracted feature vector is long, optimal feature selection is performed by a new meta-heuristic algorithm called Fitness Oriented Levy Improvement-based Dragonfly Algorithm (FLI-DA). Once the optimal features are selected, detection is performed by a hybrid learning technique composed of two deep learning approaches, a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). To improve on existing deep learning approaches, the number of hidden neurons in the RNN and CNN is also optimized by FLI-DA. Finally, the optimized hybrid CNN-RNN technique classifies the data into spam and ham. The experimental outcomes show the ability of the proposed method to perform spam email classification based on improved deep learning.
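The abstract does not state which TF-IDF variant the authors use. As a minimal sketch, assuming length-normalized term frequency and the smoothed inverse document frequency idf = ln((1 + N) / (1 + df)) + 1 (one of several common conventions), TF-IDF weights for tokenized emails can be computed as follows. The function name `tf_idf` and the three toy emails are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = count of the term in the document / document length
    idf = ln((1 + N) / (1 + df)) + 1   (smoothed variant)
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append({t: (c / length) * idf[t] for t, c in counts.items()})
    return vectors


# Hypothetical toy corpus: two spam-like emails and one ordinary email.
emails = [
    "win a free prize now".split(),
    "meeting agenda for monday".split(),
    "free prize claim now".split(),
]
weights = tf_idf(emails)
```

Terms that appear in only one email (such as "meeting") receive a higher IDF than terms shared by several emails (such as "free"), which is what lets TF-IDF emphasize distinctive vocabulary before feature selection.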
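For the visual features, a GLCM counts how often a pixel with gray level i is followed, at a fixed offset, by a pixel with gray level j. The sketch below assumes a single offset (dx, dy), an image already quantized to `levels` gray levels, and three common texture statistics (contrast, energy, homogeneity); the offsets, quantization, and statistics used in the paper are not given in the abstract, and the function names are hypothetical. Production code would more likely use scikit-image's `graycomatrix` and `graycoprops`.

```python
import numpy as np


def glcm(image, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    image: 2-D array of integer gray levels in [0, levels).
    Counts pairs (image[r, c], image[r + dy, c + dx]) wherever both pixels
    lie inside the image, then divides by the total number of pairs.
    """
    img = np.asarray(image, dtype=int)
    rows, cols = img.shape
    m = np.zeros((levels, levels), dtype=float)
    for r in range(max(0, -dy), rows - max(0, dy)):
        for c in range(max(0, -dx), cols - max(0, dx)):
            m[img[r, c], img[r + dy, c + dx]] += 1
    total = m.sum()
    return m / total if total else m


def glcm_features(p):
    """Contrast, energy, and homogeneity of a normalized GLCM p."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    energy = float(np.sum(p ** 2))
    homogeneity = float(np.sum(p / (1.0 + (i - j) ** 2)))
    return contrast, energy, homogeneity


# Hypothetical 3x3 image with 3 gray levels, horizontal neighbor offset.
img = [[0, 0, 1],
       [0, 1, 1],
       [2, 2, 2]]
p = glcm(img, dx=1, dy=0, levels=3)
contrast, energy, homogeneity = glcm_features(p)
```

Each offset yields one matrix, so a full visual feature vector would typically concatenate statistics over several offsets and directions, which is one reason the combined feature vector becomes long enough to motivate feature selection.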
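The abstract names FLI-DA but does not give its update rules, so the following is not FLI-DA. It is a minimal sketch of one ingredient the name points to: Lévy-flight steps, generated here with Mantegna's algorithm, used inside a wrapper feature-selection loop. A continuous position is mapped to a binary feature mask with a sigmoid transfer function, and a candidate is kept when its fitness does not decrease. The single-agent greedy search, the `fitness` callback, the step `scale`, and all function names are hypothetical assumptions; a real Dragonfly Algorithm maintains a swarm with separation, alignment, cohesion, food, and enemy terms.

```python
import math

import numpy as np


def levy_step(dim, beta=1.5, rng=None):
    """Draw a Levy-flight step of length `dim` with Mantegna's algorithm.

    sigma_u = [Gamma(1+beta) sin(pi*beta/2) /
               (Gamma((1+beta)/2) * beta * 2^((beta-1)/2))]^(1/beta)
    step    = u / |v|^(1/beta),  u ~ N(0, sigma_u^2),  v ~ N(0, 1)
    """
    rng = np.random.default_rng() if rng is None else rng
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)


def to_mask(position, rng):
    """Sigmoid transfer: select feature k with probability sigmoid(position[k])."""
    prob = 1.0 / (1.0 + np.exp(-np.clip(position, -50.0, 50.0)))
    return rng.random(position.shape) < prob


def levy_feature_search(fitness, n_features, iters=200, scale=0.1, seed=0):
    """Greedy single-agent search over feature masks driven by Levy steps.

    fitness(mask) -> float, higher is better (hypothetical wrapper objective,
    e.g. validation accuracy minus a penalty on the number of features).
    """
    rng = np.random.default_rng(seed)
    pos = rng.normal(0.0, 1.0, n_features)
    best_mask = to_mask(pos, rng)
    best_fit = fitness(best_mask)
    for _ in range(iters):
        cand = pos + scale * levy_step(n_features, rng=rng)
        mask = to_mask(cand, rng)
        fit = fitness(mask)
        if fit >= best_fit:  # accept moves that do not worsen fitness
            pos, best_mask, best_fit = cand, mask, fit
    return best_mask, best_fit
```

The heavy-tailed Lévy distribution mostly produces small steps with occasional long jumps, which is why such steps are used to balance local refinement against escaping poor regions of the feature-subset space.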


2021 ◽  
Vol 25 (11) ◽  
pp. 6041-6066
Author(s):  
Jiancong Chen ◽  
Baptiste Dafflon ◽  
Anh Phuong Tran ◽  
Nicola Falco ◽  
Susan S. Hubbard

Abstract. Climate change is reshaping vulnerable ecosystems, leading to uncertain effects on ecosystem dynamics, including evapotranspiration (ET) and ecosystem respiration (Reco). However, accurate estimation of ET and Reco remains challenging in sparsely monitored watersheds, where data and field instrumentation are limited. In this study, we developed a hybrid predictive modeling approach (HPM) that integrates eddy covariance measurements, physically based model simulation results, meteorological forcings, and remote-sensing datasets to estimate ET and Reco at high space–time resolution. HPM relies on a deep learning algorithm, long short-term memory (LSTM), and requires only air temperature, precipitation, radiation, normalized difference vegetation index (NDVI), and soil temperature (when available) as input variables. We tested and validated HPM estimation results in different climate regions and developed four use cases to demonstrate the applicability and variability of HPM at various FLUXNET sites and Rocky Mountain SNOTEL sites in Western North America. To test the limitations and performance of the HPM approach in mountainous watersheds, we developed an expanded use case focused on the East River Watershed, Colorado, USA. The results indicate that HPM is capable of identifying complicated interactions among meteorological forcings, ET, and Reco variables, as well as providing reliable estimation of ET and Reco across relevant spatiotemporal scales, even in challenging mountainous systems. The study documents that HPM increases our capability to estimate ET and Reco and enhances process understanding in sparsely monitored watersheds.
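The abstract does not describe HPM's network architecture, layer sizes, or training procedure. As a minimal sketch of the LSTM component only, the code below runs a single standard LSTM cell over a daily sequence of the five input variables listed above and maps each hidden state to one scalar (for example, an ET estimate). The weights are random and untrained, the inputs are synthetic placeholders, and the gate ordering and function names are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    x: (n_in,) inputs at time t; h, c: (n_hidden,) previous hidden/cell state.
    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,).
    Gate order in the stacked weights: input, forget, candidate, output.
    """
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    g = np.tanh(z[2 * n:3 * n])  # candidate cell update
    o = sigmoid(z[3 * n:4 * n])  # output gate
    c_new = f * c + i * g        # keep part of the old memory, add new content
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def predict_sequence(X, params, w_out, b_out):
    """Run the cell over a (T, n_in) sequence; map each h_t to a scalar output."""
    W, U, b = params
    n_hidden = U.shape[1]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outputs = []
    for x in X:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(float(w_out @ h + b_out))
    return np.array(outputs)


rng = np.random.default_rng(0)
# 5 drivers: air temperature, precipitation, radiation, NDVI, soil temperature.
n_in, n_hidden, T = 5, 8, 30
params = (rng.normal(0, 0.1, (4 * n_hidden, n_in)),
          rng.normal(0, 0.1, (4 * n_hidden, n_hidden)),
          np.zeros(4 * n_hidden))
X = rng.normal(size=(T, n_in))  # placeholder standardized daily forcings
y_hat = predict_sequence(X, params, rng.normal(0, 0.1, n_hidden), 0.0)
```

The cell state c carries information across time steps, which is what allows an LSTM to relate, for instance, earlier precipitation or soil temperature to later ET; in practice the weights would be fitted to eddy covariance observations rather than drawn at random.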


Author(s):  
Seonho Kim ◽  
Jungjoon Kim ◽  
Hong-Woo Chun

Interest in research on health and medical information analysis based on artificial intelligence, especially deep learning techniques, has recently been increasing. Most research in this field has focused on discovering new knowledge for predicting and diagnosing disease by revealing the relations between diseases and various information features of data. These features are extracted from various clinical pathology data, such as EHRs (electronic health records), and from academic literature using data analysis, natural language processing, and related techniques. However, more research and attention are still needed on applying the latest AI-based data analysis techniques to bio-signal data, which are continuous physiological records such as EEG (electroencephalography) and ECG (electrocardiogram). Unlike other types of data, bio-signal data take the form of real-valued time series, and applying deep learning to them raises many issues in preprocessing, learning, and analysis. Such issues include the feature selection step, black-box learning components, difficulty in recognizing and identifying effective features, and high computational complexity. In this paper, to address these issues, we propose an encoding-based Wave2vec time series classifier model, which combines signal processing with deep learning-based natural language processing techniques. To demonstrate its advantages, we report the results of three experiments conducted on EEG data from the University of California, Irvine, a real-world benchmark bio-signal dataset. The proposed model first encodes the bio-signals (waves), which are real-valued time series, into a sequence of symbols, or into a sequence of wavelet patterns that are then converted into symbols; it then vectorizes the symbols by learning the sequence with deep learning-based natural language processing.
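The abstract does not specify the encoding scheme. As a stand-in, the sketch below uses SAX (Symbolic Aggregate approXimation), a well-known way to turn a real-valued time series into a symbol string: z-normalize, average equal-length segments, and map each average to a letter using equiprobable breakpoints of the standard normal distribution. It then splits the string into overlapping n-grams that an NLP-style model could treat as "words". The segment count, alphabet, n-gram length, and function names are assumptions, not the paper's Wave2vec settings.

```python
from statistics import NormalDist

import numpy as np


def sax_encode(signal, n_segments, alphabet="abcd"):
    """SAX-style symbolization of a real-valued time series.

    1. z-normalize the signal;
    2. piecewise aggregate approximation (mean of equal-length segments);
    3. map each segment mean to a letter via equiprobable N(0, 1) breakpoints.
    """
    x = np.asarray(signal, dtype=float)
    std = x.std()
    x = (x - x.mean()) / std if std > 0 else x - x.mean()
    segments = np.array_split(x, n_segments)
    paa = np.array([seg.mean() for seg in segments])
    a = len(alphabet)
    # For a = 4 the breakpoints are roughly -0.674, 0.0, 0.674.
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    idx = np.searchsorted(breakpoints, paa)
    return "".join(alphabet[i] for i in idx)


def symbol_words(symbols, n=3):
    """Split a symbol string into overlapping n-grams ("words")."""
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]


# Hypothetical example: one period of a sine wave, 64 samples, 8 segments.
t = np.linspace(0, 2 * np.pi, 64)
s = sax_encode(np.sin(t), 8)
words = symbol_words(s, 3)
```

Because the breakpoints are equiprobable under a normal distribution, each letter is used roughly equally often for normalized signals, and the resulting strings are short and human-readable, which matches the readability argument made in the abstract.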
A model for each class can be constructed by learning from the vectorized wavelet patterns and training data. The resulting models can be used to predict and diagnose diseases by classifying new data. By converting real-valued time series into sequences of symbols, the proposed method enhances data readability and makes the feature selection and learning processes more intuitive. In addition, it makes influential patterns intuitive and easy to recognize and identify. Furthermore, the data simplification achieved by the encoding process drastically reduces computational complexity without degrading analysis performance; this facilitates real-time analysis of large-capacity data, which is essential for developing real-time diagnostic systems.
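The paper vectorizes symbols with deep learning-based NLP, which the abstract does not detail. As a much simpler stand-in, the sketch below builds one "model" per class as the average n-gram count vector of that class's training symbol strings, then assigns a new string to the class whose centroid has the highest cosine similarity. The count vectors replace learned embeddings, and the training strings, labels, and function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(symbols, n=3):
    """Overlapping n-grams of a symbol string."""
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(samples, n=3):
    """samples: list of (symbol_string, label). Returns the mean n-gram counts per class."""
    sums, counts = {}, Counter()
    for symbols, label in samples:
        sums.setdefault(label, Counter()).update(ngrams(symbols, n))
        counts[label] += 1
    return {lbl: {w: c / counts[lbl] for w, c in vec.items()}
            for lbl, vec in sums.items()}


def classify(symbols, centroids, n=3):
    """Assign the class whose centroid is most similar to the new string."""
    vec = Counter(ngrams(symbols, n))
    return max(centroids, key=lambda lbl: cosine(vec, centroids[lbl]))


# Hypothetical symbolized training signals for two classes.
train = [("abcdabcdabcd", "rhythmic"), ("abcdabcdabca", "rhythmic"),
         ("bbbbbbbbcbbb", "flat"), ("bbbbbbbbbbbb", "flat")]
centroids = fit_centroids(train)
label = classify("abcdabcdab", centroids)
```

Keeping the class models as explicit n-gram profiles makes it easy to inspect which symbol patterns drive a decision, which mirrors the interpretability benefit the abstract attributes to symbolic encoding.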

