A Phishing Webpage Detection Method Based on Stacked Autoencoder and Correlation Coefficients

2019 ◽  
Vol 27 (2) ◽  
pp. 41-54 ◽  

Phishing is a kind of cyber-attack that targets naive online users by tricking them into revealing sensitive information. Many anti-phishing solutions have been proposed to date, such as blacklists or whitelists, heuristic-based methods, and machine learning-based methods. However, online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel phishing webpage detection model based on features extracted from the URL, the HTML source code, and third-party services to represent the basic characteristics of phishing webpages; the model uses a deep learning method, Stacked Autoencoder (SAE), to detect phishing webpages. To bring features to the same order of magnitude, three kinds of normalization methods are adopted. In particular, a method that calculates correlation coefficients between the weight matrices of the SAE is proposed to determine the optimal width of the hidden layers, and it shows high computational efficiency and feasibility. In tests on a set of phishing and benign webpages, the model using SAE achieves the best performance when compared to other algorithms such as Naive Bayes (NB), Support Vector Machine (SVM), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). This indicates that the proposed detection model is promising and can be applied effectively to phishing detection.

The main aim of the proposed work is to build an accurate automated seizure detection model for evaluating improvement in epileptic patients. Long-duration EEG recordings taken from the PhysioNet CHB-MIT EEG dataset are used for this experimental work. Six types of features are extracted from the EEG signals using the WPT method and are then classified using the CFS method. Then, all the features are jointly input to the rule-based twin support vector machines (TSVMs) to detect normal, ictal, and pre-ictal EEG segments. The developed WPT-KWMTSVM seizure detection method achieved excellent performance, with average accuracy, specificity, sensitivity, G-mean, positive predictive value, and Matthews correlation coefficient of 97.14%, 97.33%, 97.00%, 97.31%, 96.85%, and 95.96%, respectively. The average area under the curve (AUC) is approximately 1. The proposed method is able to enhance seizure detection outcomes for proper clinical diagnosis in medical applications.
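
The evaluation measures listed in this abstract can all be derived from a confusion matrix. The sketch below uses the standard binary definitions and assumes a one-vs-rest reduction of the three-class problem, which the abstract does not confirm. The function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    ppv = tp / (tp + fp)                    # positive predictive value (precision)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    g_mean = math.sqrt(sensitivity * specificity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "g_mean": g_mean,
        "ppv": ppv,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not from the paper).
metrics = binary_metrics(tp=97, fp=3, tn=97, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class task, one plausible approach is to compute these values per class and then average them, although the abstract does not say how its averages were formed.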


Author(s):  
Bhargavi Munnaluri ◽  
K. Ganesh Reddy

Wind forecasting is one of the most efficient ways to deal with the challenges of wind power generation. Due to the depletion of fossil fuels, renewable energy sources play a major role in power generation. For future management and utilization of power, we need to predict the wind speed. In this paper, an efficient hybrid forecasting approach that combines Support Vector Machine (SVM) and Artificial Neural Networks (ANN) is proposed to improve the quality of wind speed prediction. Because wind depends on many different parameters, it is difficult to predict the wind speed accurately. The proposed hybrid forecasting model is evaluated on hourly wind speed data from past years, and it reduces the prediction error, measured by Mean Square Error, by 0.019. The result obtained from the Artificial Neural Networks improves the forecasting quality.


2021 ◽  
Vol 13 (15) ◽  
pp. 3024
Author(s):  
Huiqin Ma ◽  
Wenjiang Huang ◽  
Yingying Dong ◽  
Linyi Liu ◽  
Anting Guo

Fusarium head blight (FHB) is a major winter wheat disease in China. The accurate and timely detection of wheat FHB is vital to scientific field management. In this study, we combine three types of spectral features, namely, spectral bands (SBs), vegetation indices (VIs), and wavelet features (WFs), to explore the potential of using hyperspectral imagery obtained from an unmanned aerial vehicle (UAV) to detect wheat FHB. First, during the wheat filling period, two UAV-based hyperspectral images were acquired. SBs, VIs, and WFs that were sensitive to wheat FHB were extracted and optimized from the two images. Subsequently, a field-scale wheat FHB detection model was formulated, based on the optimal spectral feature combination of SBs, VIs, and WFs (SBs + VIs + WFs), using a support vector machine. Two commonly used data normalization algorithms were applied before the construction of the model. The WFs alone, and the spectral feature combination of optimal SBs and VIs (SBs + VIs), were each used to formulate models for comparison and testing. The results showed that the detection model based on the normalized SBs + VIs + WFs, using the min–max normalization algorithm, achieved the highest R² of 0.88 and the lowest RMSE of 2.68% among the three models. Our results suggest that UAV-based hyperspectral imaging technology is promising for the field-scale detection of wheat FHB. Combining traditional SBs and VIs with WFs can improve the detection accuracy of wheat FHB effectively.
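
The min–max normalization and the two reported scores can be sketched as follows. This is a minimal illustration, assuming column-wise scaling to [0, 1] and the coefficient-of-determination form of R²; the abstract does not state which definitions the authors used. All function names and sample values are hypothetical.

```python
import numpy as np


def min_max_normalize(features):
    """Scale each feature column to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    # Constant columns would divide by zero; leave them at 0 instead.
    span = np.where(span == 0, 1.0, span)
    return (features - lo) / span


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residual ** 2))
    return r2, rmse


# Hypothetical values for illustration only (not from the paper).
normalized = min_max_normalize([[1, 10], [2, 20], [3, 30]])
r2, rmse = r2_and_rmse([1, 2, 3], [1, 2, 4])
print(normalized)
print(r2, rmse)
```

In practice the minimum and range would be computed on the training data and reused for new samples, so that both sets share the same scale.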


Biomolecules ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 500
Author(s):  
László Keresztes ◽  
Evelin Szögi ◽  
Bálint Varga ◽  
Viktor Farkas ◽  
András Perczel ◽  
...  

The amyloid state of proteins is widely studied with relevance to neurology, biochemistry, and biotechnology. In contrast with nearly amorphous aggregation, the amyloid state has a well-defined structure, consisting of parallel and antiparallel β-sheets in a periodically repeated formation. The understanding of the amyloid state is growing with the development of novel molecular imaging tools, like cryogenic electron microscopy. Sequence-based amyloid predictors were developed, mainly using artificial neural networks (ANNs) as the underlying computational technique. From a good neural-network-based predictor, it is a very difficult task to identify the attributes of the input amino acid sequence, which imply the decision of the network. Here, we present a linear Support Vector Machine (SVM)-based predictor for hexapeptides with correctness higher than 84%, i.e., it is at least as good as the best published ANN-based tools. Unlike artificial neural networks, the decisions of the linear SVMs are much easier to analyze and, from a good predictor, we can infer rich biochemical knowledge. In the Budapest Amyloid Predictor webserver the user needs to input a hexapeptide, and the server outputs a prediction for the input plus the 6 × 19 = 114 distance-1 neighbors of the input hexapeptide.
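
The count 6 × 19 = 114 follows from substituting each of the six positions with each of the 19 other standard amino acids. The sketch below enumerates those distance-1 neighbors, assuming "distance-1" means Hamming distance 1 over the 20 standard residues; the function name and the example input are hypothetical, and this is not the webserver's code.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def distance_one_neighbors(peptide):
    """Return all peptides that differ from `peptide` at exactly one position."""
    neighbors = []
    for i, original in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue != original:
                neighbors.append(peptide[:i] + residue + peptide[i + 1:])
    return neighbors


# Arbitrary example hexapeptide, used only to check the count.
neighbors = distance_one_neighbors("VQIVYK")
print(len(neighbors))
```

Each neighbor could then be scored by the same predictor as the input, which would show how sensitive the prediction is to single substitutions.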


Data ◽  
2021 ◽  
Vol 6 (8) ◽  
pp. 87
Author(s):  
Sara Ferreira ◽  
Mário Antunes ◽  
Manuel E. Correia

Deepfake and manipulated digital photos and videos are being increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, in which tampered multimedia content has been the primary disseminating vehicle. Digital forensic analysis tools are widely used in criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need to employ efficient digital forensics techniques grounded on state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods to improve the automatic detection of manipulated multimedia content. However, such methods have not yet been widely incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of the datasets are crucial to benchmark the ML models and to evaluate their suitability for real-world digital forensics applications. An example is the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos, which are part of existing state-of-the-art datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository, and the total numbers of photos and video frames are 40,588 and 12,400, respectively.
The dataset was validated and benchmarked with deep learning Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) methods; however, a plethora of other existing methods can be applied. In general, the results show a better F1-score for CNN than for SVM, for both photo and video processing. CNN achieved F1-scores of 0.9968 and 0.8415 for photos and videos, respectively. For SVM, the results obtained with 5-fold cross-validation are 0.9953 and 0.7955 for photos and videos, respectively. A set of methods written in Python is available to researchers, namely to preprocess and extract the features from the original photo and video files and to build the training and testing sets. Additional methods are also available to convert the original PKL files into CSV and TXT, which gives ML researchers more flexibility to use the dataset with existing ML frameworks and tools.
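
One way to turn a DFT into a fixed-length numeric vector is to average the log-magnitude spectrum over rings of equal spatial frequency. The sketch below shows that idea with NumPy; it is an assumption made for illustration and is not confirmed by the abstract to be the authors' feature extraction. The function name, the number of bins, and the random test image are all hypothetical.

```python
import numpy as np


def dft_features(image, n_bins=32):
    """Radially averaged log-magnitude spectrum of a 2-D grayscale image."""
    image = np.asarray(image, dtype=float)
    # 2-D DFT with the zero-frequency component shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    log_mag = np.log1p(np.abs(spectrum))

    # Distance of every frequency sample from the spectrum center.
    h, w = image.shape
    y, x = np.indices((h, w))
    radius = np.hypot(y - h // 2, x - w // 2)

    # Assign each sample to one of n_bins rings and average within each ring.
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    ring = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(ring, weights=log_mag.ravel(), minlength=n_bins)
    counts = np.bincount(ring, minlength=n_bins)
    return sums / np.maximum(counts, 1)


# Random stand-in for a grayscale photo or video frame (illustration only).
frame = np.random.default_rng(0).random((64, 64))
features = dft_features(frame)
print(features.shape)
```

A vector like this, paired with a genuine or manipulated label, has the same label-plus-numeric-vector shape as the dataset entries described above.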


2021 ◽  
Vol 13 (8) ◽  
pp. 1409
Author(s):  
Kun Song ◽  
Xichuan Liu ◽  
Taichang Gao ◽  
Peng Zhang

Water vapor is a key element in both the greenhouse effect and the water cycle. However, water vapor has not been well studied due to the limitations of conventional monitoring instruments. Recently, estimating rain rate by the rain-induced attenuation of commercial microwave links (MLs) has been proven to be a feasible method. Similar to rainfall, water vapor also attenuates the energy of MLs. Thus, MLs also have the potential of estimating water vapor. This study proposes a method to estimate water vapor density by using the received signal level (RSL) of MLs at 15, 18, and 23 GHz, which is the first attempt to estimate water vapor by MLs below 20 GHz. This method trains a sensing model with prior RSL data and water vapor density by the support vector machine, and the model can directly estimate the water vapor density from the RSLs without preprocessing. The results show that the measurement resolution of the proposed method is less than 1 g/m³. The correlation coefficients between automatic weather stations and MLs range from 0.72 to 0.81, and the root mean square errors range from 1.57 to 2.31 g/m³. With the large availability of signal measurements from communications operators, this method has the potential of providing refined data on water vapor density, which can contribute to research on the atmospheric boundary layer and numerical weather forecasting.


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A164-A164
Author(s):  
Pahnwat Taweesedt ◽  
JungYoon Kim ◽  
Jaehyun Park ◽  
Jangwoon Park ◽  
Munish Sharma ◽  
...  

Abstract
Introduction: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder affecting an estimated one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming and expensive, and it is not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction with high sensitivity and low specificity. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires.
Methods: Subjects who underwent full-night polysomnography in Torr sleep center, Texas and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed by using Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models: Deep Neural Networks with the scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC), and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurement, snoring, and sleepiness history, were obtained from 5 OSA screening questionnaires/scores (STOP-BANG questionnaires, Berlin questionnaires, NoSAS score, NAMES score, and No-Apnea score). Performance metrics were used to compare the machine learning models.
Results: Of 180 subjects, 51.5% were male, with a mean (SD) age of 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) values of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score were 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58, and 0.58, respectively. DNN-PCA showed the highest AUROC, with sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77.
Conclusion: Our results showed that DNN-PCA outperforms OSA screening questionnaires, scores, and the other machine learning models.

