Session-Based Webshell Detection Using Machine Learning in Web Logs

Attackers upload webshells to web servers to steal data, launch DDoS attacks, maliciously modify files, and so on. Once these objectives are achieved, website managers can suffer huge losses. As encryption and obfuscation techniques mature, the most common detection approaches, taint analysis and feature matching, may become less effective. Instead of analyzing source code files, POST contents, or all received traffic, this paper presents an intelligent and efficient framework that detects webshell communication from precise sessions derived from web logs. Features are extracted from the raw sequence data in the web logs, and a statistical method based on time intervals is proposed to identify sessions. The framework is built with two alternative models, one based on long short-term memory (LSTM) and one based on a hidden Markov model (HMM), and is evaluated on real data. The experiments show that the LSTM-based model achieves an accuracy of 95.97% with a recall of 96.15%, performing much better than the HMM-based model. The experiments also demonstrate the efficiency of the approach: it detects webshells quickly without source code, and when detection is restricted to a given time period it takes 98.5% less time than the cited related approach. Once webshell behavior is detected, the anomalous session can be pinpointed and the statistical method used to locate the webshell file accurately.
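
The session-identification step, splitting web-log requests into sessions by time interval, can be sketched as follows. This is a minimal illustration under stated assumptions: sessions are keyed by client IP, log lines are already parsed into hypothetical `(client_ip, timestamp, path)` tuples, and a fixed inactivity threshold (the hypothetical `gap_seconds`, 30 minutes by default) stands in for the statistically derived interval the paper proposes.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical parsed log record: (client_ip, timestamp, request_path)
LogRecord = Tuple[str, datetime, str]


def sessionize(records: List[LogRecord], gap_seconds: float = 1800.0) -> List[List[LogRecord]]:
    """Group requests per client and start a new session whenever the gap
    between consecutive requests from that client exceeds gap_seconds.
    The fixed threshold is an assumption, not the paper's statistical method."""
    by_client: Dict[str, List[LogRecord]] = defaultdict(list)
    for rec in records:
        by_client[rec[0]].append(rec)

    gap = timedelta(seconds=gap_seconds)
    sessions: List[List[LogRecord]] = []
    for client_records in by_client.values():
        client_records.sort(key=lambda r: r[1])  # chronological order per client
        current = [client_records[0]]
        for prev, rec in zip(client_records, client_records[1:]):
            if rec[1] - prev[1] > gap:  # idle too long: close the session
                sessions.append(current)
                current = []
            current.append(rec)
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    t0 = datetime(2021, 1, 1, 12, 0, 0)
    logs = [
        ("10.0.0.1", t0, "/index.php"),
        ("10.0.0.1", t0 + timedelta(seconds=20), "/upload/x.php"),
        ("10.0.0.1", t0 + timedelta(hours=2), "/upload/x.php"),
        ("10.0.0.2", t0, "/about.html"),
    ]
    for s in sessionize(logs):
        print([path for _, _, path in s])
```

Each resulting session is an ordered request sequence, the kind of input a sequence model such as an LSTM or HMM would consume after feature extraction.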

On-Shelf Utility Mining of Sequence Data

ACM Transactions on Knowledge Discovery from Data

10.1145/3457570

2021

Vol 16 (2)

pp. 1-31

Author(s): Chunkai Zhang, Zilin Du, Yuting Yang, Wensheng Gan, Philip S. Yu

Keyword(s): High Efficiency, Sequence Data, Real Life, Search Space, Upper Bounds, Utility Mining, Limited Memory, Time Periods, High Utility, Synthetic Datasets

Utility mining has emerged as an important and interesting topic owing to its wide applications and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf times, since these have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on OSUM of sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address this problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we design several strategies that reduce the search space and avoid redundant calculation using two upper bounds, time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate calculating the upper bounds and utilities. Experiments on real and synthetic datasets show that both methods outperform the state-of-the-art algorithm. OSUMS may consume a large amount of memory and is unsuitable when memory is limited, whereas OSUMS+ has wider real-life applications owing to its high efficiency.
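
The quantity being mined can be illustrated with a small sketch. It assumes the usual on-shelf formulation (a pattern's utility summed over the periods in which all its items are on shelf, divided by the total utility of those periods) and restricts each pattern element to a single item; the names `pattern_utility` and `on_shelf_relative_utility` are hypothetical, and none of the TPEU/TRSU pruning or the paper's data structures are reproduced here.

```python
from typing import Dict, List, Set

QSequence = List[Dict[str, int]]  # ordered itemsets; each maps item -> utility


def pattern_utility(pattern: List[str], seq: QSequence) -> int:
    """Maximum utility over all occurrences of `pattern` in `seq` (one item
    per pattern element, each element in a later itemset); 0 if absent."""
    neg = float("-inf")
    best = [0.0] + [neg] * len(pattern)  # best[j]: max utility of a match of pattern[:j]
    for itemset in seq:
        # Walk j downward so one itemset never matches two consecutive elements.
        for j in range(len(pattern), 0, -1):
            item = pattern[j - 1]
            if item in itemset and best[j - 1] != neg:
                best[j] = max(best[j], best[j - 1] + itemset[item])
    return 0 if best[-1] == neg else int(best[-1])


def on_shelf_relative_utility(
    pattern: List[str],
    db: Dict[int, List[QSequence]],
    item_periods: Dict[str, Set[int]],
) -> float:
    """Pattern utility within its on-shelf periods divided by the total
    utility of the database in those same periods (assumed definition)."""
    periods = set.intersection(*(item_periods[i] for i in pattern))
    if not periods:
        return 0.0
    pu = sum(pattern_utility(pattern, s) for p in periods for s in db[p])
    total = sum(sum(itemset.values()) for p in periods for s in db[p] for itemset in s)
    return pu / total if total else 0.0


if __name__ == "__main__":
    db = {1: [[{"a": 2}, {"b": 3}], [{"b": 1}]], 2: [[{"a": 1}, {"c": 4}]]}
    item_periods = {"a": {1, 2}, "b": {1}, "c": {2}}
    print(on_shelf_relative_utility(["a", "b"], db, item_periods))
```

Here `b` is on shelf only in period 1, so the pattern `<a, b>` is judged against period 1 alone: its utility 5 over that period's total utility 6 gives about 0.833, instead of being diluted by period 2, where it could never occur.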

An LSTM-Based Method Considering History and Real-Time Data for Passenger Flow Prediction

Applied Sciences

10.3390/app10113788

2020

Vol 10 (11)

pp. 3788

Cited By ~ 1

Author(s): Qi Ouyang, Yongbo Lv, Jihui Ma, Jing Li

Keyword(s): Feature Extraction, Real Time, Short Term Memory, Historical Data, Time Interval, Information Coding, Time Data, Passenger Flow, Flow Prediction, Real Time Data

With the development of big data and deep learning, bus passenger flow prediction that considers real-time data has become possible. Real-time passenger flow prediction helps capture passenger flow dynamics as they happen, provides early warning of sudden passenger surges and data support for real-time changes to bus plans, and improves the stability of urban transportation systems. To predict passenger flow from real-time data, this paper proposes a novel prediction network based on long short-term memory (LSTM) networks. The model has four parts: feature extraction based on an XGBoost model, information coding of historical data, information coding of real-time data, and decoding with a multi-layer neural network. In the feature extraction part, bus data are fused with points of interest to increase the data dimension, which raises the number of parameters and improves model accuracy. In the historical coding part, the date serves as the index of the LSTM structure to encode historical data and provide relevant information for prediction; in the real-time coding part, the half-hour interval within each day serves as the index to encode real-time data and provide real-time information; and in the decoding part, all the encoded information is decoded to output passenger flow for the next two 30-min intervals. To the best of our knowledge, this is the first time real-time information has been taken into account in LSTM-based passenger flow prediction. The proposed model achieves better accuracy than a standard LSTM and other baseline methods.
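
The indexing scheme described above (date for historical data, daily half-hour interval for real-time data, next two 30-min intervals as the target) implies a binning step like the one below. This is a minimal sketch assuming boarding events arrive as plain timestamps; the function names are hypothetical, and the XGBoost features and LSTM encoders are not shown.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, Tuple


def half_hour_slot(ts: datetime) -> int:
    """Index of the 30-minute interval within the day, from 0 to 47."""
    return ts.hour * 2 + ts.minute // 30


def bin_boardings(timestamps: Iterable[datetime]) -> Counter:
    """Count boardings per (date, half-hour slot); the date could index the
    historical encoder and the slot the real-time encoder (our assumption)."""
    return Counter((ts.date(), half_hour_slot(ts)) for ts in timestamps)


def next_two_targets(counts: Counter, day: date, slot: int) -> Tuple[int, int]:
    """Passenger counts for the two 30-minute intervals after `slot`.
    Intervals past midnight simply count as 0 in this sketch."""
    return counts[(day, slot + 1)], counts[(day, slot + 2)]


if __name__ == "__main__":
    events = [datetime(2020, 5, 1, 8, 5), datetime(2020, 5, 1, 8, 40),
              datetime(2020, 5, 1, 8, 50), datetime(2020, 5, 1, 9, 10)]
    counts = bin_boardings(events)
    print(next_two_targets(counts, date(2020, 5, 1), 16))
```

A `Counter` is used so that empty intervals read as zero without being stored explicitly.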

Frequent Closed Partial Orders Mining in Sequences

Advanced Materials Research

10.4028/www.scientific.net/amr.846-847.1304

2013

Vol 846-847

pp. 1304-1307

Author(s): Ye Wang, Yan Jia, Lu Min Zhang

Keyword(s): Sequence Data, Real Data, Partial Orders, Hard Problem, Important Data, Data Set, Pruning Algorithm, Equal Chance, NP-Hard Problem, General Sequences

Mining partial orders from sequence data is an important data mining task with broad applications. Because partial order mining is an NP-hard problem, many efficient pruning algorithms have been proposed. In this paper, we improve a classical algorithm for discovering frequent closed partial orders (FCPO) from strings. For general sequences, we treat items that appear together as having an equal chance when calculating the detecting matrix used for pruning. Experimental evaluations on a real data set show that our algorithm can effectively mine FCPO from sequences.
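
The pruning idea behind such matrices can be illustrated for plain strings: an edge a → b can belong to a frequent partial order only if a precedes b in at least `min_sup` sequences, so a pairwise precedence-support table rules out all other pairs up front. This is a generic sketch, not the paper's detecting matrix; its "equal chance" treatment of co-occurring items in general sequences is not reproduced, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def precedence_support(sequences: Iterable[str]) -> Dict[Pair, int]:
    """For each ordered pair (a, b), count the sequences in which some
    occurrence of a comes before some occurrence of b."""
    counts: Dict[Pair, int] = {}
    for seq in sequences:
        first: Dict[str, int] = {}
        last: Dict[str, int] = {}
        for i, sym in enumerate(seq):
            first.setdefault(sym, i)
            last[sym] = i
        # a precedes b somewhere iff a's first occurrence is before b's last one
        for a, b in permutations(first, 2):
            if first[a] < last[b]:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def frequent_pairs(sequences: Iterable[str], min_sup: int) -> Set[Pair]:
    """Pairs that may appear as edges of a frequent partial order; all
    other pairs can be pruned before the search starts."""
    return {p for p, c in precedence_support(sequences).items() if c >= min_sup}


if __name__ == "__main__":
    print(frequent_pairs(["abc", "acb", "abd"], min_sup=2))
```

With these three strings and `min_sup=2`, only a → b and a → c survive, which is consistent with the partial order "a before both b and c" shared by the first two strings.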

Modeling the Process of Event Sequence Data Generated for Working Condition Diagnosis

Mathematical Problems in Engineering

10.1155/2015/693450

2015

Vol 2015

pp. 1-13

Author(s): Jianwei Ding, Yingbo Liu, Li Zhang, Jianmin Wang

Keyword(s): Working Condition, Sequence Data, A Priori, Real Data, Data Sets, Main Task, Event Sequence, Telemetry Data, Condition Monitoring Systems, Condition Diagnosis

Condition monitoring systems are widely used to monitor the working condition of equipment, generating a vast amount and variety of telemetry data in the process. The main task of surveillance is to analyze these routinely collected telemetry data to understand the working condition of the equipment. However, as the volume of telemetry data grows rapidly, analyzing all of it to understand the equipment's working condition without any a priori knowledge is a nontrivial task. In this paper, we propose a probabilistic generative model called the working condition model (WCM), which simulates how event sequence data are generated and depicts the working condition of equipment at runtime. With WCM, we can analyze how event sequence data behave in different working modes and detect the working mode of an event sequence (working condition diagnosis). Furthermore, we apply WCM to illustrative applications such as automatically detecting anomalous event sequences during equipment runtime. Experimental results on real data sets demonstrate the effectiveness of the model.
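
To make "working condition diagnosis" concrete, the sketch below uses a much simpler generative model than WCM as a stand-in: one first-order Markov chain per working mode, with add-one smoothing. A sequence is assigned to the mode under which it is most likely and flagged as anomalous when its average per-transition log-likelihood falls below a threshold. The class, function, mode names, and threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class ModeModel:
    """First-order Markov chain over event types with add-one smoothing
    (a simplified stand-in, not the paper's WCM)."""

    def __init__(self, sequences: List[Sequence[str]], vocab: Sequence[str]):
        self.vocab = list(vocab)
        self.trans: Dict[str, Counter] = defaultdict(Counter)
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.trans[a][b] += 1

    def log_prob(self, seq: Sequence[str]) -> float:
        v = len(self.vocab)
        lp = 0.0
        for a, b in zip(seq, seq[1:]):
            row = self.trans[a]
            lp += math.log((row[b] + 1) / (sum(row.values()) + v))
        return lp


def diagnose(seq: Sequence[str], models: Dict[str, ModeModel], threshold: float) -> Tuple[str, bool]:
    """Return the most likely working mode and whether the sequence looks
    anomalous (average per-transition log-likelihood below threshold)."""
    scores = {mode: model.log_prob(seq) for mode, model in models.items()}
    best = max(scores, key=scores.get)
    avg = scores[best] / max(len(seq) - 1, 1)
    return best, avg < threshold


if __name__ == "__main__":
    models = {
        "normal": ModeModel(["abab", "abab"], "abcd"),
        "maint": ModeModel(["cdcd", "cdcd"], "abcd"),
    }
    print(diagnose("abab", models, threshold=-1.0))
    print(diagnose("acbd", models, threshold=-1.0))
```

The smoothing keeps unseen transitions from producing a zero probability, so an unusual sequence receives a low score rather than an error.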

An Intelligent Approach to Detect Fake News Using Artificial Intelligence Technique

International Journal of Distributed Artificial Intelligence

10.4018/ijdai.2021070101

2021

Vol 13 (2)

pp. 1-12

Author(s): Sumit Das, Manas Kumar Sanyal, Sarbajyoti Mallik

Keyword(s): Artificial Intelligence, Web Application, Short Term Memory, Real Data, Fake News, Artificial Intelligence Technique, News Reports, Proposed Model, Long Short Term Memory, Intelligent Approach

A great deal of fake news circulates through various media and misleads people. This is a major issue in the current era of intelligent technology, and solutions are needed. This article proposes an approach that analyzes fake and real news, focusing on a few characteristics of news: sentiment, significance, and novelty. Expressing news reports as numbers and metadata makes it possible to manipulate daily information mathematically and statistically. The objective of this article is to analyze and filter out harmful fake news. The proposed model is integrated into a web application through which users can check whether news is real or fake. The authors use artificial intelligence (AI) algorithms, specifically logistic regression and long short-term memory (LSTM), to power the application. The results of the proposed model are compared with those of existing models.
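
The logistic-regression half of such a classifier can be sketched in a few lines. This is a minimal illustration with a substitution: it uses bag-of-words token counts as features instead of the article's sentiment, significance, and novelty features, trains with plain stochastic gradient descent on log loss, and omits the LSTM. The function names, hyperparameters, and toy headlines are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def featurize(text: str) -> Counter:
    """Bag-of-words token counts (an assumed feature set)."""
    return Counter(text.lower().split())


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(docs: List[str], labels: List[int], epochs: int = 200, lr: float = 0.5) -> Tuple[Dict[str, float], float]:
    """Fit weights and bias by SGD on log loss; label 1 means fake."""
    w: Dict[str, float] = {}
    b = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            x = featurize(doc)
            p = sigmoid(b + sum(w.get(t, 0.0) * c for t, c in x.items()))
            g = p - y  # gradient of log loss with respect to the logit
            b -= lr * g
            for t, c in x.items():
                w[t] = w.get(t, 0.0) - lr * g * c
    return w, b


def predict_fake(w: Dict[str, float], b: float, text: str) -> float:
    """Estimated probability that `text` is fake."""
    x = featurize(text)
    return sigmoid(b + sum(w.get(t, 0.0) * c for t, c in x.items()))


if __name__ == "__main__":
    docs = ["shocking miracle cure revealed", "government publishes budget report",
            "shocking secret they hide", "council approves annual budget"]
    w, b = train(docs, [1, 0, 1, 0])
    print(round(predict_fake(w, b, "shocking cure"), 3))
```

Unseen tokens contribute nothing to the score because `w.get` defaults to zero, which keeps prediction safe on new vocabulary.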

Machine Learning and Prediction-Based Resource Management in IoT Considering QoS

International Journal of Recent Technology and Engineering - 2

10.35940/ijrte.b1705.078219

2019

Vol 8 (2)

pp. 687-694

Keyword(s): Machine Learning, Prediction Model, Short Term Memory, Real Data, Computing Methods, Energy Utilization, Real Field, Proposed Model, Improved Performance, Video Sensors

The Internet of Things (IoT) is one of the fastest-growing technology paradigms and is used in every sector; Quality of Service (QoS) is a critical component of such systems and of their use by prosumers (producers and consumers). Most recent research on QoS in IoT has used machine learning (ML) techniques as computing methods for improved performance and solutions. Adopting ML and its methodologies, such as open-source frameworks, task-specific algorithms, and AI and ML techniques, has become a common trend and need across technologies and domains. In this work, we propose an ML-based prediction model for resource optimization in the IoT environment for QoS provisioning. The proposed methodology is implemented using a multi-layer neural network (MNN) for long short-term memory (LSTM) learning in a layered IoT environment. The model treats resources such as bandwidth and energy as QoS parameters and provides the required QoS by using these resources efficiently. The performance of the proposed model is evaluated in a real field implementation on a civil construction project, in which real data were collected using video sensors and mobile devices as edge nodes. The prediction model was observed to improve bandwidth and energy utilization, in turn providing the required QoS in the IoT environment.
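
LSTM-based prediction of resource usage is commonly trained on sliding windows of past measurements. The sketch below shows only that generic preprocessing step under the assumption that bandwidth or energy readings form a single evenly sampled series; `make_windows` and its parameters are hypothetical, and the paper's own data pipeline is not described in the abstract.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int, horizon: int = 1) -> List[Tuple[List[float], List[float]]]:
    """Turn a usage series into (past `window` values, next `horizon` values)
    training pairs for a sequence model."""
    if window <= 0 or horizon <= 0:
        raise ValueError("window and horizon must be positive")
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        past = list(series[start:start + window])
        future = list(series[start + window:start + window + horizon])
        pairs.append((past, future))
    return pairs


if __name__ == "__main__":
    bandwidth_mbps = [12.0, 15.5, 14.2, 18.9, 21.3, 19.7]  # hypothetical readings
    for past, future in make_windows(bandwidth_mbps, window=3):
        print(past, "->", future)
```

A series shorter than `window + horizon` yields no pairs, since the range is then empty.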

Two-dimensional stochastic modeling for predicting bankruptcy for manufacturing companies

10.47302/jsr.2020540202

2021

Vol 54 (2)

pp. 123-129

Author(s): James C. Fu, Winnie H. W. Fu

Keyword(s): Brownian Motion, Real Data, Bankruptcy Prediction, Boundary Crossing, Time Interval, Two Dimensional, Manufacturing Companies, Boundary Crossing Probability, Data Set, Crossing Probability

Improving the accuracy of business bankruptcy prediction helps reduce substantial losses for owners, creditors, investors, and workers, and further mitigates a frequent economic and social problem. In this study, we model financial working capital and cash flow as a two-dimensional Brownian motion X(t) = (X1(t), X2(t)) for business bankruptcy prediction. The probability of bankruptcy occurring in a time interval [0, T] is defined as the boundary crossing probability of the two-dimensional Brownian motion entering a predetermined threshold domain. Mathematically, we extend the result of Fu and Wu (2016) on the boundary crossing probability of a high-dimensional Brownian motion to an unbounded convex hull. The proposed model is applied to a real data set of US companies, and the numerical results show that the proposed method performs well.
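
The boundary crossing probability can be approximated numerically by simulation, which is a swapped-in technique and not the analytic extension the paper derives. The sketch assumes independent components with constant drift `mu` and volatility `sigma` per component, and takes the threshold domain as a hypothetical predicate `in_domain(x1, x2)`. Checking the path only at discrete steps misses crossings between steps, so the estimate is biased slightly low.

```python
import math
import random
from typing import Callable, Tuple

Point = Tuple[float, float]


def crossing_probability(
    x0: Point,
    in_domain: Callable[[float, float], bool],
    horizon: float,
    mu: Point = (0.0, 0.0),
    sigma: Point = (1.0, 1.0),
    steps: int = 200,
    paths: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(X(t) enters the domain for some t in [0, horizon])
    for a 2-D Brownian motion with independent components (an assumption)."""
    rng = random.Random(seed)
    dt = horizon / steps
    sd = (sigma[0] * math.sqrt(dt), sigma[1] * math.sqrt(dt))
    hits = 0
    for _ in range(paths):
        x1, x2 = x0
        if in_domain(x1, x2):  # already inside the threshold domain at t = 0
            hits += 1
            continue
        for _ in range(steps):
            x1 += mu[0] * dt + rng.gauss(0.0, sd[0])
            x2 += mu[1] * dt + rng.gauss(0.0, sd[1])
            if in_domain(x1, x2):
                hits += 1
                break
    return hits / paths


if __name__ == "__main__":
    # Hypothetical unbounded convex domain: working capital and cash flow both <= 0.
    domain = lambda w, c: w <= 0 and c <= 0
    print(crossing_probability((1.0, 1.0), domain, horizon=1.0))
```

A fixed `seed` makes runs reproducible; increasing `steps` reduces the discretization bias at the cost of run time.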

Consecrate Recurrent Neural Network Classifier for Autism Classification

International Journal of Engineering and Advanced Technology - Regular Issue

10.35940/ijeat.a9550.109119

2019

Vol 9 (1)

pp. 2033-2041

Keyword(s): Neural Network, Autism Spectrum Disorder, Recurrent Neural Network, Short Term Memory, Sequence Data, Large Data, Autism Spectrum, Spectrum Disorder, Neural Network Classifier, Gene Sequences

Recent discoveries in autism spectrum disorder (ASD) detection and classification studies reveal a substantial relationship between autism disorders and gene sequences. This work is intended to classify autism spectrum disorder groups and subgroups based on gene sequences. Gene sequence data are large and difficult to handle with conventional data mining or classification procedures. This work introduces the Consecrate Recurrent Neural Network Classifier for Autism Classification (CRNNC-AC) to classify autism disorders using gene sequence data. The classifier combines a dedicated Elman-type [1] recurrent neural network (RNN) with a legacy long short-term memory (LSTM) [2]. The LSTM model is designed to optimize memory usage and eliminate memory overflows without affecting classification accuracy. Classification quality metrics [3] such as accuracy, sensitivity, specificity, and F1-score are targeted for optimization. The processing time of the proposed method is also measured to evaluate its practicality.
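
The Elman recurrence that the classifier builds on, h_t = tanh(W_x x_t + W_h h_{t-1} + b), can be shown on a gene sequence with a short sketch. It assumes one-hot encoding of nucleotides and uses random, untrained weights purely to illustrate the state update; CRNNC-AC's actual architecture, its LSTM component, and its training are not reproduced, and the class name is hypothetical.

```python
import math
import random
from typing import List

ALPHABET = "ACGT"


def one_hot(base: str) -> List[float]:
    """One-hot nucleotide encoding (an assumption); unknown bases map to zeros."""
    return [1.0 if base == b else 0.0 for b in ALPHABET]


class ElmanRNN:
    """Untrained Elman recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.w_x = [[rng.uniform(-0.5, 0.5) for _ in ALPHABET] for _ in range(hidden)]
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        self.b = [0.0] * hidden

    def encode(self, sequence: str) -> List[float]:
        """Run the recurrence over the sequence and return the final hidden state."""
        h = [0.0] * self.hidden
        for base in sequence.upper():
            x = one_hot(base)
            # The comprehension reads the previous h before it is reassigned.
            h = [
                math.tanh(
                    sum(wx * xi for wx, xi in zip(self.w_x[i], x))
                    + sum(wh * hj for wh, hj in zip(self.w_h[i], h))
                    + self.b[i]
                )
                for i in range(self.hidden)
            ]
        return h


if __name__ == "__main__":
    rnn = ElmanRNN(hidden=4)
    print([round(v, 3) for v in rnn.encode("ACGTTGCA")])
```

In a classifier, the final hidden state would feed an output layer whose weights, like `w_x` and `w_h`, are learned from labeled sequences.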

Importance of experimental information (metadata) for archived sequence data: case of specific gene bias due to lag time between sample harvest and RNA protection in RNA sequencing

PeerJ

10.7717/peerj.11875

2021

Vol 9

pp. e11875

Author(s): Tomoko Matsuda

Keyword(s): Gene Expression, RNA Sequencing, Time Course, Sequence Data, Specific Gene, Time Interval, Short Time Interval, RNA Seq, Lysis Buffer, RNA Protection

Large volumes of high-throughput sequencing data have been submitted to the Sequence Read Archive (SRA). The lack of experimental metadata associated with these data makes it very difficult to reuse them and to understand their quality. In the case of RNA sequencing (RNA-Seq), which reveals the presence and quantity of RNA in a biological sample at a given moment, it must be considered that gene expression responds within a short time interval (several seconds to a few minutes) in many organisms. Therefore, to isolate RNA that accurately reflects the transcriptome at the point of harvest, raw biological samples should be processed as soon as possible by freezing in liquid nitrogen, immersing in RNA stabilization reagent, or lysing and homogenizing in RNA lysis buffer containing guanidine thiocyanate. As the number of samples handled simultaneously increases, the time until the RNA is protected can increase. Here, to evaluate the effect of different lag times in RNA protection on RNA-Seq data, we harvested CHO-S cells after 3, 5, 6, and 7 days of cultivation, added RNA lysis buffer 15, 30, 45, and 60 min after harvest, and conducted RNA-Seq. These RNA samples showed high RNA integrity number (RIN) values, indicating non-degraded RNA, and sequence data from libraries prepared with them were of high quality according to FastQC. We observed that, on the same cultivation day, global trends of gene expression were similar across the time course of RNA lysis buffer addition; however, the expression of some genes differed significantly between time-course samples from the same cultivation day, and most of these differentially expressed genes were related to apoptosis. We conclude that the time lag between sample harvest and RNA protection influences the expression of specific genes.
Therefore, when acquiring RNA-Seq data from a database, it is necessary to know not only the RIN values of the RNA and the quality of the sequence data but also how the experiment was performed.

Self-Assisted First-Fix Method for A-BDS Receivers with Medium- and Long-Term Ephemeris Extension

Mathematical Problems in Engineering

10.1155/2018/5325034

2018

Vol 2018

pp. 1-14

Cited By ~ 1

Author(s): Zilong Shen, Jing Peng, Wenxiang Liu, Feixue Wang, Shibing Zhu, ...

Keyword(s): Dynamic Models, Rapid Development, Real Data, Satellite System, Time Interval, Signal Acquisition, Positioning Accuracy, Broadcast Ephemeris, Intelligent Logistics

As a sensor for standalone position and velocity determination, the BeiDou Navigation Satellite System (BDS) receiver is becoming an important part of the intelligent logistics systems under rapid development in China. Mass-market applications urgently require BDS receivers to improve these functions, namely a shorter Time to First Fix (TTFF) and faster navigation signal acquisition, using Ephemeris Extension (EE) in standalone mode. As a practical way to improve these functions of Assisted BDS (A-BDS) receivers without specialized hardware support, this paper proposes a Self-Assisted First-Fix (SAFF) method with medium- and long-term EE. In the SAFF method, the dynamic Medium- and Long-Term Orbit Prediction (MLTOP) method, which uses historical broadcast ephemeris data with the optimal configuration of the dynamic models and orbit fitting time interval, generates the extended ephemeris. To demonstrate the performance of the MLTOP method used in the SAFF method, a suite of tests based on real broadcast and precise ephemeris data was carried out. The overall performance of the SAFF method is then illustrated in terms of positioning accuracy: based on the characteristics of medium- and long-term EE, simulation tests of the SAFF method were conducted. Results show that, for the SAFF method with medium- and long-term EE of the BeiDou MEO/IGSO satellites, the horizontal positioning accuracy is about 12 meters and the overall positioning accuracy is about 25 meters. The results also indicate that the optimal configurations of the MLTOP method differ for BeiDou satellites with different orbit types.
