Botnet Attack Detection Using A Hybrid Supervised Fast-Flux Killer System

Author(s):  
Ahmad Al-Nawasrah ◽  
Ammar Almomani ◽  
Huthaifa A. Al_Issa ◽  
Khalid Alissa ◽  
Ayat Alrosan ◽  
...  

A Fast Flux Service Network (FFSN) is a domain name system technique that bot herders use to support malicious botnet activity: the IP addresses behind a domain name are changed rapidly in order to prolong the life of malicious servers. Although several methods have been proposed for detecting FFSN domains, they still suffer from relatively low accuracy, particularly on zero-day domains. This research proposes a new system, the Fast Flux Killer System (FFKS), which detects zero-day FF domains via a deployment built on ADeSNN. The system is a hybrid consisting of two phases: an offline phase that enhances the classification performance of the system, and an online phase that, using the learning outcomes of the offline phase, detects zero-day domains. The system is compared with previously published work that detected FFSN domains with a supervised method based on the same ADeSNN algorithm, in order to demonstrate improved performance in detecting malicious domains. A public data set is used in the experiments to assess the hybrid ADeSNN algorithm. The experiments show higher accuracy for the hybrid ADeSNN than for the supervised version, with zero-day fast-flux domains detected at 99.54% accuracy in the online mode.
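The abstract does not spell out the internals of ADeSNN, so the following is only a minimal sketch of the offline/online split described above, with a generic scikit-learn classifier standing in for ADeSNN and purely illustrative DNS-derived feature names.

```python
# Hypothetical sketch of the two-phase FFKS workflow; a generic classifier
# stands in for ADeSNN, and the feature names are illustrative only.
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["num_a_records", "num_asns", "ttl_mean", "ip_dispersion"]  # assumed features

def offline_phase(labelled_domains):
    """Offline phase: fit the classifier on labelled benign/fast-flux domains."""
    X = [[d[f] for f in FEATURES] for d in labelled_domains]
    y = [d["is_fast_flux"] for d in labelled_domains]
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return model

def online_phase(model, new_domain):
    """Online phase: classify a previously unseen (zero-day) domain."""
    x = [[new_domain[f] for f in FEATURES]]
    return bool(model.predict(x)[0])
```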

Author(s):  
Mohammad Aizat Basir ◽  
Yuhanis Yusof ◽  
Mohamed Saifullah Hussin

Attribute selection, also known as feature selection, is an essential process in predictive analysis. To date, various feature selection algorithms have been introduced, but they all work independently, which reduces the consistency of the resulting accuracy. The aim of this paper is to investigate the use of bio-inspired search algorithms in producing an optimal attribute set. This is achieved in two stages: 1) create attribute selection models by combining search methods with feature selection algorithms, and 2) determine an optimized attribute set by employing bio-inspired algorithms. The classification performance of the produced attribute set is analyzed in terms of accuracy and the number of selected attributes. Experimental results on six (6) public real-world datasets reveal that the feature selection model implemented with a bio-inspired search algorithm consistently performs good classification (i.e., higher accuracy with fewer attributes) on the selected data sets. This finding indicates that bio-inspired algorithms can contribute to identifying the few most important features to be used in constructing data mining models.
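A hedged sketch of the wrapper idea behind the two stages above: a toy genetic-algorithm loop (standing in for whichever bio-inspired search the paper employs) proposes attribute subsets, and a classifier's cross-validated accuracy serves as the fitness; the kNN classifier and the size penalty are illustrative choices.

```python
# Toy wrapper-style attribute selection: a simple genetic-algorithm search scored
# by cross-validated kNN accuracy, with a small penalty for larger subsets.
import random
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()
    return acc - 0.01 * mask.sum() / mask.size   # prefer smaller attribute sets

def select_attributes(X, y, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = X.shape[1]
    pop = [np.array([rng.random() < 0.5 for _ in range(n)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            if rng.random() < 0.1:                      # mutation: flip one attribute
                child[rng.randrange(n)] ^= True
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, X, y))     # boolean mask of kept attributes
```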


Author(s):  
Shi Yu ◽  
Yuxin Chen ◽  
Hussain Zaidi

We develop a chatbot using deep bidirectional transformer (BERT) models to handle client questions in financial investment customer service. The bot recognizes 381 intents, decides when to say "I don't know," and escalates uncertain questions to human operators. Our main novel contribution is the discussion of uncertainty measures for BERT, where three different approaches are systematically compared on real problems. We investigate two uncertainty metrics, information entropy and variance of dropout sampling, in BERT, followed by mixed-integer programming to optimize decision thresholds. Another novel contribution is the use of BERT as a language model for automatic spelling correction. Inputs with accidental spelling errors can significantly decrease intent classification performance. The proposed approach combines probabilities from the masked language model with word edit distances to find the best corrections for misspelled words. The chatbot and the entire conversational AI system are developed using open-source tools and deployed within our company's intranet. The proposed approach can be useful for industries seeking similar in-house solutions in their specific business domains. We share all our code and a sample chatbot built on a public data set on GitHub.
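Since only the two uncertainty signals are named above, here is a hedged sketch of how they could be computed for a generic PyTorch intent classifier (any module whose forward pass returns logits); the threshold values are placeholders that would in practice be tuned, e.g. by the mixed-integer program mentioned in the abstract.

```python
# Sketch of the two uncertainty signals for a generic intent classifier
# (model(x) -> logits over intents); thresholds below are placeholders.
import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    """Information entropy of the softmax distribution over intents."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)

def mc_dropout_variance(model, x, n_samples=20):
    """Variance of the class probabilities under repeated dropout sampling."""
    model.train()                      # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.var(dim=0).max(dim=-1).values

def should_escalate(logits, dropout_var, max_entropy=1.5, max_var=0.05):
    """Route the question to a human when either signal exceeds its threshold."""
    return (predictive_entropy(logits) > max_entropy) | (dropout_var > max_var)
```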


Author(s):  
GONGDE GUO ◽  
DANIEL NEAGU

A robust method, fuzzy kNNModel, for toxicity prediction of chemical compounds is proposed. The method is based on a supervised clustering method, called kNNModel, which employs fuzzy partitioning instead of crisp partitioning to group clusters. The merits of fuzzy kNNModel are two-fold: (1) it overcomes the problem of choosing, for each data set, the parameter ε (the allowed error rate in a cluster) and the parameter N (the minimal number of instances covered by a cluster); (2) it better captures the characteristics of boundary data by assigning them degrees of membership between 0 and 1 to different clusters. The experimental results of fuzzy kNNModel on thirteen public data sets from the UCI machine learning repository and seven toxicity data sets from real-world applications are compared with the results of fuzzy c-means clustering, k-means clustering, kNN, fuzzy kNN, and kNNModel in terms of classification performance. This application shows that fuzzy kNNModel is a promising method for the toxicity prediction of chemical compounds.
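To illustrate the soft-assignment idea (without reproducing the kNNModel construction itself), the following sketch computes inverse-distance degrees of membership of a point to a set of cluster representatives, so that boundary points receive mixed memberships in [0, 1] rather than a crisp label.

```python
# Illustrative fuzzy membership of a point to cluster representatives;
# this is not the kNNModel/fuzzy kNNModel algorithm itself.
import numpy as np

def fuzzy_memberships(x, centres, m=2.0):
    """Inverse-distance memberships with fuzzifier m, summing to 1 across clusters."""
    d = np.linalg.norm(centres - x, axis=1)
    if np.any(d == 0):                         # point coincides with a representative
        return (d == 0).astype(float) / (d == 0).sum()
    w = d ** (-2.0 / (m - 1.0))
    return w / w.sum()

centres = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(fuzzy_memberships(np.array([0.9, 0.8]), centres))   # boundary point: mixed memberships
```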


2021 ◽  
pp. 004051752110205
Author(s):  
Xueqing Zhao ◽  
Ke Fan ◽  
Xin Shi ◽  
Kaixuan Liu

Virtual reality is a technology that allows users to interact fully with a computer-simulated environment and to try on new clothes to check the effect without taking off the clothes they are wearing. In this paper, a virtual fit evaluation of pants using the Adaptive Network Fuzzy Inference System (ANFIS), VFE-ANFIS for short, is proposed. The VFE-ANFIS has two stages: training and evaluation. In the first stage, the VFE-ANFIS is trained on key pressure parameters collected from users' real and virtual try-ons of pants. In the second stage, fit is evaluated with the trained VFE-ANFIS: the key pressure parameters of pants for a new user are determined and the evaluation result, fit or unfit, is output. In addition, considering the small number of input samples, we used 10-fold cross-validation to divide the data set into training and testing sets; the test accuracy of the VFE-ANFIS was 94.69% ± 2.4%, and the experimental results show that the proposed VFE-ANFIS could be applied to the virtual fit evaluation of pants.
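The ANFIS itself is not reproduced here, but the evaluation protocol is standard; the sketch below shows 10-fold cross-validation reported as mean ± standard deviation accuracy, with a generic classifier as a stand-in for the trained VFE-ANFIS.

```python
# 10-fold cross-validation reported as mean ± std accuracy; an SVM is used
# here purely as a placeholder for the VFE-ANFIS model.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def ten_fold_accuracy(X, y, seed=0):
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = SVC().fit(X[train_idx], y[train_idx])    # placeholder for VFE-ANFIS
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return np.mean(scores), np.std(scores)             # report as mean ± std
```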


Author(s):  
Sebastian Hoppe Nesgaard Jensen ◽  
Mads Emil Brix Doest ◽  
Henrik Aanæs ◽  
Alessio Del Bue

Abstract Non-rigid structure from motion (nrsfm) is a long-standing and central problem in computer vision, and its solution is necessary for obtaining 3D information from multiple images when the scene is dynamic. A main issue holding back further development of this important computer vision topic is the lack of high-quality data sets. We address this issue by presenting a data set created for this purpose, which is made publicly available and is considerably larger than the previous state of the art. To validate the applicability of this data set, and to provide an investigation into the state of the art of nrsfm, including potential directions forward, we present a benchmark and a thorough evaluation using this data set. The benchmark evaluates 18 different methods with available code that reasonably span the state of the art in sparse nrsfm. This new public data set and evaluation protocol will provide benchmark tools for further development in this challenging field.


Author(s):  
Michael D. Seckeler ◽  
Brent J. Barber ◽  
Jamie N. Colombo ◽  
Alyssa M. Bernardi ◽  
Andrew W. Hoyer ◽  
...  

Author(s):  
Anne-Marie Galow ◽  
Sophie Kussauer ◽  
Markus Wolfien ◽  
Ronald M. Brunner ◽  
Tom Goldammer ◽  
...  

Abstract Single-cell RNA-sequencing (scRNA-seq) provides high-resolution insights into complex tissues. Cardiac tissue, however, poses a major challenge due to the delicate isolation process and the large size of mature cardiomyocytes. Regardless of the experimental technique, captured cells are often impaired and some capture sites may contain multiple cells or no cells at all. All of this amounts to "low quality" data that can potentially lead to misinterpretation. Common standard quality control parameters include the number of detected genes, transcripts per cell, and the fraction of transcripts from mitochondrial genes. While cutoffs for transcripts and genes per cell are usually user-defined for each experiment or calculated individually, a fixed threshold of 5% mitochondrial transcripts is standard and often set as the default in scRNA-seq software. However, this parameter is highly dependent on the tissue type. In the heart, mitochondrial transcripts comprise almost 30% of total mRNA due to high energy demands. Here, we demonstrate that a 5% threshold not only causes an unacceptable exclusion of cardiomyocytes but also introduces a bias that particularly discriminates against pacemaker cells. This effect is apparent for our in vitro generated induced sinoatrial bodies (iSABs; highly enriched, physiologically functional pacemaker cells), and is also evident in a public data set of cells isolated from embryonic murine sinoatrial node tissue (Goodyer William et al. in Circ Res 125:379–397, 2019). Taken together, we recommend omitting this filtering parameter for scRNA-seq in cardiovascular applications whenever possible.
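A minimal Scanpy sketch of the recommendation above, assuming an AnnData object loaded from a hypothetical file and mouse gene symbols (mitochondrial genes prefixed "mt-"): the mitochondrial fraction is computed for inspection, but the default 5% cutoff is not applied.

```python
# Compute the mitochondrial read fraction for QC without applying the 5% default;
# the file name is hypothetical and the prefix assumes mouse gene symbols.
import scanpy as sc

adata = sc.read_h5ad("heart_scrnaseq.h5ad")                # hypothetical input file
adata.var["mt"] = adata.var_names.str.startswith("mt-")    # mouse mitochondrial genes
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], percent_top=None,
                           log1p=False, inplace=True)

# Inspect rather than filter: cardiomyocytes can legitimately reach ~30% mito reads.
print(adata.obs["pct_counts_mt"].describe())
# If a cutoff is needed at all, choose a tissue-aware one, e.g. (illustrative value):
# adata = adata[adata.obs["pct_counts_mt"] < 60].copy()
```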


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose Popular image-processing technologies based on convolutional neural networks are computationally heavy, have high storage costs, and achieve low accuracy on tiny defect detection, which conflicts with the high real-time performance and accuracy required by industrial applications and with their limited computing and storage resources. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve the above problems. Design/methodology/approach On the one hand, this study performs multi-dimensional compression on the feature extraction network of YOLOv4 to simplify the model and improves the model's feature extraction ability through knowledge distillation. On the other hand, a prediction scale with a finer receptive field is added to optimize the model structure, which improves the detection performance for tiny defects. Findings The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007, and on a steel ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method greatly improves recognition efficiency and accuracy while reducing the size and computational cost of the model. Originality/value This paper proposes an improved YOLOv4, named YOLOv4-Defect, for surface defect detection; it is well suited to industrial scenarios with limited storage and computing resources and meets the requirements of high real-time performance and precision.
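The exact distillation scheme used in YOLOv4-Defect is not described above; as a hedged illustration, the snippet below shows the standard soft-target distillation loss that lets a compressed student network learn from the original teacher's output distribution alongside the hard labels.

```python
# Generic knowledge-distillation loss (soft teacher targets + hard labels).
# This is the standard classification formulation, not YOLOv4-Defect's exact loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)     # temperature-scaled soft targets
    hard = F.cross_entropy(student_logits, labels)       # usual supervised loss
    return alpha * soft + (1.0 - alpha) * hard
```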


2021 ◽  
Vol 11 (6) ◽  
pp. 1592-1598
Author(s):  
Xufei Liu

The early detection of cardiovascular diseases based on the electrocardiogram (ECG) is very important for the timely treatment of cardiovascular patients and increases their survival rate. The ECG is a visual representation of changes in cardiac bioelectricity and is the basis for assessing heart health. With the rise of edge machine learning and Internet of Things (IoT) technologies, small machine learning models have received attention. This study proposes an automatic ECG classification method based on IoT technology and an LSTM network to achieve early monitoring and prevention of cardiovascular diseases. Specifically, this paper first proposes a single-layer bidirectional LSTM network structure that makes full use of the temporal dependencies between preceding and subsequent sampling points to extract features automatically; the network structure is lightweight and has low computational complexity. To verify the effectiveness of the proposed classification model, it is compared against relevant algorithms on the public MIT-BIH data set. Secondly, the model is embedded in a wearable device to automatically classify the collected ECG. Finally, when an abnormality is detected, the user is alerted by an alarm. The experimental results show that the proposed model has a simple structure and a high classification and recognition rate, which can meet the needs of wearable devices for monitoring patients' ECG.
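A hedged PyTorch sketch of a single-layer bidirectional LSTM beat classifier of the kind described above; the hidden size, segment length, and number of classes are illustrative, not the paper's exact values.

```python
# Single-layer bidirectional LSTM for ECG segment classification (illustrative sizes).
import torch
import torch.nn as nn

class BiLSTMECG(nn.Module):
    def __init__(self, n_classes=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=1, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (batch, seq_len, 1) ECG samples
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])      # classify from the final time step

model = BiLSTMECG()
segments = torch.randn(8, 250, 1)          # 8 segments of 250 samples each
print(model(segments).shape)               # torch.Size([8, 5])
```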


mSystems ◽  
2018 ◽  
Vol 3 (3) ◽  
Author(s):  
Gabriel A. Al-Ghalith ◽  
Benjamin Hillmann ◽  
Kaiwei Ang ◽  
Robin Shields-Cutler ◽  
Dan Knights

ABSTRACT Next-generation sequencing technology is of great importance for many biological disciplines; however, due to technical and biological limitations, the short DNA sequences produced by modern sequencers require numerous quality control (QC) measures to reduce errors, remove technical contaminants, or merge paired-end reads together into longer or higher-quality contigs. Many tools for each step exist, but choosing the appropriate methods and usage parameters can be challenging because the parameterization of each step depends on the particularities of the sequencing technology used, the type of samples being analyzed, and the stochasticity of the instrumentation and sample preparation. Furthermore, end users may not know all of the relevant information about how their data were generated, such as the expected overlap for paired-end sequences or the type of adaptors used, and thus cannot make informed choices. This increasing complexity and nuance demand a pipeline that combines existing steps together in a user-friendly way and, when possible, learns reasonable quality parameters from the data automatically. We propose a user-friendly quality control pipeline called SHI7 (canonically pronounced “shizen”), which aims to simplify quality control of short-read data for the end user by predicting the presence and/or type of common sequencing adaptors, which quality scores to trim, whether the data set is shotgun or amplicon sequencing, whether reads are paired end or single end, and whether pairs are stitchable, including the expected amount of pair overlap. We hope that SHI7 will make it easier for all researchers, expert and novice alike, to follow reasonable practices for short-read data quality control. IMPORTANCE Quality control of high-throughput DNA sequencing data is an important but sometimes laborious task requiring background knowledge of the sequencing protocol used (such as adaptor type, sequencing technology, insert size/stitchability, paired-endedness, etc.). Quality control protocols typically require applying this background knowledge to select and execute numerous quality control steps with the appropriate parameters, which is especially difficult when working with public data or data from collaborators who use different protocols. We have created a streamlined quality control pipeline intended to substantially simplify the process of DNA quality control from raw machine output files to actionable sequence data. In contrast to other methods, our proposed pipeline is easy to install and use and attempts to learn the necessary parameters from the data automatically with a single command.
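As a toy illustration of one of the parameters such a pipeline can learn from the data, the sketch below estimates the overlap between a read pair by aligning a read against the reverse complement of its mate; this is a generic heuristic for illustration, not SHI7's actual implementation.

```python
# Estimate paired-end overlap (a stitchability hint) for one read pair.
# Generic heuristic for illustration only; not SHI7's implementation.
def revcomp(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def best_overlap(r1, r2, min_len=10, max_mismatch_frac=0.1):
    """Longest suffix of r1 matching a prefix of revcomp(r2) within the mismatch budget."""
    r2rc = revcomp(r2)
    for olap in range(min(len(r1), len(r2rc)), min_len - 1, -1):
        a, b = r1[-olap:], r2rc[:olap]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * olap:
            return olap
    return 0                                # no confident overlap: likely not stitchable
```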

