Nominate of significant features for unknown internet traffic applications filtering based on a neural network algorithm

The evolution of the internet into a large, complex service-based network has posed tremendous challenges for network monitoring and control: massive volumes of data must be collected, and new emerging applications, such as peer-to-peer networks, streaming content and online games, must be classified accurately. In this work, machine learning algorithms are used to classify traffic into its corresponding applications. The research uses a customized training data set collected from the campuses of three institutions. The effect of the size of the training data set was considered before examining the accuracy of various classification algorithms and selecting the best one. The large amount of data traffic in the network leads to delays in performance; to solve this problem, we suggest a distinct approach that uses multiple neural networks with feature selection to predict and identify known and unknown applications. With the proposed method, classification accuracy for network data traffic reaches up to 99.11%, which improves data traffic in the network and avoids delays.

DETECTION AND CLASSIFICATION OF SYMBOLS IN PRINCIPLE SKETCHES USING DEEP LEARNING

Proceedings of the Design Society ◽

10.1017/pds.2021.118 ◽

2021 ◽

Vol 1 ◽

pp. 1183-1192

Author(s):

Sebastian Bickel ◽

Benjamin Schleich ◽

Sandro Wartzack

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Machine Learning Algorithms ◽

Training Data ◽

Product Development Process ◽

Data Set ◽

Deep Learning Network ◽

The Cost ◽

Early Phases

Abstract. Data-driven methods from the field of Artificial Intelligence or Machine Learning are increasingly applied in mechanical engineering. This refers to the development of digital engineering in recent years, which aims to bring these methods into practice in order to realize cost and time savings. However, a necessary step towards the implementation of such methods is the utilization of existing data. This problem is essential because the mere availability of data does not automatically imply data usability. Therefore, this paper presents a method to automatically recognize symbols from principle sketches, which allows the generation of training data for machine learning algorithms. In this approach, the symbols are created randomly and their illustration varies with each generation. A deep learning network from the field of computer vision is used to test the generated data set and thus to recognize symbols on principle sketches. This type of drawing is especially interesting because the cost-saving potential is very high due to the application in the early phases of the product development process.

Study on Consistency Analysis in Text Categorization

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.539.181 ◽

2014 ◽

Vol 539 ◽

pp. 181-184

Author(s):

Wan Li Zuo ◽

Zhi Yan Wang ◽

Ning Ma ◽

Hong Liang

Keyword(s):

Text Categorization ◽

Training Data ◽

Experimental Result ◽

Final Decision ◽

Consistency Analysis ◽

Training Set ◽

Weak Classifier ◽

Data Set ◽

Basic Premise

Accurate classification of text is a basic premise for efficiently extracting various types of information on the Web and properly utilizing network resources. In this paper, a new text classification method is proposed. The consistency analysis method is an iterative algorithm that trains different classifiers (weak classifiers) on the same training set and then combines them to test how consistently the various classification methods label the same text, thereby exposing the knowledge of each type of classifier. It determines the weight of each sample mainly according to whether the sample was classified correctly in each training round and according to the accuracy of the last overall classification, and then sends the reweighted data set to the next classifier for training. In the end, the classifiers obtained in training are integrated into the final decision classifier. The classifier with consistency analysis can eliminate some unnecessary training data characteristics and place the emphasis on key training data. According to the experimental results, the average accuracy of this method is 91.0%, and the average recall rate is 88.1%.
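The abstract does not give the exact weight-update rule, so the following is only a minimal sketch of the general idea, assuming an AdaBoost-style update as a stand-in: samples that a weak classifier gets wrong receive more weight before the next classifier is trained. The function name `reweight` and the example inputs are hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style round (an assumed stand-in, not the paper's rule).

    weights: current sample weights, summing to 1
    correct: booleans, True where the weak classifier labelled the sample correctly
    Returns (new_weights, alpha), where alpha is the classifier's vote weight.
    """
    # Weighted error of the current weak classifier.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    # Clamp so that a perfect or useless classifier does not break the logarithm.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink weights of correct samples, grow weights of misclassified ones.
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return [r / total for r in raw], alpha

weights = [0.25, 0.25, 0.25, 0.25]
new_weights, alpha = reweight(weights, [True, True, True, False])
```

In this sketch the single misclassified sample ends up carrying half of the total weight, so the next weak classifier is pushed to focus on it.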

A Comparison of Machine Learning Algorithms for the Segmentation and Classification of Snow Micro Penetrometer Profiles on Arctic Sea Ice

10.5194/egusphere-egu21-15637 ◽

2021 ◽

Author(s):

Julia Kaltenborn ◽

Viviane Clay ◽

Amy R. Macfarlane ◽

Joshua Michael Lloyd King ◽

Martin Schneebeli

Keyword(s):

Machine Learning ◽

Sea Ice ◽

Arctic Sea Ice ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Snow Layer ◽

Arctic Sea ◽

Execution Speed

Snow-layer classification is an essential diagnostic task for a wide variety of cryospheric science and climate research applications. Traditionally, these measurements are made in snow pits, requiring trained operators and a substantial time commitment. The SnowMicroPen (SMP), a portable high-resolution snow penetrometer, has been demonstrated as a capable tool for rapid snow grain classification and layer type segmentation through statistical inversion of its mechanical signal. The manual classification of the SMP profiles requires time and training and becomes infeasible for large datasets. Here, we introduce a novel set of SMP measurements collected during the MOSAiC expedition and apply Machine Learning (ML) algorithms to automatically classify and segment SMP profiles of snow on Arctic sea ice. To this end, different supervised and unsupervised ML methods, including Random Forests, Support Vector Machines, Artificial Neural Networks, and k-means Clustering, are compared. A subsequent segmentation of the classified data results in distinct layers and snow grain markers for the SMP profiles. The models are trained with the dataset by King et al. (2020) and the MOSAiC SMP dataset. The MOSAiC dataset is a unique and extensive dataset characterizing seasonal and spatial variation of snow on the central Arctic sea ice. We will test and compare the different algorithms and evaluate the algorithms' effectiveness based on the need for initial dataset labeling, execution speed, and ease of implementation. In particular, we will compare supervised to unsupervised methods, which are distinguished by their need for labeled training data. The implementation of different ML algorithms for SMP profile classification could provide a fast and automatic grain type classification and snow layer segmentation. Based on the gained knowledge from the algorithms' comparison, a tool can be built to provide scientists from different fields with an immediate SMP profile classification and segmentation.

King, J., Howell, S., Brady, M., Toose, P., Derksen, C., Haas, C., & Beckers, J. (2020). Local-scale variability of snow density on Arctic sea ice. The Cryosphere, 14(12), 4323-4339, https://doi.org/10.5194/tc-14-4323-2020.

Can Short and Partial Observations Reduce Model Error and Facilitate Machine Learning Prediction?

Entropy ◽

10.3390/e22101075 ◽

2020 ◽

Vol 22 (10) ◽

pp. 1075

Author(s):

Nan Chen

Keyword(s):

Machine Learning ◽

Model Error ◽

Machine Learning Algorithms ◽

Training Data ◽

Conditional Sampling ◽

Data Set ◽

Partial Observations ◽

Sampling Algorithm ◽

Highly Nonlinear ◽

Non Gaussian

Predicting complex nonlinear turbulent dynamical systems is an important and practical topic. However, due to the lack of a complete understanding of nature, the ubiquitous model error may greatly affect the prediction performance. Machine learning algorithms can overcome the model error, but they are often impeded by inadequate and partial observations in predicting nature. In this article, an efficient and dynamically consistent conditional sampling algorithm is developed, which incorporates the conditional path-wise temporal dependence into a two-step forward-backward data assimilation procedure to sample multiple distinct nonlinear time series conditioned on short and partial observations using an imperfect model. The resulting sampled trajectories succeed in reducing the model error and greatly enrich the training data set for machine learning forecasts. For a rich class of nonlinear and non-Gaussian systems, the conditional sampling is carried out by solving a simple stochastic differential equation, which is computationally efficient and accurate. The sampling algorithm is applied to create massive training data of multiscale compressible shallow water flows from highly nonlinear and indirect observations. The resulting machine learning prediction significantly outweighs the imperfect model forecast. The sampling algorithm also facilitates the machine learning forecast of a highly non-Gaussian climate phenomenon using extremely short observations.

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽

10.3390/diagnostics9030104 ◽

2019 ◽

Vol 9 (3) ◽

pp. 104 ◽

Cited By ~ 11

Author(s):

Ahmed ◽

Yigit ◽

Isik ◽

Alpkocak

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Leukemia Data

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two more subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results of the experiments showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy classification and in multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than other well-known machine learning algorithms.
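As a rough illustration of the 5-fold cross-validation mentioned above, the sketch below splits sample indices into five folds so that each fold serves once as the test set. It is a generic, assumed implementation rather than the authors' evaluation code; the helper name `k_fold_indices` and the sample count of 12 are hypothetical, and in practice the indices would usually be shuffled first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current test fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(12, k=5))
```

The reported accuracy would then be the average of the per-fold test accuracies.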

EFFICIENT CLASSIFICATION OF SCANNED MEDIA USING SPATIAL STATISTICS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001410008263 ◽

2010 ◽

Vol 24 (06) ◽

pp. 917-946

Author(s):

GOZDE UNAL ◽

GAURAV SHARMA ◽

REINER ESCHBACH

Keyword(s):

Spatial Statistics ◽

High Reliability ◽

Image Data ◽

Training Data ◽

Automated Classification ◽

Data Set ◽

Statistical Measures ◽

Scanned Image ◽

Scanned Images

Photography, lithography, xerography, and inkjet printing are the dominant technologies for color printing. Images produced on these "different media" are often scanned either for the purpose of copying or creating an electronic representation. For an improved color calibration during scanning, a media identification from the scanned image data is desirable. In this paper, we propose an efficient algorithm for automated classification of input media into four major classes corresponding to photographic, lithographic, xerographic and inkjet. Our technique exploits the strong correlation between the type of input media and the spatial statistics of corresponding images, which are observed in the scanned images. We adopt ideas from the spatial statistics literature and design two spatial statistical measures of dispersion and periodicity, which are computed over spatial point patterns generated from blocks of the scanned image, and whose distributions provide the features for making a decision. We utilize extensive training data and determine well-separated decision regions to classify the input media. We validate and test our classification technique over an independent extensive data set. The results demonstrate that the proposed method is able to distinguish between the different media with high reliability.

Bees Assorter

International Journal for Modern Trends in Science and Technology - RTT2020 ◽

10.46501/ijmtst061212 ◽

2020 ◽

Vol 6 (12) ◽

pp. 61-65

Author(s):

Himanshu Verma

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Bumble Bee ◽

Wing Size ◽

Qualitative And Quantitative Analysis ◽

Data Set ◽

Qualitative And Quantitative ◽

The Difference

Many attempts have been made to classify bees as bumble bees or honey bees, and a large amount of research has sought to distinguish them on the basis of features such as wing size, body size, color, life cycle and many more. However, these analyses focused on either qualitative or quantitative analysis alone. To overcome this issue, researchers came up with a solution that uses both qualitative and quantitative analysis to classify them. Using machine learning algorithms for the classification gives a further boost: classification takes less time because these algorithms are fast and accurate, and the work is made easier. Many photographs had to be collected and stored as the data set. Using these machine learning algorithms yields information about the bees that researchers might use in further classification of bees. The images had to be manipulated to prepare them so that they could be passed to the algorithms and have feature extraction performed. Because the large number of photographs (the data set) takes up a lot of space, and the area in which bees were present in these photographs was small, dimension reduction was performed so that other objects present in the photographs chosen as the data set, such as trees, leaves and flowers, would not be considered.

ANALYSIS OF THE LOCATION OF NANNING LARGE-SCALE MALL BASED ON BP NEURAL NETWORK

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w10-975-2020 ◽

2020 ◽

Vol XLII-3/W10 ◽

pp. 975-979

Author(s):

H. Huang ◽

L. L. Liu

Keyword(s):

Neural Network ◽

Bp Neural Network ◽

Site Selection ◽

Large Scale ◽

Machine Learning Algorithms ◽

Training Data ◽

Location Analysis ◽

Analysis Model ◽

Reference Information ◽

Neural Network Algorithm

Abstract. Site selection is a key first step in the operation of large-scale shopping malls, and most of the existing site selection methods lack practicality and efficiency. Therefore, it is necessary to carry out a scientific modeling of the site selection problem and provide effective reference information for site selection. With the development of machine learning algorithms, the modeling of such problems has become increasingly simple. In this paper, using MATLAB software as a tool and based on the BP neural network algorithm, the Nanning urban area is selected as the research object. After analyzing the factors that influence the location problem, the large-scale mall location analysis model is built. After repeated training and testing with the training data and the test data, the data for testing usability is input into the model and applied for analysis. It turns out that the large-scale mall location analysis model is usable and can meet the site selection needs of the mall.

Rotor Unbalance Kind and Severity Identification by Current Signature Analysis with Adaptative Update to Multiclass Machine Learning Algorithms

Studies in Engineering and Technology ◽

10.11114/set.v8i1.5213 ◽

2021 ◽

Vol 8 (1) ◽

pp. 28

Author(s):

S. L. Ávila ◽

H. M. Schaberle ◽

S. Youssef ◽

F. S. Pacheco ◽

C. A. Penz

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Signature Analysis ◽

Data Set ◽

Learning Techniques ◽

Environmental Variations ◽

Current Signature

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. The more information is available, the easier the diagnosis of the machine's operational condition becomes. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the electric stator current, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifiers, and the One-vs-One strategy to identify the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier so that it deals better with changes produced by environmental variations and natural machinery usage. The adaptative update means updating the training data set with an amount of recent data while keeping the entire original historical data. It is relevant for engineering maintenance. Our results show that the current signature analysis is appropriate to identify the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.
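One possible reading of the adaptative update is sketched below: the original historical data is always kept, while a bounded window of recent samples is appended before the classifier is retrained. The class name `AdaptiveTrainingSet`, the window size and the sample labels are hypothetical; the abstract does not say how much recent data is used or how older recent samples are handled, so the sliding window is an assumption.

```python
from collections import deque

class AdaptiveTrainingSet:
    """Historical data plus a sliding window of recent samples (assumed scheme)."""

    def __init__(self, historical, max_recent):
        self.historical = list(historical)       # original data, never discarded
        self.recent = deque(maxlen=max_recent)   # oldest recent samples drop out

    def add(self, sample):
        """Record a newly measured sample from the machine."""
        self.recent.append(sample)

    def training_data(self):
        """Data set to retrain the classifier on: history plus recent window."""
        return self.historical + list(self.recent)

ts = AdaptiveTrainingSet(historical=["h1", "h2"], max_recent=2)
for s in ["r1", "r2", "r3"]:
    ts.add(s)
data = ts.training_data()
```

With a window of two, the oldest recent sample is dropped once a third arrives, while both historical samples remain in the retraining set.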

Zero-Shot Human Activity Recognition Using Non-Visual Sensors

Sensors ◽

10.3390/s20030825 ◽

2020 ◽

Vol 20 (3) ◽

pp. 825 ◽

Cited By ~ 3

Author(s):

Fadi Al Machot ◽

Mohammed R. Elkobaisi ◽

Kyandoghere Kyamakya

Keyword(s):

Activity Recognition ◽

High Performance ◽

Real Life ◽

Machine Learning Algorithms ◽

Training Data ◽

Sensor Data ◽

Sensor Technology ◽

Training Dataset ◽

Data Sets ◽

Data Set

Due to significant advances in sensor technology, studies towards activity recognition have gained interest and maturity in the last few years. Existing machine learning algorithms have demonstrated promising results by classifying activities whose instances have been already seen during training. Activity recognition methods based on real-life settings should cover a growing number of activities in various domains, whereby a significant part of instances will not be present in the training data set. However, covering all possible activities in advance is a complex and expensive task. Concretely, we need a method that can extend the learning model to detect unseen activities without prior knowledge regarding sensor readings about those previously unseen activities. In this paper, we introduce an approach to leverage sensor data in discovering new unseen activities which were not present in the training set. We show that sensor readings can lead to promising results for zero-shot learning, whereby the necessary knowledge can be transferred from seen to unseen activities by using semantic similarity. The evaluation conducted on two data sets extracted from the well-known CASAS datasets shows that the proposed zero-shot learning approach achieves a high performance in recognizing unseen (i.e., not present in the training dataset) new activities.
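The transfer by semantic similarity can be pictured with a small sketch: a model maps sensor readings to a point in a semantic space, and the activity whose label vector lies closest is chosen, even if that activity never appeared in training. This is a generic nearest-label scheme under assumed inputs, not the authors' method; the three-dimensional vectors, the activity names and the function names are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_label(predicted, label_vectors):
    """Return the activity whose semantic vector is most similar to the prediction."""
    return max(label_vectors, key=lambda name: cosine(predicted, label_vectors[name]))

# Hypothetical semantic vectors; "washing dishes" plays the role of an unseen activity.
label_vectors = {
    "cooking": [0.9, 0.1, 0.0],
    "sleeping": [0.0, 0.2, 0.9],
    "washing dishes": [0.7, 0.6, 0.1],
}
# Hypothetical output of a model that maps sensor readings into the semantic space.
predicted = [0.6, 0.5, 0.0]
label = nearest_label(predicted, label_vectors)
```

Here the predicted vector lies closest to "washing dishes", so that label is returned although no training instance of it is assumed to exist.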
