Image Processing for Public Health Surveillance of Tobacco Point-of-Sale Advertising: A Machine Learning-Based Methodology (Preprint)

BACKGROUND With a rapidly evolving tobacco retail environment, it is increasingly necessary to understand the point-of-sale (POS) advertising environment as part of tobacco surveillance and control. Advances in machine learning and image processing suggest that data can be captured more efficiently and with more nuance than was previously possible.

OBJECTIVE To employ machine learning algorithms to detect both the presence of tobacco advertising in photographs taken at tobacco points of sale and its location within each photograph.

METHODS We first collected images of the interiors of tobacco retailers in West Virginia and the District of Columbia during 2016 and 2018. The clearest photos were selected and used to create training and test data sets. We then used a pre-trained image classification network, Inception V3, to detect the presence of tobacco logos, and a unified object detection system, You Only Look Once (YOLO), to identify logo locations.

RESULTS Our model successfully identified the presence of advertising within images, with a classification accuracy of over 75% for 8 of the 42 brands. Locating logos within a given photo was more challenging because of the relatively small training data set, resulting in a mean Average Precision (mAP) of 72% and an Intersection over Union (IoU) of 62%.

CONCLUSIONS Our research provides evidence for a novel methodological approach that tobacco researchers and other public health practitioners can apply in the collection and processing of data for tobacco or other POS surveillance efforts. The resulting surveillance information can inform policy adoption, implementation, and enforcement. Limitations notwithstanding, our analysis shows the promise of using machine learning as part of a suite of tools to understand the tobacco retail environment, make policy recommendations, and design public health interventions at the municipal or other jurisdictional scale.
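
The IoU figure reported above measures how well a predicted logo box overlaps the ground-truth box. A minimal sketch of the standard computation follows; the corner format (x1, y1, x2, y2) is an assumption for illustration (YOLO itself predicts center, width, and height), and the coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp widths/heights to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted box vs. ground-truth box: overlap 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

mAP builds on this: a detection typically counts as a true positive only when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice).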

Intrusion Detection System for Large Scale Data using Machine Learning Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f7971.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 706-711

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Defense Mechanism ◽

Detection System ◽

Learning Algorithms ◽

Background Knowledge ◽

Machine Learning Algorithms ◽

Training Data ◽

Data Set

An Intrusion Detection System (IDS) is an essential component for securing internet assets. Because of the variety of network attacks, it is very hard to detect malicious activity from remote users and remote machines, so it is necessary to analyze whether such activity is normal or malicious. Insufficient background knowledge of the system makes malicious activity hard to detect. In this work we propose an intrusion detection system using various soft computing algorithms. The system is divided into three sections. In the first section, we perform data preprocessing and generate background knowledge of the system from two training data sets combined with a genetic algorithm. Once the background knowledge has been generated, the system runs in prevention mode, in which it acts as a defense mechanism against various network and host attacks. The system uses two data sets containing around 42 attributes and supports both network-based (NIDS) and host-based (HIDS) intrusion detection. The results section shows how the proposed system outperforms classical machine learning algorithms. Based on various comparative graphs and the detection rates of the systems, we conclude that the proposed system provides effective supervision in vulnerable network environments. The average accuracy of the proposed system is 100% for DoS attacks and over 90% for other and unknown attacks.
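
The abstract reports accuracy and detection rates. As a hedged illustration of how such figures are commonly derived from confusion-matrix counts (treating "attack" as the positive class), here is a small sketch; the counts are hypothetical and not results from the paper.

```python
def ids_metrics(tp, fn, fp, tn):
    """Common IDS metrics from confusion-matrix counts (attack = positive class)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn),    # share of attacks that were flagged
        "false_alarm_rate": fp / (fp + tn),  # share of normal traffic wrongly flagged
    }

# Hypothetical counts for illustration only (accuracy 0.925, detection rate 0.9)
print(ids_metrics(tp=90, fn=10, fp=5, tn=95))
```

Reporting the false alarm rate alongside accuracy matters for IDS work, since attack and normal traffic are often heavily imbalanced.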

Can Short and Partial Observations Reduce Model Error and Facilitate Machine Learning Prediction?

Entropy ◽

10.3390/e22101075 ◽

2020 ◽

Vol 22 (10) ◽

pp. 1075

Author(s):

Nan Chen

Keyword(s):

Machine Learning ◽

Model Error ◽

Machine Learning Algorithms ◽

Training Data ◽

Conditional Sampling ◽

Data Set ◽

Partial Observations ◽

Sampling Algorithm ◽

Highly Nonlinear ◽

Non Gaussian

Predicting complex nonlinear turbulent dynamical systems is an important and practical topic. However, due to the lack of a complete understanding of nature, the ubiquitous model error may greatly affect prediction performance. Machine learning algorithms can help overcome model error, but they are often impeded by inadequate and partial observations of nature. In this article, an efficient and dynamically consistent conditional sampling algorithm is developed, which incorporates conditional path-wise temporal dependence into a two-step forward-backward data assimilation procedure to sample multiple distinct nonlinear time series conditioned on short and partial observations using an imperfect model. The resulting sampled trajectories succeed in reducing model error and greatly enrich the training data set for machine learning forecasts. For a rich class of nonlinear and non-Gaussian systems, the conditional sampling is carried out by solving a simple stochastic differential equation, which is computationally efficient and accurate. The sampling algorithm is applied to create massive training data for multiscale compressible shallow water flows from highly nonlinear and indirect observations. The resulting machine learning prediction significantly outperforms the imperfect model forecast. The sampling algorithm also facilitates machine learning forecasts of a highly non-Gaussian climate phenomenon using extremely short observations.
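
The abstract notes that sampling reduces to solving a simple stochastic differential equation. The sketch below is not the authors' forward-backward conditional sampler; it only illustrates, under that caveat, the generic Euler-Maruyama scheme for simulating many sample paths of an SDE, using an assumed Ornstein-Uhlenbeck process dX = -θX dt + σ dW with hypothetical parameters.

```python
import numpy as np

def euler_maruyama_ou(x0, theta, sigma, dt, n_steps, n_paths, seed=0):
    """Simulate n_paths of dX = -theta*X dt + sigma dW with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        # Independent Brownian increments with variance dt, one per path
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x[:, k + 1] = x[:, k] - theta * x[:, k] * dt + sigma * dw
    return x

# Hypothetical parameters: 200 distinct trajectories of 1000 steps each
paths = euler_maruyama_ou(x0=1.0, theta=0.5, sigma=0.3, dt=0.01, n_steps=1000, n_paths=200)
print(paths.shape)  # (200, 1001)
```

Each row is one sampled trajectory; an ensemble like this is the kind of enlarged training set the abstract describes, although the paper's trajectories are additionally conditioned on observations.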

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽

10.3390/diagnostics9030104 ◽

2019 ◽

Vol 9 (3) ◽

pp. 104 ◽

Cited By ~ 11

Author(s):

Ahmed ◽

Yigit ◽

Isik ◽

Alpkocak

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Leukemia Data

Leukemia is a fatal cancer with two main types, acute and chronic, each of which has two subtypes, lymphoid and myeloid; hence there are four subtypes of leukemia in total. This study proposes a new approach for diagnosing all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNNs), which require a large training data set. We therefore also investigated the effect of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources, ALL-IDB and the ASH Image Bank, and applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a series of experiments using 5-fold cross-validation. Our CNN model achieved 88.25% and 81.74% accuracy in leukemia-versus-healthy and multiclass classification of all subtypes, respectively. Finally, we showed that the CNN model performs better than the other well-known machine learning algorithms.
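
The abstract does not list which seven transformations were used. As an assumption for illustration only, the sketch below generates the seven non-identity flips and rotations of an image array with NumPy, which is one common way to multiply the number of training images without changing cell morphology.

```python
import numpy as np

def dihedral_augment(image):
    """Return the seven non-identity flips/rotations of an (H, W) or (H, W, C) array.

    This transformation set is an illustrative assumption, not the paper's list."""
    transposed = np.swapaxes(image, 0, 1)  # flip about the main diagonal
    return [
        np.fliplr(image),        # horizontal flip
        np.flipud(image),        # vertical flip
        np.rot90(image, 1),      # rotate 90 degrees
        np.rot90(image, 2),      # rotate 180 degrees
        np.rot90(image, 3),      # rotate 270 degrees
        transposed,              # transpose
        np.rot90(transposed, 2), # flip about the anti-diagonal
    ]

# A hypothetical 4x5 RGB "image": each original yields 7 extra training samples
augmented = dihedral_augment(np.zeros((4, 5, 3)))
print(len(augmented))
```

Augmentation should be applied only to the training folds of each cross-validation split, so that transformed copies of a test image never leak into training.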

Rotor Unbalance Kind and Severity Identification by Current Signature Analysis with Adaptative Update to Multiclass Machine Learning Algorithms

Studies in Engineering and Technology ◽

10.11114/set.v8i1.5213 ◽

2021 ◽

Vol 8 (1) ◽

pp. 28

Author(s):

S. L. Ávila ◽

H. M. Schaberle ◽

S. Youssef ◽

F. S. Pacheco ◽

C. A. Penz

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Signature Analysis ◽

Data Set ◽

Learning Techniques ◽

Environmental Variations ◽

Current Signature

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. The more information is available, the easier it becomes to diagnose the machine's operating condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the stator current, this paper presents a comparison of Support Vector Machines, Decision Tree classifiers, and the One-vs-One strategy for identifying the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology for updating the classifier to better handle changes produced by environmental variations and natural machine usage. The adaptive update consists of adding an amount of recent data to the training data set while retaining the entire original historical data, which is relevant for engineering maintenance. Our results show that current signature analysis is appropriate for identifying the type and severity of rotor unbalance, and that machine learning techniques can be effective in an industrial application.
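
The adaptive update described above keeps all historical samples and appends recent ones before retraining. A minimal sketch of that bookkeeping follows, together with the class pairs a One-vs-One strategy trains on. The `max_recent` cap and the class labels are hypothetical; the abstract does not state how much recent data is added.

```python
from itertools import combinations

def adaptive_update(historical, recent, max_recent):
    """New training set: all historical samples plus the newest `max_recent` recent ones.

    Both inputs are lists of (features, label) pairs; `recent` is ordered oldest to newest."""
    # Guard needed because recent[-0:] would return the whole list
    newest = recent[-max_recent:] if max_recent > 0 else []
    return list(historical) + list(newest)

def one_vs_one_pairs(classes):
    """Class pairs for which a One-vs-One strategy trains one binary classifier each."""
    return list(combinations(sorted(classes), 2))

# Hypothetical unbalance labels: k = 4 classes need k*(k-1)/2 = 6 binary classifiers
print(one_vs_one_pairs(["none", "static", "couple", "dynamic"]))
```

After each update the classifiers are simply retrained on the returned list; retaining the full history guards against forgetting conditions that the recent data no longer covers.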

Performance of Machine Learning Algorithms and Diversity in Data

MATEC Web of Conferences ◽

10.1051/matecconf/201821004019 ◽

2018 ◽

Vol 210 ◽

pp. 04019 ◽

Cited By ~ 1

Author(s):

Hyontai SUG

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Real World ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Real World Data ◽

Random Data ◽

Data Set ◽

World Data

Recent Go matches between humans and the artificial intelligence AlphaGo showed a major advance in machine learning technologies. While AlphaGo was trained using real-world data, AlphaGo Zero was trained using massive amounts of self-generated data starting from random play, and the fact that AlphaGo Zero decisively defeated AlphaGo showed that diversity and size of training data are important for better performance of machine learning algorithms, especially deep neural network algorithms. Artificial neural networks and decision trees are widely accepted machine learning algorithms because of their robustness to errors and their comprehensibility, respectively. In this paper, to demonstrate empirically that diversity and size of data are important factors for better performance of machine learning algorithms, these two representative algorithms are used in experiments. A real-world data set called Breast Tissue was chosen because it consists of real-valued attributes, a very good property for generating artificial random data. The experimental results confirmed that the diversity and size of data are very important factors for better performance.
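
The abstract does not say how the artificial random data were generated from the real-valued attributes. One simple scheme, assumed here purely for illustration, draws each attribute uniformly within the minimum-to-maximum range observed for each class.

```python
import numpy as np

def random_samples_per_class(X, y, n_per_class, seed=0):
    """Synthetic real-valued samples: each attribute drawn uniformly within the
    [min, max] range observed for that class (an assumed scheme, not the paper's)."""
    rng = np.random.default_rng(seed)
    X_new, y_new = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        lo, hi = Xc.min(axis=0), Xc.max(axis=0)
        # lo/hi have shape (n_features,) and broadcast against the requested size
        X_new.append(rng.uniform(lo, hi, size=(n_per_class, X.shape[1])))
        y_new.append(np.full(n_per_class, label))
    return np.vstack(X_new), np.concatenate(y_new)

# Hypothetical two-class toy data with two real-valued attributes
X = np.array([[0.0, 0.0], [1.0, 2.0], [5.0, 5.0], [6.0, 7.0]])
y = np.array([0, 0, 1, 1])
X_syn, y_syn = random_samples_per_class(X, y, n_per_class=50)
print(X_syn.shape)  # (100, 2)
```

Varying `n_per_class` gives a direct way to test the paper's claim that larger, more diverse training sets improve both neural networks and decision trees.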

Design of a Fracture Detection System based on Deep Program in a Convolutional Neural Network

Webology ◽

10.14704/web/v18i2/web18336 ◽

2021 ◽

Vol 18 (2) ◽

pp. 509-518

Author(s):

Payman Hussein Hussan ◽

Syefy Mohammed Mangj Al-Razoky ◽

Hasanain Mohammed Manji Al-Rzoky

Keyword(s):

Neural Network ◽

Machine Learning ◽

Bone Fractures ◽

Detection System ◽

Training Data ◽

Learning Models ◽

Fracture Detection ◽

Data Set ◽

Final Fracture

This paper presents an efficient method for detecting bone fractures. For this purpose, the preprocessing stage includes enhancing image quality, removing extraneous objects, removing noise, and rotating images. The input images then enter the machine learning phase for final fracture detection. At this stage, a convolutional neural network is created by Genetic Programming (GP): learning models are implemented as GP programs and evolve as the programs evolve. Finally, the best program for classifying incoming images is selected. The data set is divided into non-overlapping training and test sets, with a training-to-test ratio of 80:20. Experimental results show that the proposed method performs well in detecting bone fractures.
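
The 80:20 split into training and test sets that share no images can be sketched as a shuffled partition. The seed and the use of Python's `random` module are assumptions; the split is only disjoint if each item (for example, an image file name) appears once in the input.

```python
import random

def train_test_split_disjoint(items, train_ratio=0.8, seed=42):
    """Shuffle unique items and split them into disjoint training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split_disjoint(range(100))
print(len(train), len(test))  # 80 20
```

If several images come from the same patient, splitting by patient rather than by image is a stricter way to keep the two sets independent.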

Deep Neural Network for Multi-Class Prediction of Student Performance in Educational Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2155.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 5073-5081

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Student Performance ◽

Activation Function ◽

Machine Learning Algorithms ◽

Training Data ◽

Fine Tuning ◽

Academic Excellence ◽

Data Set

Predicting student performance is a significant part of processing educational data, and machine learning algorithms play a leading role in this process. Deep learning is one of the important concepts in machine learning. In this paper, we applied a deep learning technique to predict the academic excellence of students using R programming. The Keras and TensorFlow libraries were used to build a neural network model on a Kaggle dataset. The data were split into training and testing sets. The neural network model was plotted using the neuralnet method, and the deep learning model was built with two hidden layers using the ReLU activation function and one output layer using the softmax activation function. After fine-tuning until the results stabilized, the model achieved an accuracy of 85%.
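
The architecture described above (two ReLU hidden layers and a softmax output layer) can be illustrated with a NumPy forward pass. This is a sketch of the layer structure only, not the authors' R/Keras model; the layer sizes, the He-style weight initialization, and the three output classes are assumptions, and training is omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_params(sizes, seed=0):
    """Random weights and zero biases for layer sizes like [n_in, h1, h2, n_classes]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(X, params):
    """Two ReLU hidden layers followed by a softmax output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)

# Hypothetical sizes: 16 features, hidden layers of 32 and 16 units, 3 performance classes
params = init_params([16, 32, 16, 3])
probs = forward(np.random.default_rng(1).normal(size=(5, 16)), params)
print(probs.shape)  # (5, 3)
```

Each output row is a probability distribution over the performance classes; the predicted class is its argmax.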

Earthquake Prediction using Machine Learning Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e9110.018620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 4684-4688

Keyword(s):

Machine Learning ◽

Structural Damage ◽

Data Science ◽

Learning Algorithm ◽

Economic Loss ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Science Data ◽

Data Set

According to statistics from the BBC, the figures vary for every earthquake that has occurred to date. Up to thousands of people may die, about 50,000 may be injured, and around 1 to 3 million may be displaced, while a significant number go missing or become homeless. Structural damage can approach 100%. Economic losses range from 10 to 16 million dollars. Earthquakes of magnitude 5 and above are classified as the deadliest. The most life-threatening earthquake to date took place in Indonesia, where about 3 million people died, 1 to 2 million were injured, and structural damage reached 100%. The consequences of earthquakes are therefore devastating: they are not limited to loss of life and damage to property but also bring significant changes, from surroundings and lifestyle to the economy. Each of these factors motivates earthquake forecasting. With a couple of minutes' notice, individuals can act to protect themselves from injury and death, harm and financial losses can be reduced, and property and natural resources can be secured. In this work, an accurate forecasting system is designed and developed to forecast such catastrophes. It focuses on detecting early signs of earthquakes using machine learning algorithms. The system follows the basic steps of developing learning systems, along with the data science life cycle. Data sets for the Indian subcontinent and the rest of the world are collected from government sources. Data preprocessing is followed by the construction of a stacking model that combines Random Forest and Support Vector Machine algorithms. The algorithms build this mathematical model from a training data set. The model looks for patterns that lead to catastrophes and adapts to them as it is built, so that it can make decisions and forecasts without being explicitly programmed to perform the task. After a forecast, the message is broadcast to government officials and across various platforms.
The information to be obtained centers on three factors: time, locality, and magnitude.
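
A stacking model that combines Random Forest and Support Vector Machine base learners can be sketched with scikit-learn's `StackingClassifier`. Everything here beyond "Random Forest plus SVM, stacked" is an assumption: the use of scikit-learn, the logistic-regression meta-learner, the hyperparameters, and the synthetic classification data, which stand in for preprocessed seismic features and are not earthquake data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for preprocessed features (NOT real earthquake data)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        # SVMs are scale-sensitive, so standardize features inside the pipeline
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ],
    final_estimator=LogisticRegression(),  # meta-learner trained on base-model outputs
    cv=5,  # base-model outputs for the meta-learner come from 5-fold cross-validation
)
stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.2f}")
```

The internal cross-validation keeps the meta-learner from fitting to base-model predictions on data those models were trained on. Forecasting time, locality, and magnitude, as the abstract describes, would require task-specific targets rather than this binary toy label.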

Anomaly Detection using Optimized Features using Genetic Algorithm and MultiEnsemble Classifier

IJOSTHE ◽

10.24113/ojssports.v5i6.79 ◽

2018 ◽

Vol 5 (6) ◽

pp. 7

Author(s):

Apoorva Deshpande ◽

Ramnaresh Sharma

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Intrusion Detection ◽

Anomaly Detection ◽

Detection System ◽

Research Work ◽

Ensemble Classifier ◽

Machine Learning Algorithms ◽

Data Set ◽

Machine Learning Classification

An anomaly detection system plays an important role in network security. An anomaly detection (or intrusion detection) model is a predictive model used to classify network traffic as normal or intrusive. Machine learning algorithms are used to build accurate models for clustering, classification, and prediction. In this paper, classification and predictive models for intrusion detection are built using machine learning classification algorithms such as Random Forest. These algorithms are tested on the KDD-99 data set. In this research, the anomaly detection model is based on normalized, reduced features and a multilevel ensemble classifier. The work is divided into two stages. In the first stage, the data are normalized using mean normalization. In the second stage, a genetic algorithm is used to reduce the number of features, and a multilevel ensemble classifier is then used to classify the data into different attack groups. The results show that intrusions can be classified more efficiently with the reduced feature set.
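
The two preprocessing stages can be sketched as follows: mean normalization, (x - mean) / (max - min) per feature, then a genetic algorithm that searches over binary feature masks. The GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the population settings, and the caller-supplied `fitness` function are illustrative assumptions; the abstract does not specify them, and the multilevel ensemble classifier is not shown.

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization per feature: (x - mean) / (max - min)."""
    spread = X.max(axis=0) - X.min(axis=0)
    spread[spread == 0] = 1.0  # constant features: avoid division by zero
    return (X - X.mean(axis=0)) / spread

def ga_select_features(fitness, n_features, pop_size=20, generations=30,
                       mutation_rate=0.05, seed=0):
    """Minimal genetic algorithm over boolean feature masks (n_features >= 2).

    `fitness(mask)` is a caller-supplied score to maximize, e.g. the validation
    accuracy of a classifier trained on the selected columns (hypothetical)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_features)).astype(bool)
    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        # Tournament selection: keep the better of two randomly drawn individuals
        i, j = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((scores[i] >= scores[j])[:, None], pop[i], pop[j])
        # One-point crossover between consecutive parents
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_features)
            children[k, cut:] = parents[k + 1, cut:]
            children[k + 1, cut:] = parents[k, cut:]
        # Bit-flip mutation
        children ^= rng.random(children.shape) < mutation_rate
        # Elitism: carry the best mask of this generation forward unchanged
        children[0] = pop[scores.argmax()]
        pop = children
    scores = np.array([fitness(m) for m in pop])
    return pop[scores.argmax()]
```

In practice the fitness would often trade accuracy against the number of selected features, so that the GA actually favors the smaller feature sets the abstract reports.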

Optimization of IDS using Filter-Based Feature Selection and Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b8278.1210220 ◽

2020 ◽

Vol 10 (2) ◽

pp. 96-102

Author(s):

Neha Sharma ◽

Harsh Vardhan Bhandari ◽

Narendra Singh Yadav ◽

Harsh Vardhan Jonathan Shroff

Keyword(s):

Machine Learning ◽

Secure Communication ◽

Detection System ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Data Set ◽

Normal Behavior ◽

Active Processes ◽

High Level ◽

Use Of Internet

Nowadays it is imperative to maintain a high level of security to ensure secure communication of information between institutions and organizations. With the growing use of the internet over the years, the number of attacks over the internet has escalated. A powerful Intrusion Detection System (IDS) is required to ensure the security of a network. The aim of an IDS is to monitor the active processes in a network and to detect any deviation from the normal behavior of the system. In machine learning, optimization is the process of obtaining the maximum accuracy from a model. Optimization is vital for IDSs in order to predict a wide variety of attacks with the utmost accuracy. The effectiveness of an IDS depends on its ability to correctly predict and classify any anomaly faced by a computer system. During the last two decades, KDD_CUP_99 has been the most widely used data set for evaluating the performance of such systems. In this study, we apply different machine learning techniques to this data set and compare which technique yields the best results.
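
Filter-based feature selection scores each feature independently of any classifier and keeps the top-ranked ones. As one example of a filter (the abstract does not name the filters used, so this choice is an assumption), the sketch ranks features by the absolute Pearson correlation with a numeric label, such as 0 for normal and 1 for attack traffic.

```python
import numpy as np

def rank_features_by_correlation(X, y, k):
    """Filter-based selection: indices of the k features with the largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features get a score of 0 instead of NaN
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r))[:k]

# Hypothetical toy data: feature 0 equals the label, feature 2 is constant
y = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
X = np.column_stack([y, np.arange(8.0), np.full(8, 3.0)])
print(rank_features_by_correlation(X, y, 3))  # [0 1 2]
```

Because filters ignore the downstream model, they are cheap to compute on large data sets like KDD_CUP_99; the selected columns are then passed to whichever classifiers are being compared.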