Machine Learning Based Predictive Action on Categorical Non-Sequential Data

Background: With the advent of data analysis and machine learning, there is growing impetus to analyze historical data and build models on it. Data comes in many forms and shapes, each with its own challenges. The most tractable form for analysis is numerical data, which a wealth of algorithms and tools can handle. Another form is categorical data, which is subdivided into ordinal (values with a natural order) and nominal (named values with no inherent order). Such data can be broadly classified as sequential or non-sequential. Sequential data is easier to preprocess with existing algorithms. Objective: This paper addresses the challenge of applying machine learning algorithms to categorical data of a non-sequential nature. Methods: Applying several data analysis algorithms to such data yields biased results, which makes it impossible to build a reliable predictive model. We address this problem by walking through a handful of techniques that, during our research, helped us deal with a large categorical dataset of a non-sequential nature. In subsequent sections, we discuss the implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the results with respect to classification accuracy are satisfactory. Conclusion: The best preprocessing technique we observed in our research is one-hot encoding, which breaks categorical features down into binary features that are fed into an algorithm to predict the outcome. The example we used is not abstract: it is a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
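The one-hot encoding step named in the conclusion can be illustrated with a minimal sketch. This is not the authors' implementation: the column names (`service`, `region`), the sample records, and the helper functions `fit_categories` and `one_hot_encode` are all hypothetical, chosen only to show how nominal features become binary indicators.

```python
def fit_categories(rows, columns):
    """Collect the sorted set of distinct values seen in each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot_encode(row, categories):
    """Expand one record into a flat list of 0/1 indicators, one per (column, value) pair."""
    encoded = []
    for col, values in categories.items():
        # A known value sets exactly one indicator to 1; an unseen value yields all zeros.
        encoded.extend(1 if row[col] == value else 0 for value in values)
    return encoded


# Hypothetical nominal records: no ordering is implied between the values.
records = [
    {"service": "billing", "region": "east"},
    {"service": "support", "region": "west"},
    {"service": "billing", "region": "west"},
]

categories = fit_categories(records, ["service", "region"])
matrix = [one_hot_encode(r, categories) for r in records]

for record, encoded in zip(records, matrix):
    print(record, "->", encoded)
```

Because each value gets its own column, the encoding imposes no artificial order on nominal values, which is the usual reason to prefer it over integer codes for this kind of data. The cost is one extra column per distinct value.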

Marketing customer response scoring model based on machine learning data analysis

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189484 ◽

2020 ◽

pp. 1-11

Author(s):

Tang Yan ◽

Li Pengfei

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Data Extraction ◽

Machine Learning Algorithms ◽

Customer Relationship ◽

Customer Data ◽

Modeling And Analysis ◽

Scoring Model ◽

Model Based ◽

Learning Data

In marketing, problems such as the growth of customer data, the increasing difficulty of data extraction and access, the lack of reliability and accuracy in data analysis, the low efficiency of data processing, and the inability to effectively transform massive amounts of data into valuable information have become increasingly prominent. To study the effect of customer response, this paper constructs a marketing customer response scoring model based on machine learning data analysis. In the context of supplier customer relationship management, this article analyzes the supplier's precision marketing status and existing problems and uses the supplier's own development and management characteristics to improve marketing strategies. Moreover, this article combines database techniques with statistical modeling and analysis to establish a customer response scoring model suitable for supplier precision marketing. In addition, this article conducts research and analysis with examples. The research results show that the model constructed in this article performs well.

PREDICTION AND ANALYSIS OF GEOMECHANICAL PROPERTIES OF JIMUSAER SHALE USING A MACHINE LEARNING APPROACH

10.30632/spwla-2021-0089 ◽

2021 ◽

Author(s):

Lianteng Song ◽

Zhonghua Liu ◽

Chaoliu Li ◽

Congqian Ning ◽

...

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Gamma Ray ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Training Data ◽

Sequential Data ◽

Log Data ◽

Geomechanical Properties ◽

Single Well

Geomechanical properties are essential for safe drilling, successful completion, and exploration of both conventional and unconventional reservoirs, e.g. deep shale gas and shale oil. Typically, these properties can be calculated from sonic logs. However, in shale reservoirs, it is time-consuming and challenging to obtain reliable logging data due to borehole complexity and lack of information, which often results in log deficiency and a high cost of recovering incomplete datasets. In this work, we propose using bidirectional long short-term memory (BiLSTM), a supervised neural network algorithm that has been widely used in sequential data-based prediction, to estimate geomechanical parameters. The prediction from log data can be conducted in two different ways: 1) single-well prediction, in which the log data from a single well is divided into training data and testing data for cross validation; and 2) cross-well prediction, in which a group of wells from the same geographical region is divided into a training set and a testing set, also for cross validation. The logs used in this work were collected from 11 wells in the Jimusaer Shale and include gamma ray, bulk density, resistivity, etc. We employed 5 machine learning algorithms for comparison, among which BiLSTM showed the best performance, with an R-squared of more than 90% and an RMSE of less than 10. The predicted results can be directly used to calculate geomechanical properties, with improved accuracy compared with conventional methods.
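The two reported metrics, R-squared and RMSE, can be computed as in the sketch below. It uses the standard textbook definitions; the abstract does not state the exact formulas the authors used, and the `measured` and `predicted` values are made-up numbers for illustration, not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted log values (illustration only).
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.0, 4.0, 6.0, 10.0]

print("RMSE:", rmse(measured, predicted))
print("R-squared:", r_squared(measured, predicted))
```

Under these definitions, an R-squared "of more than 90%" corresponds to a value above 0.9. RMSE is expressed in the units of the predicted quantity, so the threshold of 10 is only meaningful relative to the scale of the log being predicted.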

Machine learning algorithms for smart data analysis in the Internet of things: an overview

Intelligent Wireless Communications ◽

10.1049/pbte094e_ch12 ◽

2021 ◽

pp. 307-327

Author(s):

Mohammed H. Alsharif ◽

Anabi Hilary Kelechi ◽

Imran Khan ◽

Mahmoud A. Albreem ◽

Abu Jahid ◽

...

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Internet Of Things ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

The Internet ◽

Smart Data ◽

The Internet Of Things

Evaluation of Interstate Work Zone Mobility using Probe Vehicle Data and Machine Learning Techniques

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198119827936 ◽

2019 ◽

Vol 2673 (2) ◽

pp. 811-822 ◽

Cited By ~ 1

Author(s):

Mohsen Kamyab ◽

Stephen Remias ◽

Erfan Najmi ◽

Kerrick Hood ◽

Mustafa Al-Akshar ◽

...

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Work Zone ◽

Third Party ◽

Machine Learning Techniques ◽

Work Zones ◽

Vehicle Data ◽

Gps Devices ◽

Future Work ◽

The Impact

According to the Federal Highway Administration (FHWA), US work zones on freeways account for nearly 24% of nonrecurring freeway delays and 10% of overall congestion. Historically, there have been limited scalable datasets to investigate the specific causes of congestion due to work zones or to improve work zone planning processes to characterize the impact of work zone congestion. In recent years, third-party data vendors have provided scalable speed data from Global Positioning System (GPS) devices and cell phones which can be used to characterize mobility on all roadways. Each work zone has unique characteristics and varying mobility impacts which are predicted during the planning and design phases, but can realistically be quite different from what is ultimately experienced by the traveling public. This paper uses these datasets to introduce a scalable Work Zone Mobility Audit (WZMA) template. Additionally, the paper uses metrics developed for individual work zones to characterize the impact of more than 250 work zones varying in length and duration from Southeast Michigan. The authors make recommendations to work zone engineers on useful data to collect for improving the WZMA. As more systematic work zone data are collected, improved analytical assessment techniques, such as machine learning processes, can be used to identify the factors that will predict future work zone impacts. The paper concludes by demonstrating two machine learning algorithms, Random Forest and XGBoost, which show historical speed variation is a critical component when predicting the mobility impact of work zones.

A Predictive Tool For Grid Data Analysis Using Machine Learning Algorithms

2020 10th Annual Computing and Communication Workshop and Conference (CCWC) ◽

10.1109/ccwc47524.2020.9031265 ◽

2020 ◽

Author(s):

David Penn ◽

Vinitha Hannah Subburaj ◽

Anitha Sarah Subburaj ◽

Mark Harral

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Grid Data ◽

Predictive Tool

Forest Cover Types Classification Based on Online Machine Learning on Distributed Cloud Computing Platforms of Storm and SAMOA

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.955-959.3803 ◽

2014 ◽

Vol 955-959 ◽

pp. 3803-3812

Author(s):

Guang Di Li ◽

Guo Yin Wang ◽

Xue Rui Zhang ◽

Wei Hui Deng ◽

Fan Zhang

Keyword(s):

Machine Learning ◽

Forest Cover ◽

Stream Processing ◽

Processing Technique ◽

Machine Learning Algorithms ◽

Learning Tasks ◽

Hoeffding Tree ◽

Computing Platforms ◽

Processing Platform ◽

Forest Cover Types

Storm is the most popular realtime stream processing platform and can be used for online machine learning. Similar to how Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for realtime computation. SAMOA includes distributed algorithms for the most common machine learning tasks, much as Mahout does for Hadoop. SAMOA is both a platform and a library. In this paper, Forest Cover Types, a large benchmarking dataset available at the UCI KDD Archive, is used as the data stream source. Vertical Hoeffding Tree, a parallel streaming decision tree induction algorithm for distributed environments that is incorporated in the SAMOA API, is applied on the Storm platform. This study compared the stream processing technique for predicting forest cover types from cartographic variables with traditional machine learning algorithms applied to this dataset. The test-then-train method used in this system is entirely different from the traditional train-then-test approach. The results of the stream processing technique indicated that its output is asymptotically nearly identical to that of a conventional learner, but the model derived from this system is fully scalable, real-time, capable of dealing with evolving streams, and insensitive to stream ordering.
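The test-then-train evaluation mentioned above can be sketched as follows. This is a minimal single-process illustration, not SAMOA or Storm code: the `MajorityClassLearner` is a toy stand-in for the Vertical Hoeffding Tree, and the stream contents (feature name and labels) are hypothetical.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy online learner: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, features, label):
        self.counts[label] += 1


def prequential_accuracy(stream, learner):
    """Score each instance before the learner sees its label, then train on it."""
    correct = 0
    total = 0
    for features, label in stream:
        # Test first: the prediction is made without knowledge of this label.
        if learner.predict(features) == label:
            correct += 1
        total += 1
        # Then train on the same instance.
        learner.learn(features, label)
    return correct / total if total else 0.0


# Hypothetical stream of (features, label) pairs.
stream = [
    ({"elevation": 2596}, "spruce"),
    ({"elevation": 2590}, "spruce"),
    ({"elevation": 2804}, "pine"),
    ({"elevation": 2785}, "spruce"),
]

accuracy = prequential_accuracy(stream, MajorityClassLearner())
print("prequential accuracy:", accuracy)
```

Every instance serves as test data exactly once and as training data afterwards, so no separate hold-out set is needed and the accuracy estimate can be updated continuously as the stream arrives.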

Smartphone as a monitoring tool for bipolar disorder: a systematic review including data analysis, machine learning algorithms and predictive modelling

International Journal of Medical Informatics ◽

10.1016/j.ijmedinf.2020.104131 ◽

2020 ◽

Vol 138 ◽

pp. 104131 ◽

Cited By ~ 2

Author(s):

Anna Z. Antosik-Wójcińska ◽

Monika Dominiak ◽

Magdalena Chojnacka ◽

Katarzyna Kaczmarek-Majer ◽

Karol R. Opara ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Bipolar Disorder ◽

Data Analysis ◽

Learning Algorithms ◽

Predictive Modelling ◽

Machine Learning Algorithms ◽

Monitoring Tool

ECG data analysis and heart disease prediction using machine learning algorithms

2019 IEEE Region 10 Symposium (TENSYMP) ◽

10.1109/tensymp46218.2019.8971374 ◽

2019 ◽

Author(s):

Sushmita Roy Tithi ◽

Afifa Aktar ◽

Fahimul Aleem ◽

Amitabha Chakrabarty

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Data Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Disease Prediction ◽

Ecg Data

Eagle View: An Abstract Evaluation of Machine Learning Algorithms based on Data Properties

10.36227/techrxiv.14459361.v1 ◽

2021 ◽

Author(s):

Dhairya Vyas

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

Learning Algorithms ◽

Numerical Data ◽

Machine Learning Algorithms ◽

Series Data ◽

Learning Methods ◽

Machine Learning Methods ◽

Almost All

In terms of machine learning, the majority of data can be grouped into four categories: numerical data, categorical data, time-series data, and text. We use different classifiers for different data properties, covering supervised, unsupervised, and reinforcement learning. For each category, we tested almost all machine learning methods and compared them.

Not All Attributes are Created Equal: dX-Private Mechanisms for Linear Queries

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2020-0007 ◽

2020 ◽

Vol 2020 (1) ◽

pp. 103-125

Author(s):

Parameswaran Kamalaruban ◽

Victor Perrier ◽

Hassan Jameel Asghar ◽

Mohamed Ali Kaafar

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Differential Privacy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Trade Off ◽

Systematic Procedure ◽

Privacy Budget ◽

Sensitivity Vector

Differential privacy provides strong privacy guarantees while simultaneously enabling useful insights from sensitive datasets. However, it provides the same level of protection for all elements (individuals and attributes) in the data. There are practical scenarios where some data attributes need more or less protection than others. In this paper, we consider dX-privacy, an instantiation of the privacy notion introduced in [6], which allows this flexibility by specifying a separate privacy budget for each pair of elements in the data domain. We describe a systematic procedure to tailor any existing differentially private mechanism that assumes a query set and a sensitivity vector as input into its dX-private variant, specifically focusing on linear queries. Our proposed meta procedure has broad applications, as linear queries form the basis of a range of data analysis and machine learning algorithms, and the ability to define a more flexible privacy budget across the data domain results in an improved privacy/utility trade-off in these applications. We propose several dX-private mechanisms and provide theoretical guarantees on the trade-off between utility and privacy. We also experimentally demonstrate the effectiveness of our procedure by evaluating our proposed dX-private Laplace mechanism on both synthetic and real datasets using a set of randomly generated linear queries.
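For orientation, the baseline that the paper generalizes can be sketched as the standard Laplace mechanism applied to a single linear query. This is the ordinary epsilon-differentially-private mechanism with one uniform privacy budget, not the paper's dX-private variant. The sketch assumes the data is a histogram of counts and that neighbouring datasets differ by adding or removing one record, so the L1 sensitivity of the query is the largest absolute weight. The weights, histogram, and function names are hypothetical.

```python
import random


def linear_query(weights, histogram):
    """A linear query is a weighted sum of the histogram counts."""
    return sum(w * x for w, x in zip(weights, histogram))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(weights, histogram, epsilon, rng):
    """Answer a linear query with noise calibrated to sensitivity / epsilon."""
    # Assumption: adding or removing one record changes one count by 1,
    # so the answer changes by at most max |w_i|.
    sensitivity = max(abs(w) for w in weights)
    return linear_query(weights, histogram) + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query and data.
weights = [1.0, 0.0, 2.0]
histogram = [10, 5, 3]
rng = random.Random(0)  # seeded only to make the illustration repeatable

true_answer = linear_query(weights, histogram)
noisy_answer = laplace_mechanism(weights, histogram, epsilon=1.0, rng=rng)
print("true:", true_answer, "noisy:", noisy_answer)
```

Here every element of the data domain receives the same budget `epsilon`. The flexibility described in the abstract comes from replacing that single budget with one specified per pair of elements, which this sketch does not attempt.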
