Survey on Technique and User Profiling in Unsupervised Machine Learning Method

Organisations often struggle to extract information from data and to choose suitable Machine Learning (ML) techniques when they try to generate precise behavioural patterns or user segments. Furthermore, many marketing teams are unfamiliar with data-driven classification methods. The goal of this research is to provide a framework that outlines Unsupervised Machine Learning (UML) methods for User Profiling (UP) based on essential data attributes. A thorough literature study was undertaken on the most popular UML techniques and the dataset attributes they require. For UP, a framework is developed that outlines several UML techniques. Depending on data size and dimensionality, it offers two-stage clustering algorithms for categorical, quantitative, and mixed datasets. In the first stage, the clusters are determined using a multilevel (hierarchical) or model-based clustering method. In the second stage, the clusters are refined using a non-hierarchical clustering technique. Academics and practitioners may use the framework to decide which UML techniques are best suited to building robust profiles or data-driven user segments.
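
The two-stage idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' framework: it assumes purely quantitative features, Ward linkage for the first stage and k-means for the second, and it uses synthetic data. The function name `two_stage_clustering` and the toy "user" features are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2


def two_stage_clustering(X, n_clusters):
    """Stage 1: Ward hierarchical clustering; stage 2: k-means refinement."""
    # Stage 1: build the hierarchy and cut it into n_clusters groups.
    tree = linkage(X, method="ward")
    initial_labels = fcluster(tree, t=n_clusters, criterion="maxclust")  # labels start at 1

    # Use the stage-1 cluster means as starting centroids for stage 2.
    start = np.vstack(
        [X[initial_labels == k].mean(axis=0) for k in range(1, n_clusters + 1)]
    )

    # Stage 2: non-hierarchical (k-means) refinement from those centroids.
    refined_centroids, refined_labels = kmeans2(X, start, iter=20, minit="matrix")
    return initial_labels - 1, refined_labels, refined_centroids


# Synthetic, hypothetical "user" features: three well-separated groups of 30.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

stage1, stage2, centroids = two_stage_clustering(X, n_clusters=3)
print(np.bincount(stage2))  # cluster sizes after refinement
```

Seeding the second stage with the first-stage cluster means avoids the random initialisation that k-means otherwise depends on. Categorical or mixed data would need a different distance measure and is outside this sketch.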

Solar farm voltage anomaly detection using high-resolution μPMU data-driven unsupervised machine learning

Applied Energy ◽

10.1016/j.apenergy.2021.117656 ◽

2021 ◽

Vol 303 ◽

pp. 117656

Author(s):

Maitreyee Dey ◽

Soumya Prakash Rana ◽

Clarke V. Simmons ◽

Sandra Dudley

Keyword(s):

Machine Learning ◽

High Resolution ◽

Anomaly Detection ◽

Data Driven ◽

Unsupervised Machine Learning

Ammonoid Taxonomy with Supervised and Unsupervised Machine Learning Algorithms

10.31233/osf.io/ewkx9 ◽

2021 ◽

Author(s):

Floe Foxon

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Learning Algorithms ◽

Clustering Algorithms ◽

Measurement Data ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Unsupervised Machine Learning

Ammonoid identification is crucial to biostratigraphy, systematic palaeontology, and evolutionary biology, but may prove difficult when shell features and sutures are poorly preserved. This necessitates novel approaches to ammonoid taxonomy. This study aimed to taxonomize ammonoids by their conch geometry using supervised and unsupervised machine learning algorithms. Ammonoid measurement data (conch diameter, whorl height, whorl width, and umbilical width) were taken from the Paleobiology Database (PBDB). Eleven species with ≥50 specimens each were identified, providing N=781 total unique specimens. Naive Bayes, Decision Tree, Random Forest, Gradient Boosting, K-Nearest Neighbours, and Support Vector Machine classifiers were applied to the PBDB data with a 5x5 nested cross-validation approach to obtain unbiased generalization performance estimates across a grid search of algorithm parameters. All supervised classifiers achieved ≥70% accuracy in identifying ammonoid species, with Naive Bayes demonstrating the least over-fitting. The unsupervised clustering algorithms K-Means, DBSCAN, OPTICS, Mean Shift, and Affinity Propagation achieved Normalized Mutual Information scores of ≥0.6, with the centroid-based methods having the most success. This presents a reasonably accurate proof-of-concept approach to ammonoid classification, which may assist identification in cases where more traditional methods are not feasible.
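
For readers unfamiliar with the Normalized Mutual Information (NMI) score used to evaluate the clustering algorithms, the following is a minimal pure-Python sketch. It assumes the arithmetic-mean normalisation, which the abstract does not specify, and the toy labels are hypothetical rather than taken from the PBDB data.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(true_labels, cluster_labels):
    """NMI = I(U; V) / mean(H(U), H(V)), using the arithmetic mean (an assumption)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    count_u = Counter(true_labels)
    count_v = Counter(cluster_labels)
    # Mutual information summed over the non-empty cells of the contingency table.
    mutual_info = sum(
        (c / n) * log(c * n / (count_u[u] * count_v[v]))
        for (u, v), c in joint.items()
    )
    normaliser = 0.5 * (entropy(true_labels) + entropy(cluster_labels))
    # Both labelings trivial (a single group each): treat as perfect agreement.
    return mutual_info / normaliser if normaliser > 0 else 1.0


# Hypothetical toy labels: species names versus cluster ids.
species = ["A", "A", "B", "B", "C", "C"]
same_grouping = [2, 2, 0, 0, 1, 1]   # identical partition, different ids
mixed_grouping = [0, 0, 0, 1, 1, 1]  # partially agrees with the species labels

print(normalized_mutual_information(species, same_grouping))
print(normalized_mutual_information(species, mixed_grouping))
```

NMI ignores how cluster ids are named, so a clustering that recovers the species partition exactly scores approximately 1 regardless of label order, while a clustering unrelated to the species scores approximately 0.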

Using Real-Time Data and Unsupervised Machine Learning Techniques to Study Large-Scale Spatio–Temporal Characteristics of Wastewater Discharges and their Influence on Surface Water Quality in the Yangtze River Basin

Water ◽

10.3390/w11061268 ◽

2019 ◽

Vol 11 (6) ◽

pp. 1268 ◽

Cited By ~ 3

Author(s):

Zhenzhen Di ◽

Miao Chang ◽

Peikun Guo ◽

Yang Li ◽

Yin Chang

Keyword(s):

Machine Learning ◽

Surface Water ◽

Real Time ◽

Yangtze River Basin ◽

Clustering Algorithms ◽

Machine Learning Techniques ◽

Unsupervised Machine Learning ◽

Learning Techniques ◽

The Yangtze River Basin ◽

Spatio Temporal

Most industrial wastewater worldwide, including in China, is still discharged directly to aquatic environments without adequate treatment. Because of a lack of data and few methods, the relationships between pollutants discharged in wastewater and those in surface water have not been fully revealed, and unsupervised machine learning techniques, such as clustering algorithms, have been neglected in related research fields. In this study, real-time monitoring data for chemical oxygen demand (COD), ammonia nitrogen (NH3-N), pH, and dissolved oxygen in the wastewater discharged from 2213 factories and in the surface water at 18 monitoring sections (sites) in 7 administrative regions in the Yangtze River Basin from 2016 to 2017 were collected and analyzed by the partitioning around medoids (PAM) and expectation–maximization (EM) clustering algorithms, Welch t-test, Wilcoxon test, and Spearman correlation. The results showed that, compared with the spatial cluster comprising unpolluted sites, the spatial cluster comprising heavily polluted sites, where more wastewater was discharged, had relatively high COD (>100 mg L−1) and NH3-N (>6 mg L−1) concentrations and relatively low pH (<6) from 15 industrial classes that respected the different discharge limits outlined in the pollutant discharge standards. The results also showed that the economic activities generating wastewater and the geographical distribution of the heavily polluted wastewater changed from 2016 to 2017, such that the concentration ranges of pollutants in discharges widened and the contributions from some emerging enterprises became more important. The correlations between the quality of the wastewater and the surface water strengthened as the whole-year data sets were reduced to the heavily polluted periods by the EM clustering and water quality evaluation. This study demonstrates how unsupervised machine learning algorithms play an objective and effective role in data mining real-time monitoring information and highlighting spatio–temporal relationships between pollutants in wastewater discharges and surface water to support scientific water resource management.

A New Data-Driven Seismic Interpretation Workflow Using Unsupervised Machine Learning and Non-Local Trace Matching

10.3997/2214-4609.201901509 ◽

2019 ◽

Author(s):

A.J. Bugge ◽

J.E. Lie ◽

A.K. Evensen ◽

S. Clark

Keyword(s):

Machine Learning ◽

Seismic Interpretation ◽

Data Driven ◽

Unsupervised Machine Learning ◽

Non Local

Unsupervised machine learning: clustering algorithms

Machine Learning Guide for Oil and Gas Using Python ◽

10.1016/b978-0-12-821929-4.00002-0 ◽

2021 ◽

pp. 125-168

Author(s):

Hoss Belyadi ◽

Alireza Haghighat

Keyword(s):

Machine Learning ◽

Clustering Algorithms ◽

Unsupervised Machine Learning

Unsupervised Learning for Product Use Activity Recognition: An Exploratory Study of a “Chatty Device”

Sensors ◽

10.3390/s21154991 ◽

2021 ◽

Vol 21 (15) ◽

pp. 4991

Author(s):

Mike Lakoju ◽

Nemitari Ajienka ◽

M. Ahmadieh Khanesar ◽

Pete Burnap ◽

David T. Branson

Keyword(s):

Machine Learning ◽

Activity Recognition ◽

Sampling Rate ◽

Machine Learning Algorithms ◽

Data Driven ◽

Sensor Data ◽

Unsupervised Machine Learning ◽

Fuzzy C Means ◽

Product Use ◽

Fuzzy C Means Algorithm

To create products that are better fit for purpose, manufacturers require new methods for gaining insights into product experience in the wild at scale. “Chatty Factories” is a concept that explores the transformative potential of placing IoT-enabled data-driven systems at the core of design and manufacturing processes, aligned to the Industry 4.0 paradigm. In this paper, we propose a model that enables new forms of agile engineering product development via “chatty” products. Products relay their “experiences” from the consumer world back to designers and product engineers through the mediation provided by embedded sensors, IoT, and data-driven design tools. Our model aims to identify product “experiences” to support insights into product use. To this end, we create an experiment to: (i) collect sensor data at a 100 Hz sampling rate from a “Chatty device” (a device with sensors) for six common everyday activities that drive product experience: standing, walking, sitting, dropping and picking up the device, placing the device stationary on a side table, and placing it on a vibrating surface; (ii) pre-process and manually label the product use activity data; (iii) compare a total of four Unsupervised Machine Learning models (three classic algorithms and the fuzzy C-means algorithm) for product use activity recognition for each unique sensor; and (iv) present and discuss our findings. The empirical results demonstrate the feasibility of applying unsupervised machine learning algorithms for clustering product use activity. The highest F-measure obtained is 0.87, with an MCC of 0.84, when the fuzzy C-means algorithm is applied for clustering, outperforming the other three algorithms.
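
The fuzzy C-means algorithm named in this abstract can be sketched in a few lines of NumPy. This is a generic textbook version, not the authors' implementation: the fuzzifier `m=2.0`, the stopping tolerance, and the two synthetic "activity" feature clouds are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centres, memberships); memberships[i, k] is point i's degree in cluster k."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Centres are membership-weighted means (weights raised to the fuzzifier m).
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]

        # Distance from every point to every centre, floored to avoid division by zero.
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)

        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)), rows normalised.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centres, U


# Hypothetical sensor-derived features for two activities (synthetic data).
rng = np.random.default_rng(1)
X = np.vstack([
    0.3 * rng.standard_normal((30, 2)),        # e.g. device stationary
    4.0 + 0.3 * rng.standard_normal((30, 2)),  # e.g. device on a vibrating surface
])

centres, U = fuzzy_c_means(X, n_clusters=2)
hard_labels = U.argmax(axis=1)  # crisp assignment taken from the soft memberships
print(np.bincount(hard_labels))
```

Unlike k-means, each point receives a degree of membership in every cluster. Taking the `argmax` recovers crisp labels that can then be scored against manual annotations with measures such as the F-measure or MCC.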

Using country-level variables to classify countries according to the number of confirmed COVID-19 cases: An unsupervised machine learning approach

Wellcome Open Research ◽

10.12688/wellcomeopenres.15819.2 ◽

2020 ◽

Vol 5 ◽

pp. 56 ◽

Cited By ~ 1

Author(s):

Rodrigo M. Carrillo-Larco ◽

Manuel Castillo-Cara

Keyword(s):

Machine Learning ◽

Case Fatality Rate ◽

Learning Algorithms ◽

Case Fatality ◽

Machine Learning Algorithms ◽

Data Driven ◽

Mortality Data ◽

Fatality Rate ◽

Unsupervised Machine Learning ◽

Country Level

Background: The COVID-19 pandemic has attracted the attention of researchers and clinicians who have provided evidence about risk factors and clinical outcomes. Research on the COVID-19 pandemic benefiting from open-access data and machine learning algorithms is still scarce yet can produce relevant and pragmatic information. With country-level pre-COVID-19-pandemic variables, we aimed to cluster countries into groups with shared profiles of the COVID-19 pandemic. Methods: Unsupervised machine learning algorithms (k-means) were used to define data-driven clusters of countries; the algorithm was informed by disease prevalence estimates, metrics of air pollution, socio-economic status and health system coverage. Using the one-way ANOVA test, we compared the clusters in terms of number of confirmed COVID-19 cases, number of deaths, case fatality rate and the order in which the country reported the first case. Results: The model to define the clusters was developed with 155 countries. The model with three principal component analysis parameters and five or six clusters showed the best ability to group countries in relevant sets. There was strong evidence that the model with five or six clusters could stratify countries according to the number of confirmed COVID-19 cases (p<0.001). However, the model could not stratify countries in terms of number of deaths or case fatality rate. Conclusions: A simple data-driven approach using global information available before the COVID-19 pandemic seemed able to classify countries in terms of the number of confirmed COVID-19 cases. The model was not able to stratify countries based on COVID-19 mortality data.
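
The pipeline outlined in the Methods (principal components, k-means clusters, then a one-way ANOVA across clusters) might look roughly like the sketch below. Everything here is an assumption for illustration: the data are synthetic rather than real country indicators, the helper names `pca_scores` and `farthest_point_init` are hypothetical, and the deterministic seeding is a convenience that the abstract does not mention.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.stats import f_oneway


def pca_scores(X, n_components):
    """Standardise columns, then project onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def farthest_point_init(Z, k):
    """Deterministic k-means seeding: repeatedly pick the point farthest from those chosen."""
    chosen = [0]
    for _ in range(k - 1):
        dist = np.linalg.norm(Z[:, None, :] - Z[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(dist.argmax()))
    return Z[chosen]


# Synthetic stand-in for a country-level table: 155 rows, 6 indicators (not real data).
rng = np.random.default_rng(42)
group = np.repeat(np.arange(5), 31)
latent_centres = 6.0 * np.array(
    [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float
)
latent = latent_centres[group] + 0.3 * rng.standard_normal((155, 3))
X = latent @ rng.standard_normal((3, 6)) + 0.05 * rng.standard_normal((155, 6))
cases = 10.0 * group + rng.standard_normal(155)  # synthetic outcome variable

# Three principal components and five clusters, mirroring the settings in the abstract.
scores = pca_scores(X, n_components=3)
_, labels = kmeans2(scores, farthest_point_init(scores, 5), iter=50, minit="matrix")

# One-way ANOVA: does the outcome differ across the data-driven clusters?
samples = [cases[labels == k] for k in np.unique(labels)]
f_stat, p_value = f_oneway(*samples)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
```

A small p-value here only says that the mean outcome differs between at least two clusters. In the synthetic data that holds by construction, so the sketch shows the mechanics and reproduces none of the study's findings.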

A data-driven air quality assessment method based on unsupervised machine learning and median statistical analysis: The case of China

Journal of Cleaner Production ◽

10.1016/j.jclepro.2021.129531 ◽

2021 ◽

pp. 129531

Author(s):

Xiaoxia Wang ◽

Luqi Wang ◽

Yuanyuan Liu ◽

Sangen Hu ◽

Xuezhen Liu ◽

...

Keyword(s):

Machine Learning ◽

Statistical Analysis ◽

Air Quality ◽

Quality Assessment ◽

Assessment Method ◽

Data Driven ◽

Unsupervised Machine Learning ◽

Air Quality Assessment ◽

Quality Assessment Method

Data-driven analysis of molten-salt nanofluids for specific heat enhancement using unsupervised machine learning methodologies

Solar Energy ◽

10.1016/j.solener.2021.09.022 ◽

2021 ◽

Vol 227 ◽

pp. 447-456

Author(s):

Dipti Ranjan Parida ◽

Nikhil Dani ◽

Saptarshi Basu

Keyword(s):

Machine Learning ◽

Specific Heat ◽

Molten Salt ◽

Data Driven ◽

Unsupervised Machine Learning

Using unsupervised machine learning to model tax practice learning theory

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.4.13019 ◽

2018 ◽

Vol 7 (2.4) ◽

pp. 109 ◽

Cited By ~ 1

Author(s):

Alfred Howard Miller

Keyword(s):

Machine Learning ◽

Factor Model ◽

Bachelor's Degree ◽

Data Driven ◽

Unsupervised Machine Learning ◽

Learning Framework ◽

Reporting Requirements ◽

Data Driven Approach ◽

Learning Research ◽

Tax Practice

The aim of this study was to use an unsupervised machine learning framework to explore a dataset comprising assessed output from Bachelor of Business taxation learners over four successive semesters. The researcher sought to motivate deployment of an evidence-supported, data-driven approach to understanding the scope of student learning in a taxation class within a bachelor’s degree in business, as a tool for accreditation reporting purposes. The data analysis identified four factors: two related to tax and two related to learning. These factors are tax theory and tax practice, along with practical learning and theoretical learning. The research motivated a grounded theory paradigm that explained the scope of knowledge acquired by taxation class learners. The resulting four-factor model is an outcome of the study. The emergent paradigm further explains accounting students’ readiness for career success upon graduation and provides a novel way to meet the outcomes reporting requirements mandated by programmatic business accreditors such as the Accreditation Council for Business Schools and Programs (ACBSP).
