International Journal of Data Mining & Knowledge Management Process
Latest Publications


TOTAL DOCUMENTS: 236 (five years: 29)

H-INDEX: 13 (five years: 2)

Published by Academy and Industry Research Collaboration Center

ISSN: 2230-9608, 2231-007X

Author(s):  
Jaishree Ranganathan

Cancer is an extremely heterogeneous disease. Leukemia is a cancer of the white blood cells and some other cell types. Diagnosing leukemia is laborious across a multitude of areas, including haematology. Machine Learning (ML) is a branch of Artificial Intelligence, and there is an emerging trend of applying ML models to data classification. This review describes the literature on ML for the classification of acute leukemia datasets. In addition to describing the existing literature, this work identifies different sources of publicly available data that could be utilised for research and development of intelligent machine learning applications for classification. To the best of our knowledge, no existing work contributes such information to the research community.
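
As a purely illustrative sketch (none of the surveyed datasets is reproduced here), the snippet below shows the generic shape of an ML classification pipeline such a study might use, applied to synthetic placeholder features with scikit-learn; the feature semantics and labels are assumptions, not data from the review:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic placeholder features standing in for, e.g., blood-count or
# gene-expression measurements; labels 0/1 stand for two acute leukemia subtypes.
X, y = make_classification(n_samples=200, n_features=30, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Fit a classifier and report held-out performance.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))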


Author(s):  
Mohammed Hamoumi ◽  
Abdellah Haddout ◽  
Mariam Benhadou

Based on the principle that perfection is a divine criterion, process management exists on the one hand to achieve excellence (near perfection) and on the other hand to avoid imperfection. In other words, Operational Excellence (OE) is one of the approaches that, when applied rigorously, aims to maximize performance. Mastery of problem solving therefore remains necessary to achieve such a level of performance. There are many tools available, whether for continuous improvement and the resolution of chronic problems (KAIZEN, DMAIC, Lean Six Sigma…) or for the resolution of sporadic defects (8D, PDCA, QRQC…). However, these methodologies often use the same basic tools (Ishikawa diagram, 5 Whys, tree of causes…) to identify potential causes and root causes. This results in three levels of causes: occurrence, non-detection and system. This research presents the development of the DINNA diagram [1] as an effective and efficient process that links the Ishikawa diagram and the 5 Whys method to identify root causes and avoid recurrence. The ultimate objective is that two working groups with similar skills analysing the same problem separately reach the same result; achieving this requires the consistent application of a robust methodology. We therefore speak of five dimensions: occurrence, non-detection, system, effectiveness and efficiency. As such, the paper offers a solution that is both effective and efficient, helping practitioners of industrial problem solving avoid missing the real root cause and save the costs that follow a wrong decision.
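
As a loose structural illustration only (not the authors' DINNA tool), the snippet below represents a problem analysed along the three levels of causes named above, each carrying its own 5-Whys chain whose last answer is taken as the root cause; the example causes are invented:

from dataclasses import dataclass, field

@dataclass
class CauseBranch:
    level: str                                 # "occurrence", "non-detection" or "system"
    whys: list = field(default_factory=list)   # successive answers to "why?"

    def root_cause(self):
        # The last answer of the 5-Whys chain is treated as the root cause.
        return self.whys[-1] if self.whys else None

# Invented example: one branch per level of cause for a single defect.
branches = [
    CauseBranch("occurrence", ["part out of tolerance", "tool worn", "no tool-change rule"]),
    CauseBranch("non-detection", ["defect reached customer", "gauge not used", "control plan incomplete"]),
    CauseBranch("system", ["control plan incomplete", "FMEA not updated after process change"]),
]
for branch in branches:
    print(branch.level, "->", branch.root_cause())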


Author(s):  
Md Kamrul Islam ◽  
Sabeur Aridhi ◽  
Malika Smail-Tabbone

The task of inferring missing links, or predicting future ones, in a graph based on its current structure is referred to as link prediction. Link prediction methods based on pairwise node similarity are well-established approaches in the literature and show good prediction performance in many real-world graphs, even though they are heuristic. On the other hand, graph embedding approaches learn low-dimensional representations of nodes in a graph and are capable of capturing inherent graph features, thus supporting the subsequent link prediction task in the graph. This paper studies a selection of methods from both categories on several benchmark (homogeneous) graphs with different properties from various domains. Beyond the intra- and inter-category comparison of the performance of these methods, our aim is also to uncover interesting connections between Graph Neural Network (GNN)-based methods and heuristic ones as a means of alleviating the well-known black-box limitation.
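
As an illustration of the heuristic, pairwise-similarity family of methods mentioned above, the following minimal sketch scores candidate links on a toy graph with two classical indices (common neighbours and Adamic-Adar) using NetworkX; it is a generic example, not the benchmark setup of the paper:

import networkx as nx

# Toy homogeneous graph; in practice a benchmark graph would be used, with a
# fraction of its edges held out as the test set.
G = nx.karate_club_graph()

# Candidate node pairs that are currently unconnected.
candidates = list(nx.non_edges(G))

# Common-neighbour count: |N(u) ∩ N(v)|.
cn_scores = {(u, v): len(list(nx.common_neighbors(G, u, v))) for u, v in candidates}

# Adamic-Adar index: sum over common neighbours z of 1 / log(degree(z)).
aa_scores = {(u, v): score for u, v, score in nx.adamic_adar_index(G, candidates)}

# The highest-scoring pairs are the predicted links.
print("Top pairs by common neighbours:", sorted(cn_scores, key=cn_scores.get, reverse=True)[:5])
print("Top pairs by Adamic-Adar:", sorted(aa_scores, key=aa_scores.get, reverse=True)[:5])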


Author(s):  
Hu Shaolin ◽  
Zhang Qinghua ◽  
Su Naiquan ◽  
Li Xiwu

In recent years, big data has attracted more and more attention. It can provide more information and a broader perspective for analysing and dealing with problems than the conventional situation. However, so far there is no widely accepted and measurable definition of the term "big data": for example, what significant features a dataset needs to have to be called big data, how large a dataset must be to be called big data, and so on. Although the "5V" description widely used in textbooks has been applied to these questions in much of the big data literature, "5V" still has significant shortcomings and limitations, and it is not suitable for fully describing big data problems in practical fields such as industrial production. Therefore, this paper puts forward the new concept of the data cloud and a data cloud-based "3M" descriptive definition of big data, which refers to a wide range of data sources (Multi-source), ultra-high dimensionality (Multi-dimensional) and a long enough time span (Multi-spatiotemporal). Based on this 3M description, the paper sets up four typical application paradigms for production big data, analyses their typical applications, and lays the foundation for big data applications in the petrochemical industry.
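
As a minimal, hypothetical illustration of the 3M description, the snippet below records the three properties for a dataset and checks whether all of them hold; the thresholds are arbitrary placeholders rather than values proposed in the paper:

from dataclasses import dataclass

@dataclass
class DatasetProfile:
    n_sources: int      # Multi-source: number of distinct data sources
    n_dimensions: int   # Multi-dimensional: number of measured variables
    span_days: float    # Multi-spatiotemporal: time span covered by the data

def satisfies_3m(profile, min_sources=10, min_dims=100, min_span_days=365):
    # Arbitrary illustrative thresholds; the paper's description is qualitative.
    return (profile.n_sources >= min_sources
            and profile.n_dimensions >= min_dims
            and profile.span_days >= min_span_days)

plant_data = DatasetProfile(n_sources=40, n_dimensions=3000, span_days=1800)
print(satisfies_3m(plant_data))  # True: many sources, high dimension, long time span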


Author(s):  
Alex Romanova

Big Data creates many challenges for data mining experts, in particular in extracting meaning from text data. It is beneficial for text mining to build a bridge between the word embedding process and the capacity of graphs to connect the dots and represent complex correlations between entities. In this study we examine the process of building a semantic graph model to determine word associations and discover document topics. We introduce a novel Word2Vec2Graph model that is built on top of the Word2Vec word embedding model. We demonstrate how this model can be used to analyze long documents, find unexpected word associations and uncover document topics. To validate the topic discovery method, we map words to vectors and vectors to images, and use CNN deep learning image classification.
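
As a rough sketch of the general idea (not the authors' Word2Vec2Graph implementation), the code below trains a small Word2Vec model with gensim and builds a NetworkX graph whose edges join word pairs with cosine similarity above a chosen threshold; connected components of that graph then give crude topic-like word groups:

import itertools
import networkx as nx
from gensim.models import Word2Vec

# Tiny tokenised corpus; a real document would yield many more sentences.
sentences = [
    ["data", "mining", "extracts", "patterns", "from", "large", "data"],
    ["word", "embedding", "maps", "words", "to", "dense", "vectors"],
    ["graph", "models", "connect", "related", "words", "and", "entities"],
    ["deep", "learning", "models", "learn", "word", "vectors", "from", "text"],
]

# Train word embeddings.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=1)

# Build a graph whose nodes are words and whose edges join similar word pairs.
threshold = 0.1  # illustrative value; would be tuned on real data
graph = nx.Graph()
graph.add_nodes_from(model.wv.index_to_key)
for w1, w2 in itertools.combinations(model.wv.index_to_key, 2):
    similarity = float(model.wv.similarity(w1, w2))
    if similarity > threshold:
        graph.add_edge(w1, w2, weight=similarity)

# Word associations are a word's neighbours; connected components act as
# crude topic-like groups (only meaningful on real, larger corpora).
for component in nx.connected_components(graph):
    print(sorted(component))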


Author(s):  
Aude Maignan ◽  
Tony Scott

Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because such expressions are normally impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach “per block”. This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.
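
For readers unfamiliar with the quantum potential, the following NumPy sketch evaluates it, up to an additive constant, on a grid for a small two-dimensional dataset, following the standard quantum clustering construction in which the wave function is a sum of Gaussians of width σ; the data and σ value are illustrative only:

import numpy as np

def quantum_potential(grid_points, data, sigma):
    # psi(x) = sum_i exp(-||x - x_i||^2 / (2 sigma^2));
    # V(x) = const + (1 / (2 sigma^2 psi(x))) * sum_i ||x - x_i||^2 exp(-||x - x_i||^2 / (2 sigma^2))
    d2 = ((grid_points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    psi = weights.sum(axis=1)
    return (d2 * weights).sum(axis=1) / (2.0 * sigma ** 2 * psi)

# Two small 2-D point clouds as illustrative data.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])

# Evaluate the potential on a grid; in practice all local minima would be
# located numerically and taken as cluster centers.
xs, ys = np.meshgrid(np.linspace(-1, 4, 100), np.linspace(-1, 4, 100))
grid = np.column_stack([xs.ravel(), ys.ravel()])
v = quantum_potential(grid, data, sigma=0.5)
print("Lowest-potential grid point (one cluster center):", grid[np.argmin(v)])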


Author(s):  
Paul Morrison ◽  
Maxwell Dixon ◽  
Arsham Sheybani ◽  
Bahareh Rahmani

The purpose of this retrospective study is to measure machine learning models' ability to predict glaucoma drainage device failure based on demographic information and preoperative measurements. The medical records of 165 patients were used. Potential predictors included the patients' race, age, sex, preoperative intraocular pressure (IOP), preoperative visual acuity, number of IOP-lowering medications, and number and type of previous ophthalmic surgeries. Failure was defined as a final IOP greater than 18 mm Hg, a reduction in intraocular pressure of less than 20% from baseline, or a need for reoperation unrelated to normal implant maintenance. Five classifiers were compared: logistic regression, artificial neural network, random forest, decision tree, and support vector machine. Recursive feature elimination was used to shrink the number of predictors, and grid search was used to choose hyperparameters. To prevent leakage, nested cross-validation was used throughout. With a small amount of data, the best classifier was logistic regression; with more data, the best classifier was the random forest.
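
The patient records themselves are not public, so the sketch below only illustrates the evaluation strategy described in the abstract, namely recursive feature elimination and a grid search run inside a nested cross-validation loop with scikit-learn; the synthetic features and parameter grids are placeholders:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the demographics and preoperative measurements of
# 165 patients; y = 1 plays the role of glaucoma drainage device failure.
X, y = make_classification(n_samples=165, n_features=10, n_informative=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000))),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {
    "rfe__n_features_to_select": [3, 5, 7],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

# The inner loop picks the feature count and C; the outer loop gives an
# unbiased performance estimate, which is what prevents leakage.
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(pipe, param_grid, cv=inner, scoring="roc_auc")
scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")
print("Nested CV AUC: %.3f +/- %.3f" % (scores.mean(), scores.std()))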

