Comparison of data mining algorithms for pressure prediction of crude oil pipeline to identify congeal

Data mining is applied in many areas. In the oil and gas industry, data mining can support operational decision making and help prevent massive losses. One serious problem in the petroleum industry is the congeal phenomenon, which blocks crude oil flow during transport in a pipeline system. In crude oil pipeline systems, online pressure monitoring is usually implemented to control the congeal phenomenon. However, such a system cannot predict the pipeline pressure over the next several days. This research aims to compare data mining algorithms for predicting the pressure of a crude oil pipeline, based on real historical data from a petroleum field. To find the best algorithm, four data mining algorithms were compared: Random Forest, Multilayer Perceptron (MLP), Decision Tree, and Linear Regression. Linear Regression showed the best performance among the four algorithms, with R2 = 0.55 and RMSE = 28.34. This research confirmed that data mining algorithms are a good method for predicting the pressure of a crude oil pipeline in the petroleum industry, even though the accuracy of the predicted values should be improved. To achieve better accuracy, it is necessary to collect more data and to find better-performing data mining algorithms.
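
The two metrics quoted in this abstract, R2 (coefficient of determination) and RMSE (root mean square error), can be computed as in the minimal sketch below. The pressure values are hypothetical and chosen only for illustration; they are not data from the study, and the function names are assumptions.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical pressure readings and model predictions (not from the study).
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 115.0, 150.0, 155.0]

print("RMSE:", rmse(observed, predicted))
print("R2:", r2_score(observed, predicted))
```

A lower RMSE and an R2 closer to 1 indicate a better fit, which is the sense in which the abstract ranks the four algorithms.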

Comprehensibility of Data Mining Algorithms

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch037 ◽

2011 ◽

pp. 190-195 ◽

Cited By ~ 7

Author(s):

Zhi-Hua Zhou

Keyword(s):

Data Mining ◽

Decision Making ◽

Decision Makers ◽

Data Mining Algorithm ◽

Human Beings ◽

Common People ◽

Data Mining Algorithms ◽

Mining Algorithm ◽

Aid Decision ◽

Mining Algorithms

Data mining attempts to identify valid, novel, potentially useful, and ultimately understandable patterns from huge volumes of data. The mined patterns must be ultimately understandable because the purpose of data mining is to aid decision-making. If decision-makers cannot understand what a mined pattern means, then the pattern cannot be used well. Since most decision-makers are not data mining experts, the patterns should ideally be in a style comprehensible to common people. So, the comprehensibility of data mining algorithms, that is, the ability of a data mining algorithm to produce patterns understandable to human beings, is an important factor.

TRADE-OFF BETWEEN COMPUTATION TIME AND NUMBER OF RULES FOR FUZZY MINING FROM QUANTITATIVE DATA

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488501001071 ◽

2001 ◽

Vol 09 (05) ◽

pp. 587-604 ◽

Cited By ~ 122

Author(s):

TZUNG-PEI HONG ◽

CHAN-SHENG KUO ◽

SHENG-CHAI CHI

Keyword(s):

Data Mining ◽

Computation Time ◽

Data Mining Algorithm ◽

Trade Off ◽

Fuzzy Association Rules ◽

Data Mining Algorithms ◽

Mining Algorithm ◽

Linguistic Term ◽

Complete Set ◽

Mining Algorithms

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. We previously proposed a fuzzy mining algorithm in which each attribute used only the linguistic term with the maximum cardinality in the mining process. The number of items was thus the same as the number of original attributes, which reduced the processing time. The fuzzy association rules derived in this way are, however, not complete. This paper thus modifies that algorithm and proposes a new fuzzy data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm can derive a more complete set of rules, but with more computation time than the earlier method. A trade-off thus exists between the computation time and the completeness of the rules. Choosing an appropriate learning method thus depends on the requirements of the application domain.
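
The idea of keeping only the linguistic term with the maximum cardinality for an attribute can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the triangular membership functions, the three terms, their parameters, and the quantities are all hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical linguistic terms for one quantitative attribute.
TERMS = {"Low": (0, 3, 6), "Middle": (3, 6, 9), "High": (6, 9, 12)}

def term_cardinalities(quantities):
    """Sum each term's membership values over all transactions."""
    return {
        term: sum(triangular(q, *params) for q in quantities)
        for term, params in TERMS.items()
    }

def max_cardinality_term(quantities):
    """Keep only the term with the largest summed membership."""
    counts = term_cardinalities(quantities)
    best = max(counts, key=counts.get)
    return best, counts[best]

# Hypothetical purchased quantities of one item across four transactions.
quantities = [4, 5, 7, 8]
best_term, best_count = max_cardinality_term(quantities)
print(best_term, best_count)
```

Keeping one term per attribute keeps the item count equal to the attribute count, which is the source of the speed advantage described above; retaining every term would yield more rules at a higher computational cost.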

DATA MINING ALGORITHM BASED ON CLOUD COMPUTING

Latin American Applied Research - An international journal ◽

10.52292/j.laar.2018.241 ◽

2018 ◽

Vol 48 (4) ◽

pp. 281-285

Author(s):

Y. J. HAO

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Cellular Neural Network ◽

Data Mining Algorithm ◽

Computing Technology ◽

Data Mining Algorithms ◽

Mining Algorithm ◽

Hash Algorithm ◽

Mining Algorithms ◽

Big Data Technology

The data mining algorithm based on cloud computing is studied and analyzed in this paper. Firstly, the research status and background of data mining algorithms based on cloud computing are introduced briefly. Secondly, the design of the hash algorithm under a cellular neural network, which this paper relies on, is introduced. Next, the design of a wavelet data compression algorithm for wireless sensor networks is described. Finally, the experimental results and the optimization similarity analysis are presented. The analysis results show that the data mining algorithm based on cloud computing constructed in this paper plays an important role in data mining and can, to some extent, improve cloud-computing-based data mining algorithms and the development level of cloud computing technology and big data technology.

Comparing Performance of Data Mining Algorithms in Prediction Heart Diseases

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v5i6.pp1569-1576 ◽

2015 ◽

Vol 5 (6) ◽

pp. 1569 ◽

Cited By ~ 13

Author(s):

Moloud Abdar ◽

Sharareh R. Niakan Kalhori ◽

Tole Sutikno ◽

Imam Much Ibnu Subroto ◽

Goli Arji

Keyword(s):

Neural Network ◽

Data Mining ◽

Decision Tree ◽

Heart Diseases ◽

Support Vector ◽

Data Mining Algorithm ◽

Network Support ◽

Data Mining Algorithms ◽

Mining Algorithms ◽

Analysis Models

Heart diseases are among the nation’s leading causes of mortality and morbidity. Data mining techniques can predict the likelihood of patients developing heart disease. The purpose of this study is to compare different data mining algorithms for the prediction of heart diseases. This work applied and compared data mining techniques to predict the risk of heart diseases. After feature analysis, models based on five algorithms, namely decision tree (C5.0), neural network, support vector machine (SVM), logistic regression, and k-nearest neighbor (KNN), were developed and validated. The C5.0 decision tree built the model with the greatest accuracy, 93.02%; KNN, SVM, and neural network reached 88.37%, 86.05%, and 80.23%, respectively. The results produced by the decision tree are easy to interpret and apply; its rules can be readily understood by different clinical practitioners.

Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study (Preprint)

10.2196/preprints.18828 ◽

2020 ◽

Cited By ~ 5

Author(s):

Seyed Mohammad Ayyoubzadeh ◽

Seyed Mehdi Ayyoubzadeh ◽

Hoda Zahedi ◽

Mahnaz Ahmadi ◽

Sharareh R Niakan Kalhori

Keyword(s):

Data Mining ◽

Health Care ◽

Linear Regression ◽

Short Term Memory ◽

Google Trends ◽

Health Care Managers ◽

Health Crisis ◽

Data Mining Algorithms ◽

Performance Metric ◽

Mining Algorithms

BACKGROUND The recent global outbreak of coronavirus disease (COVID-19) is affecting many countries worldwide. Iran is one of the top 10 most affected countries. Search engines provide useful data from populations, and these data might be useful to analyze epidemics. Utilizing data mining methods on electronic resources’ data might provide a better insight into the COVID-19 outbreak to manage the health crisis in each country and worldwide. OBJECTIVE This study aimed to predict the incidence of COVID-19 in Iran. METHODS Data were obtained from the Google Trends website. Linear regression and long short-term memory (LSTM) models were used to estimate the number of positive COVID-19 cases. All models were evaluated using 10-fold cross-validation, and root mean square error (RMSE) was used as the performance metric. RESULTS The linear regression model predicted the incidence with an RMSE of 7.562 (SD 6.492). The most effective factors besides previous day incidence included the search frequency of handwashing, hand sanitizer, and antiseptic topics. The RMSE of the LSTM model was 27.187 (SD 20.705). CONCLUSIONS Data mining algorithms can be employed to predict trends of outbreaks. This prediction might support policymakers and health care managers to plan and allocate health care resources accordingly.
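
The evaluation scheme named in the METHODS, 10-fold cross-validation scored by RMSE and reported as a mean with a standard deviation, can be sketched for a one-predictor linear regression as below. The data are synthetic and noise-free, the interleaved fold assignment is an assumption, and the function names are hypothetical; none of this reproduces the study's models or its Google Trends data.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def kfold_rmse(xs, ys, k=10):
    """Return the RMSE of each held-out fold in a k-fold cross-validation."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample forms this fold
        train = [(xs[i], ys[i]) for i in range(n) if i not in test_idx]
        test = [(xs[i], ys[i]) for i in range(n) if i in test_idx]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        mse = sum((y - (a + b * x)) ** 2 for x, y in test) / len(test)
        scores.append(math.sqrt(mse))
    return scores

# Synthetic series (hypothetical): y depends exactly linearly on x.
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 for x in xs]

scores = kfold_rmse(xs, ys, k=10)
print("mean RMSE:", statistics.fmean(scores))
print("SD of RMSE:", statistics.stdev(scores))
```

Reporting the spread across folds alongside the mean, as the abstract does with "RMSE (SD)", shows how stable the error is from one held-out subset to another.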

Migrating From Data Mining to Big Data Mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.4.14667 ◽

2018 ◽

Vol 7 (3.4) ◽

pp. 13

Author(s):

Gourav Bathla ◽

Himanshu Aggarwal ◽

Rinkle Rani

Keyword(s):

Data Mining ◽

Big Data ◽

Response Time ◽

Large Scale ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Mining Algorithm ◽

Big Data Mining ◽

Data Mining Algorithms ◽

Mining Algorithms

Data mining is one of the most researched fields in computer science. Many studies have been carried out to extract and analyse important information from raw data. Traditional data mining algorithms such as classification, clustering and statistical analysis can process small-scale data with great efficiency and accuracy. Social networking interactions, business transactions and other communications result in Big data: data at a scale beyond the competency of traditional data mining techniques. It is observed that traditional data mining algorithms are not capable of storing and processing large-scale data, and where some algorithms are capable, the response time is very high. Big data contains hidden information which, if analysed in an intelligent manner, can be highly beneficial for business organizations. In this paper, we have analysed the advancement from traditional data mining algorithms to Big data mining algorithms. Applications of traditional data mining algorithms can be straightforwardly incorporated into Big data mining algorithms. Several studies have compared traditional data mining with Big data mining, but very few have analysed the most important algorithms within one research work, which is the core motive of our paper. Readers can easily observe the differences between these algorithms, along with their pros and cons. Mathematical concepts are applied in data mining algorithms: means and Euclidean distance calculation in Kmeans, vectors and margins in SVM, and Bayes' theorem and conditional probability in the Naïve Bayes algorithm are real examples. Classification and clustering are the most important applications of data mining. In this paper, the Kmeans, SVM and Naïve Bayes algorithms are analysed in detail to observe accuracy and response time from both conceptual and empirical perspectives. Big data technologies such as Hadoop and MapReduce are used for implementing Big data mining algorithms. Performance evaluation metrics such as speedup, scaleup and response time are used to compare traditional mining with Big data mining.
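
The role of means and Euclidean distance in Kmeans can be illustrated with a single assignment-and-update iteration. This is a minimal single-machine sketch with hypothetical points and starting centroids; it is not the paper's Hadoop or MapReduce implementation, and it assumes every cluster receives at least one point.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def update(points, labels, k):
    """Move each centroid to the coordinate-wise mean of its assigned points.

    Assumes no cluster is empty; an empty cluster would need special handling.
    """
    centroids = []
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return centroids

# Hypothetical 2-D points and initial centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

labels = assign(points, centroids)
centroids = update(points, labels, k=2)
print(labels, centroids)
```

Repeating these two steps until the labels stop changing gives the usual Kmeans loop; the assignment step is the part that parallelises naturally across data partitions.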

Data Mining Algorithm and its Application with Massive Text Database

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.742.395 ◽

2015 ◽

Vol 742 ◽

pp. 395-398

Author(s):

Chun Ping Wang

Keyword(s):

Data Mining ◽

Semantic Analysis ◽

Data Mining Algorithm ◽

User Characteristics ◽

Text Data ◽

Data Mining Algorithms ◽

Mining Methods ◽

Language Characteristics ◽

Network Language ◽

Mining Algorithms

The large-text data mining method described here avoids lexical and syntactic semantic analysis and instead relies on statistical analysis and processing of large text data. It thereby largely ignores semantic differences between literally similar expressions and adapts to the characteristics of network language. The results of our paper show that data mining algorithms can extract information that portrays the vocabulary characteristics of specific users and can make recommendations based on those characteristics.

Design and Implementation of Data Mining Based on Distributed Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.644-650.1702 ◽

2014 ◽

Vol 644-650 ◽

pp. 1702-1705 ◽

Cited By ~ 2

Author(s):

Jin Hai Zhang

Keyword(s):

Data Mining ◽

Distributed Computing ◽

Data Storage ◽

Leading Edge ◽

Data Mining Algorithm ◽

Dynamic Features ◽

Computing Platform ◽

Data Mining Algorithms ◽

Rich Data ◽

Mining Algorithms

Because internet data is massive, diverse, heterogeneous, and dynamic, the data storage and processing efficiency of traditional databases can no longer meet the requirements of analyzing it. This work uses leading-edge distributed computing technology to address the inability of traditional data mining to handle massive data, implementing an improved data mining algorithm on the Hadoop distributed computing platform. It can serve as a reference for later data mining algorithms built on Hadoop, and using a rich set of data mining algorithms can reveal more value in the data.

Data Mining Algorithm for Physical Health Monitoring of Young Students Based on Big Data

Journal of Healthcare Engineering ◽

10.1155/2021/9962906 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Jiangang Sun ◽

Xiaoran Jiang ◽

Guoliang Yuan ◽

Zhenhuai Chen

Keyword(s):

Data Mining ◽

Big Data ◽

Physical Health ◽

Data Management System ◽

Practical Significance ◽

Data Mining Algorithm ◽

Physical Test ◽

Healthy Development ◽

Data Mining Algorithms ◽

Mining Algorithm

With the continuous improvement of living standards, the level of physical development of adolescents has improved significantly. However, the improvement of adolescents’ physical functions and healthy development is relatively slow, and these even appear to decline. This paper proposes a novel data mining algorithm based on big data for monitoring adolescent students’ physical health, to address this problem and enhance young people’s physical fitness and mental health. Since big data technology has positive practical significance in promoting young people’s healthy development and individual health rights, this article implements commonly used data mining algorithms on Hadoop/Spark big data processing platforms. Running the algorithms on different platforms and comparing the running times verified that the big data platform offers good computing performance for the data mining algorithms. The resulting work is intended to serve as a complete physical health data management system that can effectively save, process, and analyze adolescents’ physical test data.
