Productivity estimation of cutter suction dredger operation through data mining and learning from real-time big data

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiake Fu ◽  
Huijing Tian ◽  
Lingguang Song ◽  
Mingchao Li ◽  
Shuo Bai ◽  
...  

Purpose
This paper presents a new approach to productivity estimation of cutter suction dredger operation through data mining and learning from real-time big data.

Design/methodology/approach
The paper used big data, data mining and machine learning techniques to extract features of cutter suction dredgers (CSDs) for predicting their productivity. An ElasticNet-SVR (Elastic Net-Support Vector Regression) method was used to filter the original monitoring data, and, in combination with the actual working conditions of the CSD, 15 features were selected. A box plot was then used to clean the corresponding data by filtering out outliers. Finally, four algorithms, namely SVR (Support Vector Regression), XGBoost (Extreme Gradient Boosting), LSTM (Long Short-Term Memory network) and BP (Back Propagation) neural network, were used for modeling and testing.

Findings
The paper provides a comprehensive forecasting framework for productivity estimation, including feature selection, data processing and model evaluation. The optimal coefficients of determination (R2) of all four algorithms were above 80.0%, indicating that the selected features were representative. The BP neural network model coupled with the SVR model was selected as the final model.

Originality/value
A machine-learning algorithm incorporating domain expert judgments was used to select predictive features. The final optimal coefficient of determination (R2) of the coupled BP neural network and SVR model is 87.6%, indicating that the method proposed in this paper is effective for CSD productivity estimation.
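The box-plot cleaning step described above can be sketched with the standard 1.5×IQR rule; this is a minimal illustration (function and variable names are ours, not the paper's):

```python
import numpy as np

def iqr_filter(x, k=1.5):
    """Flag values inside [Q1 - k*IQR, Q3 + k*IQR]; outliers get False (box-plot rule)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (x >= lo) & (x <= hi)

def clean(features):
    """Keep only rows where every feature column passes the box-plot rule.

    features: array of shape (n_samples, n_features).
    """
    mask = np.all([iqr_filter(features[:, j]) for j in range(features.shape[1])], axis=0)
    return features[mask]
```

Each of the 15 selected monitoring features would be filtered this way before the SVR/XGBoost/LSTM/BP models are trained.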

Sensor Review ◽  
2016 ◽  
Vol 36 (2) ◽  
pp. 207-216 ◽  
Author(s):  
Liyuan Xu ◽  
Jie He ◽  
Shihong Duan ◽  
Xibin Wu ◽  
Qin Wang

Purpose
A sensor-array and pattern-recognition-based electronic nose (E-nose) is a typical detection and recognition instrument for indoor air quality (IAQ). The E-nose can monitor several pollutants in the air by mimicking the human olfactory system. Formaldehyde concentration prediction is one of the major functionalities of the E-nose, and three typical machine learning (ML) algorithms are most frequently used for it: the back propagation (BP) neural network, the radial basis function (RBF) neural network and support vector regression (SVR).

Design/methodology/approach
This paper comparatively evaluates and analyzes those three ML algorithms under a controllable environment built on a marketable sensor-array E-nose platform. Variable temperature (T), relative humidity (RH) and pollutant concentration (C) conditions were measured during the experiments to support the investigation.

Findings
Regression models were built using the three algorithms, and in-depth analysis demonstrates that the BP neural network model yields better prediction performance than the others.

Originality/value
The empirical results prove that ML algorithms, combined with low-cost sensors, can achieve high-precision contaminant concentration detection indoors.
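As a rough illustration of this kind of comparison, the sketch below cross-validates two stand-in regressors (scikit-learn's MLPRegressor for a BP-style network, and SVR) on synthetic temperature/humidity data; the data, model settings and target formula are assumptions, not the paper's:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic conditions: temperature in [15, 35] degC, relative humidity in [20, 80] %.
X = rng.uniform([15, 20], [35, 80], size=(200, 2))
# Hypothetical formaldehyde concentration as a noisy function of T and RH.
y = 0.02 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(0, 0.01, 200)

for name, model in [
    ("BP-style MLP", MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
    ("SVR", SVR(kernel="rbf", C=10.0)),
]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R2 = {r2:.3f}")
```

In practice the inputs would come from the sensor-array readings, and an RBF network would be benchmarked alongside these two.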


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Roberto Salazar-Reyna ◽  
Fernando Gonzalez-Aleu ◽  
Edgar M.A. Granda-Gutierrez ◽  
Jenny Diaz-Ramirez ◽  
Jose Arturo Garza-Reyes ◽  
...  

Purpose
The objective of this paper is to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems.

Design/methodology/approach
A systematic literature review (SLR) was conducted to obtain the most relevant papers related to the research study from three different platforms: EBSCOhost, ProQuest and Scopus. The literature was assessed and synthesized through analysis of the publications, authors and content.

Findings
From the SLR, 576 publications were identified and analyzed. The research area shows the characteristics of a growing field, with new research areas evolving and applications being explored. In addition, the main authors and collaboration groups publishing in this research area were identified through a social network analysis, which could help new and current authors identify researchers with common interests in the field.

Research limitations/implications
The use of the SLR methodology does not guarantee that all relevant publications related to the research are covered and analyzed. However, the authors' previous knowledge and the nature of the publications were used to select the different platforms.

Originality/value
To the best of the authors' knowledge, this paper represents the most comprehensive literature-based study on the fields of data analytics, big data, data mining and machine learning applied to healthcare engineering systems.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

With the rapid development of the Internet and of big data analysis technology, data mining has played a positive role in both industry and academia. Classification is an important problem in data mining. This paper explores the background and theory of support vector machines (SVM) as a data mining classification algorithm and analyzes and summarizes the research status of various improved SVM methods. According to the scale and characteristics of the data, different solution spaces are selected, and the solution of the dual problem is transformed into the classification surface of the original space to improve algorithm speed.

Research process: incorporating fuzzy membership into multiple-kernel learning, it is found that the time complexity of the primal problem is determined by the dimension, while the time complexity of the dual problem is determined by the sample count; since dimension and sample count together constitute the scale of the data, the solution space can be chosen according to the scale and features of the data. Algorithm speed can then be improved by transforming the solution of the dual problem into the classification surface of the original space.

Conclusion: by improving the computation rate of traditional machine learning algorithms, the fitting accuracy between predicted and actual values reaches 98%, allowing traditional machine learning algorithms to meet the requirements of the big data era and to be widely applied in big data contexts.
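The scale-based choice of solution space described above can be illustrated with scikit-learn's LinearSVC, whose `dual` flag selects between the primal formulation (cost driven by dimension) and the dual formulation (cost driven by sample count); the dataset here is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data: many samples, few features.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# When n_samples >> n_features, the primal problem (dual=False) is the
# cheaper solution space; when n_features >> n_samples, prefer the dual.
use_dual = X.shape[0] < X.shape[1]
clf = LinearSVC(dual=use_dual, max_iter=5000).fit(X, y)
print(f"dual={use_dual}, train accuracy={clf.score(X, y):.3f}")
```

This mirrors the paper's rule of thumb only at the level of solver choice; the fuzzy-membership multiple-kernel extensions it surveys are not reproduced here.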


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 04) ◽  
pp. 591-606
Author(s):  
R. Brindha ◽  
Dr.M. Thillaikarasi

Big data analytics (BDA) is a systematic method that aims to recognize and examine different designs, patterns and trends in a large dataset. In this paper, BDA is used for visualization and trend prediction, with exploratory data analysis applied to crime data. Successive facts and patterns were extracted for California, Washington and Florida using statistical analysis and visualization. The predictive results compare the Keras Prophet model, LSTM and neural network models, which are the existing methods used to analyze crime data with BDA techniques. However, crime increases day by day, making it a great challenge to counter criminal activity, and some studies have ignored essential influential factors; to overcome these big data challenges, many studies have been developed with only one or two features. This paper introduces a big data framework to analyze the influential aspects of crime incidents and evaluates it on New York City. The proposed structure combines dynamic machine learning algorithms and a geographical information system (GIS) to consider the contiguous causes of crime. Recursive feature elimination (RFE) is used to select the optimal features. Gradient boosted decision trees (GBDT), logistic regression (LR), support vector machines (SVM) and artificial neural networks (ANN) are applied to develop the optimal data model, and significant impact features were then reviewed by applying GBDT and GIS. The experimental results illustrate that the GBDT and GIS model combination can identify crime rankings with higher performance and accuracy than the existing methods.
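A minimal sketch of the RFE step described above, pairing it with a GBDT estimator as in the paper; the inputs are synthetic stand-ins, since the crime features themselves are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE

# Stand-in data: 12 candidate features, only some informative.
X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           random_state=1)

# Recursive feature elimination: repeatedly refit the GBDT and drop the
# weakest feature (by importance) until the target count remains.
selector = RFE(GradientBoostingClassifier(random_state=1), n_features_to_select=4)
selector.fit(X, y)
print("selected feature indices:",
      [i for i, kept in enumerate(selector.support_) if kept])
```

The surviving features would then feed the GBDT/LR/SVM/ANN model comparison and the GIS-based review of significant factors.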


Author(s):  
Yu Tao ◽  
Li Chuanxian ◽  
Liu Lijun ◽  
Chen Hongjun ◽  
Guo Peng ◽  
...  

Abstract The process of a long-distance hot oil pipeline is complicated, and its safety and optimization are contradictory goals. In actual production and operation, the theoretical models for calculating oil temperature along the pipeline suffer from large errors and complex application. This research relies on actual production data and uses big data mining algorithms such as BP neural networks, ARMA and seq2seq to establish an oil temperature prediction model. The prediction error is less than 0.5 °C, which solves the problem of accurately predicting dynamic oil temperature during pipeline operation. Combined with pigging, a friction prediction model for a standard pipeline section is established using a BP neural network, from which an economic pigging period of 80 days is derived. After a friction database is established, the historical friction data are analyzed using the Gaussian formula, and the 95% quantile of friction is set as the threshold to effectively monitor friction variation caused by long-term waxing in pipelines. A closed-loop operation system for hot oil pipeline safety and optimization was formed to guide daily process adjustment and production arrangement, with energy savings of up to 92.4%. The prediction model and research results based on production big data have good adaptability and generalization, laying a foundation for future intelligent control of pipelines.
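The 95% friction threshold described above amounts to fitting a Gaussian to the historical friction data and alarming above its one-sided 95% quantile; a minimal sketch with hypothetical friction values (the mean, spread and units are assumptions, not the paper's data):

```python
import numpy as np

def friction_threshold(history):
    """Gaussian 95% threshold: mu + z_0.95 * sigma over historical friction data."""
    mu, sigma = np.mean(history), np.std(history)
    z = 1.645  # one-sided 95% quantile of the standard normal
    return mu + z * sigma

rng = np.random.default_rng(0)
history = rng.normal(50.0, 2.0, 1000)   # hypothetical friction readings
threshold = friction_threshold(history)
print(f"alarm threshold: {threshold:.2f}")
# Readings above the threshold would flag abnormal wax buildup and
# trigger inspection or an early pigging run.
```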


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Feifei Sun ◽  
Guohong Shi

Purpose
This paper aims to effectively explore the application effect of big data techniques based on an α-support vector machine-stochastic gradient descent (α-SVMSGD) algorithm in third-party logistics, obtain the valuable information hidden in logistics big data and help logistics enterprises make more reasonable planning schemes.

Design/methodology/approach
In this paper, a forgetting factor is introduced without changing the algorithm's complexity, yielding an algorithm called the α-SVMSGD algorithm. The algorithm selectively deletes or retains historical data, which improves the adaptability of the classifier to new real-time logistics data. Simulation results verify the application effect of the algorithm.

Findings
With increasing training iterations, the test error percentages of the gradient descent (GD) algorithm, the stochastic gradient descent (SGD) algorithm and the α-SVMSGD algorithm all decrease gradually. In processing logistics big data, the α-SVMSGD algorithm retains the efficiency of the SGD algorithm while ensuring that the descent direction approaches the optimal GD direction; it can use a small amount of data to obtain more accurate results and enhance convergence accuracy.

Research limitations/implications
The threshold setting of the forgetting factor still needs to be improved; setting thresholds for different data types in self-learning is a future research direction. The number of forgotten data points can be effectively controlled through big data processing technology to improve data support for the normal operation of third-party logistics.

Practical implications
The algorithm can effectively reduce the time consumed by data mining, achieve rapid and accurate convergence on sample data without increasing sample complexity, improve the efficiency of logistics big data mining and reduce the redundancy of historical data, and it has reference value for promoting the development of the logistics industry.

Originality/value
The classification algorithm proposed in this paper is feasible and converges well in third-party logistics big data mining. The α-SVMSGD algorithm has application value in real-time logistics data mining, but the design of the forgetting factor threshold needs to be improved; in the future, the authors will continue to study how to set thresholds for different data types in self-learning.
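A minimal sketch of the forgetting-factor idea on a linear SVM trained by SGD: each step geometrically shrinks the accumulated weights, so older samples' influence decays and the classifier tracks the newest data. The decay constant, learning rate and hinge-loss model are illustrative assumptions, not the paper's α-SVMSGD specification:

```python
import numpy as np

def sgd_with_forgetting(stream, lr=0.01, alpha=0.999):
    """Online linear SVM via hinge-loss SGD with a multiplicative forgetting factor."""
    w = np.zeros(stream[0][0].shape[0])
    for x, y in stream:                 # labels y in {-1, +1}
        w *= alpha                      # forget: shrink old samples' contribution
        if y * np.dot(w, x) < 1.0:      # hinge-loss subgradient step
            w += lr * y * x
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))   # synthetic separable labels
w = sgd_with_forgetting(list(zip(X, y)))
acc = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {acc:.2f}")
```

Setting alpha closer to 1 forgets more slowly; the paper's open question is how to tune this threshold per data type.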


Author(s):  
Ding Yu ◽  
Yuan Shixiong ◽  
Deng Rui ◽  
Luo Chenxiang

Based on a big data mining method for petrophysical data, this paper studies the method and application of a BP neural network for establishing a nonlinear interpretation model in a distributed cloud computing environment. The nonlinear mapping relationship between the objective logging response and the actual formation components is established by extracting the data mining result model, which overcomes the deficiencies of the conventional logging interpretation procedure based on homogeneity theory, the linearity hypothesis and simplified models and parameters drawn from statistical experience. The results show that the network prediction model is improved and has strong reference value for solving practical interpretation problems under complex geological conditions.


Author(s):  
Viktor Blanutsa

In social geography, which aims to understand the territorial organization of society, various methods are used, including data mining. However, there is no generalization of the experience of using such methods in world science. The purpose of this article is therefore to analyze the global body of scientific articles on this issue to identify priorities, algorithms and thematic areas, along with their capabilities and limitations. Using the author's method of semantic search based on machine learning, about two hundred articles published in the last two decades were identified across eight bibliographic databases. Their generalization made it possible to identify chronological and chorological priorities, and to establish that a limited number of algorithms had been used for geospatial data mining, which can be grouped into neural network, evolutionary, decision tree, swarm intelligence and support vector methods. These algorithms were used in five thematic areas (spatial-urban, regional-typological, area-based, geo-indicative and territorial-connective). The main capabilities and limitations of each area are given.


2021 ◽  
Vol 30 (04) ◽  
pp. 2150020
Author(s):  
Luke Holbrook ◽  
Miltiadis Alamaniotis

With the increase of cyber-attacks on millions of Internet of Things (IoT) devices, the poor network security measures on those devices are the main source of the problem. This article studies several machine learning algorithms for their effectiveness in detecting malware in consumer IoT devices. In particular, the Support Vector Machine (SVM), Random Forest and Deep Neural Network (DNN) algorithms are benchmarked on a set of test data and compared as tools for safeguarding IoT deployments. Test results on a set of four IoT devices show that all three algorithms detect network anomalies with high accuracy. However, the deep neural network provides the highest coefficient of determination (R2) and is hence identified as the most precise of the tested algorithms for IoT device security on the datasets considered.
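A sketch of this style of benchmark on synthetic stand-in data (the IoT traffic captures are not reproduced here, and the feature/target construction is an assumption), scoring each model by the coefficient of determination R2 on a held-out split:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))                        # stand-in traffic features
y = X[:, 0] * 2 - X[:, 3] + rng.normal(0, 0.1, 600)  # stand-in anomaly score

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
results = {}
for name, model in [
    ("SVM", SVR()),
    ("Random Forest", RandomForestRegressor(random_state=0)),
    ("DNN", MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=0)),
]:
    results[name] = r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: R2 = {results[name]:.3f}")
```

On real device traffic the ranking would depend on the capture; the article reports the DNN on top.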


2019 ◽  
Vol 11 (23) ◽  
pp. 6669 ◽  
Author(s):  
Raghu Garg ◽  
Himanshu Aggarwal ◽  
Piera Centobelli ◽  
Roberto Cerchione

At present, due to the limited availability of natural resources, society should take maximum advantage of data, information and knowledge to achieve sustainability goals. In today's world, human existence is not possible without the proliferation of plants. In photosynthesis, plants use solar energy to produce chemical energy; this process sustains all life on earth, and the main controlling factor for proper plant growth is soil, since it holds water, air and all the essential nutrients for plant nourishment. However, due to overexposure, soil becomes degraded, so fertilizer is an essential component for maintaining soil quality, and soil analysis is a suitable method to determine soil quality. Soil analysis examines the soil in laboratories and generates reports of unorganized data. In this study, different big data machine learning methods are used to extract knowledge from these data in order to determine fertilizer recommendation classes based on the present soil nutrient composition. For this experiment, soil analysis reports were collected from the Tata soil and water testing center. The Mahout library is used to analyze stochastic gradient descent (SGD) and artificial neural network (ANN) performance in a Hadoop environment. For broader performance evaluation, single-machine experiments were also run for random forest (RF), K-nearest neighbors (K-NN), regression tree (RT), SVM with a polynomial kernel and SVM with a radial basis function (RBF) kernel. Detailed experimental analysis was carried out on the soil report dataset using overall accuracy, the receiver operating characteristic (ROC) curve and area under the curve (AUC), mean absolute error (MAE), root mean square error (RMSE) and the coefficient of determination (R2) as validation measures. The results compare the solution classes and conclude that SGD outperforms the other approaches. Finally, the results support selecting a recommendation class that suggests a suitable fertilizer to crops for maximum production.
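A hedged sketch of the winning SGD approach on synthetic stand-in features (the Tata soil-report data are not public here, so the class count and feature construction are assumptions; this is a single-machine scikit-learn analogue, not the Mahout/Hadoop pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for soil nutrient features (e.g. N, P, K, pH, organic carbon, ...)
# mapped to four hypothetical fertilizer recommendation classes.
X, y = make_classification(n_samples=400, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)

# Scaling matters for SGD-trained linear models.
clf = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
print(f"mean CV accuracy: {acc:.3f}")
```

The study's full evaluation adds AUC-ROC, MAE, RMSE and R2 alongside accuracy, and compares against RF, K-NN, RT and the two SVM kernels.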

