Study on the application of big data techniques for the third-party logistics using novel support vector machine algorithm

2021
Vol ahead-of-print (ahead-of-print)
Author(s): Feifei Sun, Guohong Shi

Purpose: This paper aims to effectively explore the application effect of big data techniques based on an α-support vector machine-stochastic gradient descent (SVMSGD) algorithm in third-party logistics, obtain the valuable information hidden in logistics big data and help logistics enterprises make more reasonable planning schemes.

Design/methodology/approach: A forgetting factor is introduced without changing the algorithm's complexity, and an algorithm based on this forgetting factor, called the α-SVMSGD algorithm, is proposed. The algorithm selectively deletes or retains historical data, which improves the adaptability of the classifier to new real-time logistics data. Simulation results verify the application effect of the algorithm.

Findings: With the increase of training iterations, the test error percentages of the gradient descent (GD) algorithm, the stochastic gradient descent (SGD) algorithm and the α-SVMSGD algorithm decrease gradually. In logistics big data processing, the α-SVMSGD algorithm has the efficiency of the SGD algorithm while ensuring that the descent direction approaches the optimal solution direction of GD; it can use a small amount of data to obtain more accurate results and enhance convergence accuracy.

Research limitations/implications: The threshold setting of the forgetting factor still needs to be improved. Setting thresholds for different data types in self-learning has become a research direction. The number of forgotten data points can be effectively controlled through big data processing technology to improve data support for the normal operation of third-party logistics.

Practical implications: The algorithm can effectively reduce the time consumed by data mining, realize rapid and accurate convergence on sample data without increasing sample complexity, improve the efficiency of logistics big data mining and reduce the redundancy of historical data; it has a certain reference value for promoting the development of the logistics industry.

Originality/value: The classification algorithm proposed in this paper is feasible and converges well in third-party logistics big data mining. The α-SVMSGD algorithm has application value in real-time logistics data mining, but the design of the forgetting factor threshold needs to be improved. In future work, the authors will continue to study how to set thresholds for different data types in self-learning.
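The abstract does not give implementation details of the α-SVMSGD algorithm. The sketch below illustrates one plausible reading: a linear SVM updated by SGD on the hinge loss, where the forgetting factor exponentially decays the weight of historical samples and discards those whose weight falls below a threshold. The class name, parameter names and default values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

class AlphaSVMSGD:
    """Illustrative linear SVM trained by SGD with a forgetting factor.

    Assumption: the forgetting factor alpha exponentially down-weights older
    samples; samples whose weight falls below `drop_threshold` are discarded
    from the replay buffer. This is an interpretation of the abstract, not
    the authors' exact method.
    """

    def __init__(self, n_features, alpha=0.95, lr=0.01, reg=1e-4,
                 drop_threshold=0.05):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.alpha = alpha              # forgetting factor (assumed exponential decay)
        self.lr = lr                    # SGD learning rate
        self.reg = reg                  # L2 regularization strength
        self.drop_threshold = drop_threshold
        self.buffer = []                # list of (x, y, weight)

    def partial_fit(self, x, y):
        """Consume one new sample (y in {-1, +1}) and update the model."""
        # Decay the importance of historical data and drop stale samples.
        self.buffer = [(xi, yi, wi * self.alpha)
                       for xi, yi, wi in self.buffer
                       if wi * self.alpha >= self.drop_threshold]
        self.buffer.append((x, y, 1.0))

        # One SGD pass over the retained, weighted samples (hinge loss + L2).
        for xi, yi, wi in self.buffer:
            margin = yi * (self.w @ xi + self.b)
            grad_w = self.reg * self.w
            grad_b = 0.0
            if margin < 1.0:            # sample violates the margin
                grad_w -= wi * yi * xi
                grad_b -= wi * yi
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b

    def predict(self, X):
        return np.sign(X @ self.w + self.b)
```

In this reading, a larger alpha keeps more history (behaving closer to plain SGD over all data), while a smaller alpha forgets old logistics records faster, which is consistent with the adaptability to new real-time data described above.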

2020
Vol 23 (65), pp. 86-99
Author(s): Suresh K, Karthik S, Hanumanthappa M

With the progress in Information and Communication Technology (ICT), innumerable electronic devices (such as smart sensors) and software applications can make notable contributions to the challenges of monitoring plants. In existing work, the segmentation accuracy and classification accuracy of the Disease Monitoring System (DMS) are low, so the system does not monitor plant diseases properly. To overcome these drawbacks, this paper proposes an efficient monitoring system for paddy leaves based on big data mining. The proposed model comprises five phases: 1) image acquisition, 2) segmentation, 3) feature extraction, 4) feature selection and 5) classification and validation. First, a paddy leaf image taken from the dataset serves as the input. The image acquisition phase then performs three steps: i) conversion of the RGB image to a grayscale image, ii) normalization of intensity and iii) preprocessing with an alpha-trimmed mean filter (ATMF), a hybrid of the mean and median filters that removes noise. The resulting image is segmented using the Fuzzy C-Means (FCM) clustering algorithm, which isolates the diseased portion of the paddy leaf. In the next phase, features are extracted, and the resulting features are selected using the Multi-Verse Optimization (MVO) algorithm. After feature selection, the chosen features are classified using ANFIS (Adaptive Neuro-Fuzzy Inference System). Experimental results are compared with an SVM (Support Vector Machine) classifier and other prevailing methods in terms of precision, recall, F-measure, sensitivity, accuracy and specificity. In terms of accuracy, the proposed system reaches 97.28%, whereas the prevailing techniques offer only 91.2% (SVM), 85.3% (KNN) and 88.78% (ANN). Hence, the proposed DMS detects and classifies disease more accurately than the other methods.
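The alpha-trimmed mean filter used in the acquisition phase is a standard operator; below is a minimal sketch of it, assuming a square window and an even trim count d (the function name and defaults are illustrative, not from the paper). With d = 0 it reduces to a mean filter, and as d grows it approaches a median filter, which matches the mean/median hybrid described above.

```python
import numpy as np

def alpha_trimmed_mean_filter(image, window=3, d=2):
    """Alpha-trimmed mean filter for a 2D grayscale array.

    Within each window, drop the d/2 lowest and d/2 highest pixels and
    average the rest. d must be even and less than window*window.
    Borders are handled by edge padding.
    """
    pad = window // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    trim = d // 2
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = np.sort(padded[i:i + window, j:j + window].ravel())
            kept = patch[trim:patch.size - trim]   # trim the extremes
            out[i, j] = kept.mean()
    return out
```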


2021
Vol ahead-of-print (ahead-of-print)
Author(s): Jiake Fu, Huijing Tian, Lingguang Song, Mingchao Li, Shuo Bai, ...

Purpose: This paper presents a new approach to productivity estimation of cutter suction dredger operation through data mining and learning from real-time big data.

Design/methodology/approach: The paper uses big data, data mining and machine learning techniques to extract features of cutter suction dredgers (CSD) for predicting productivity. The ElasticNet-SVR (Elastic Net-Support Vector Regression) method is used to filter the original monitoring data. Along with the actual working conditions of the CSD, 15 features were selected. A box plot was then used to clean the corresponding data by filtering out outliers. Finally, four algorithms, namely SVR (Support Vector Regression), XGBoost (Extreme Gradient Boosting), LSTM (Long Short-Term Memory network) and BP (Back Propagation) neural network, were used for modeling and testing.

Findings: The paper provides a comprehensive forecasting framework for productivity estimation, including feature selection, data processing and model evaluation. The optimal coefficients of determination (R2) of the four algorithms were all above 80.0%, indicating that the selected features were representative. The BP neural network model coupled with the SVR model was selected as the final model.

Originality/value: A machine-learning algorithm incorporating domain-expert judgment was used to select predictive features. The final optimal coefficient of determination (R2) of the coupled BP neural network and SVR model is 87.6%, indicating that the method proposed in this paper is effective for CSD productivity estimation.
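The box-plot cleaning step corresponds to the usual interquartile-range (IQR) rule. The snippet below is a minimal sketch of such a filter, assuming a whisker factor of k = 1.5 and hypothetical CSD feature names; the paper does not state its exact threshold or column names.

```python
import pandas as pd

def iqr_outlier_filter(df, columns, k=1.5):
    """Box-plot style cleaning: drop rows whose value in any listed column
    lies outside [Q1 - k*IQR, Q3 + k*IQR]. k = 1.5 is the conventional
    whisker length (an assumption here, not stated in the paper).
    """
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Example with hypothetical monitoring features:
# cleaned = iqr_outlier_filter(raw_df, ["cutter_speed", "swing_speed", "mud_concentration"])
```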


2021
Vol ahead-of-print (ahead-of-print)
Author(s): Sumeer Gul, Shohar Bano, Taseen Shah

Purpose: Data mining, along with its varied technologies such as numerical mining, textual mining, multimedia mining, web mining, sentiment analysis and big data mining, has proved itself an emerging field and manifests in different techniques such as information mining; big data mining; big data mining and the Internet of Things (IoT); and educational data mining. This paper aims to discuss how these technologies and techniques are used to derive information and, eventually, knowledge from data.

Design/methodology/approach: An extensive review of the literature on data mining and its allied techniques was carried out to ascertain the emerging procedures and techniques in the domain of data mining. Clarivate Analytics' Web of Science and Sciverse Scopus were explored to discover the extent of literature published on data mining and its varied facets. Literature was searched against keywords such as data mining; information mining; big data; big data and IoT; and educational data mining. Further, works citing the literature on data mining were also explored to visualize the broad gamut of emerging techniques in this growing field.

Findings: The study validates that knowledge discovery in databases has rendered data mining an emerging field; the data present in these databases pave the way for data mining techniques and analytics. The paper provides a unique view of how data, and the logical patterns derived from it, are used, and how new procedures, algorithms and mining techniques are continuously upgraded for multipurpose use for the betterment of human life and experience.

Practical implications: The paper highlights different aspects of data mining, its technological approaches and how these emerging data technologies are used to derive logical insights from data and make data more meaningful.

Originality/value: The paper highlights the current trends and facets of data mining.


2021
Vol ahead-of-print (ahead-of-print)
Author(s): Laouni Djafri

Purpose: This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.

Design/methodology/approach: In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can later be used for prediction. This knowledge thus becomes a great asset in companies' hands, and obtaining it is precisely the objective of data mining. But with the production of large amounts of data and knowledge at an ever faster pace, the authors now talk about Big Data mining. For this reason, the proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem raised in this work is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML) algorithms, built in two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm, which in turn depends on a random sampling technique. The distributed architecture is specifically designed to handle big data processing and operates coherently and efficiently with the sampling strategy proposed in this work. This architecture also helps the authors verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method, as sketched below. This sampling method is also applied to extract the shared learning base (SLB), the partial learning base for the first level (PLBL1) and the partial learning base for the second level (PLBL2). The experimental results show the efficiency of the proposed solution without significant loss of classification accuracy. In practical terms, the DDPML system is generally dedicated to big data mining processing and works effectively in distributed systems with a simple structure, such as client-server networks.

Findings: The authors obtained very satisfactory classification results.

Originality/value: The DDPML system is specially designed to smoothly handle big data mining classification.
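As a rough illustration of the two-level stratified random sampling described above, the sketch below draws a class-proportional sample twice to obtain a representative learning base. The sampling fractions, function names and the mapping to PLBL1/PLBL2/RLB are assumptions made for illustration; the paper's exact sampling design and its construction of the SLB are not specified in the abstract.

```python
import pandas as pd

def stratified_sample(df, label_col, frac, seed=0):
    """Stratified random sampling: draw the same fraction from every class
    so the sample preserves the original class proportions."""
    return (df.groupby(label_col, group_keys=False)
              .apply(lambda g: g.sample(frac=frac, random_state=seed)))

def build_representative_base(full_df, label_col, frac_l1=0.3, frac_l2=0.5):
    """Two-level sampling, loosely following the naming in the abstract.
    Fractions are illustrative, not from the paper."""
    plbl1 = stratified_sample(full_df, label_col, frac_l1)   # first-level partial base
    plbl2 = stratified_sample(plbl1, label_col, frac_l2)     # second-level partial base
    return plbl2                                              # treated here as the RLB
```

In a Map-Reduce deployment, each mapper could apply such a stratified draw to its partition and the reducer could merge the per-partition samples, which is one simple way to keep the sampling consistent with a distributed architecture of the kind the authors describe.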

