Information Visualization Using Clustering and Predictive Model

2020
Vol 11 (4)
pp. 24-36
Author(s):
Manojit Chattopadhyay
Debdatta Pal

This paper aims to reveal the impact of rainfall on tea export from India, an issue that has remained unexplored in the existing literature. The study explores a new model to predict India's tea export more accurately, which would help Indian tea planters and exporters plan their production and inventory holding so as to derive maximum value from tea export. A two-stage modelling approach has been developed. First, an artificial-intelligence-based growing hierarchical self-organising map algorithm is applied to the monthly time series spanning April 2005 to December 2013 to segregate India's monthly tea export data into visual clusters of recognised patterns. Next, a predictive model based on a support vector machine is developed and applied to forecast tea export, and the importance of the predictor variables of tea export is identified. Finally, a comparative analysis of the models is performed using appropriate performance measures. The novelty of the study lies in two findings. First, India's tea export embeds complexity and nonlinearity that can be successfully clustered through a growing hierarchical self-organising map, which makes deeper analysis with further rich statistical techniques easier. Second, the analysis of prediction errors and of the relative importance of the predictor variables establishes rainfall as one of the most significant variables in predicting India's tea export, an insight that has not surfaced in the literature thus far.
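
The clustering stage can be illustrated with a much simpler relative of the growing hierarchical self-organising map. The sketch below is a flat, fixed-size 1-D self-organising map in plain Python, not the growing hierarchical variant itself (which adds units and child maps as needed); the two features per month, the toy values, and all parameter settings are hypothetical.

```python
import math
import random


def best_unit(weights, x):
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(data, n_units=3, epochs=100, lr0=0.1, sigma0=0.5, seed=0):
    """Train a flat 1-D self-organising map and return its unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Linear initialisation: units evenly spaced between per-feature min and max
    weights = [[lo[d] + (hi[d] - lo[d]) * j / (n_units - 1) for d in range(dim)]
               for j in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 1e-3)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):   # one pass in random order
            b = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of units
                h = math.exp(-((j - b) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical monthly feature vectors, e.g. (scaled export volume, scaled rainfall)
months = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
          (10.0, 10.2), (10.3, 9.9), (9.8, 10.1)]
weights = train_som(months)
labels = [best_unit(weights, m) for m in months]
print(labels)  # prints [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each month's cluster label could then be attached to the data before the forecasting stage; the support vector machine itself is not shown.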

2014
Vol 2014
pp. 1-12
Author(s):
Qianqian Han
Bo Yan
Guobao Ning
B. Yu

This paper presents an improved SVM model to forecast the dry bulk freight index (Baltic Dry Index, BDI), which is a powerful tool that helps operators and investors track market trends and avoid price risk in the shipping industry. The BDI is influenced by many factors, especially random incidents in the dry bulk market, which make it difficult to forecast. To eliminate the impact of these random incidents, a wavelet transform is adopted to denoise the BDI data series, and a combined wavelet transform and support vector machine model is developed to forecast the BDI. The model is tested on BDI data from 2005 to 2012: the first 84 consecutive monthly values serve as the model inputs and the last 12 monthly values as the outputs. The model parameters are optimized by a genetic algorithm, and the final model is confirmed through SVM training. The forecasting results of the proposed method are compared with those of three other forecasting methods. The results show that the proposed method achieves higher accuracy and could be used to forecast the short-term trend of the BDI.
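
Wavelet denoising of a monthly index series can be sketched as below. The abstract does not state which wavelet or thresholding rule was used, so this sketch assumes a Haar wavelet, soft thresholding and the universal threshold; the function names are hypothetical, and the SVM and genetic-algorithm stages are not shown.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def soft_threshold(v, t):
    """Shrink v towards zero by t (soft thresholding)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def haar_denoise(x, levels=2, threshold=None):
    """Denoise a series with a multi-level Haar wavelet transform.

    len(x) must be divisible by 2**levels. If threshold is None, the universal
    threshold sigma * sqrt(2 ln n) is used, with sigma estimated from the
    finest-level detail coefficients as median(|d|) / 0.6745.
    """
    n = len(x)
    if levels < 1 or n % (2 ** levels):
        raise ValueError("need levels >= 1 and len(x) divisible by 2**levels")
    approx, details = [float(v) for v in x], []
    for _ in range(levels):
        half = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in range(half)]
    if threshold is None:
        sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
        threshold = sigma * math.sqrt(2 * math.log(n))
    # Inverse transform from the coarsest level, using shrunken detail coefficients
    for level_details in reversed(details):
        rebuilt = []
        for a, d in zip(approx, level_details):
            d = soft_threshold(d, threshold)
            rebuilt += [(a + d) / SQRT2, (a - d) / SQRT2]
        approx = rebuilt
    return approx
```

For 96 monthly values covering 2005 to 2012, a call such as `smoothed = haar_denoise(bdi, levels=3)` (with `bdi` a hypothetical list of index values) followed by `smoothed[:84]` and `smoothed[84:]` mirrors the 84/12 split described above. With `threshold=0` the transform reconstructs the input exactly, which is a quick sanity check.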


2017
Vol 43 (3)
pp. 74-81
Author(s):
Bartosz Szeląg
Lidia Bartkiewicz
Jan Studziński
Krzysztof Barbusiński

The aim of the study was to evaluate the possibility of applying different data mining methods to model the inflow of sewage into a municipal sewage treatment plant. Prediction models were developed using support vector machines (SVM), random forests (RF), k-nearest neighbours (k-NN) and kernel regression (K). The data consisted of time series of daily rainfall, water level measurements in the receiving water for the treated sewage, and the wastewater inflow into the Rzeszow city plant. The results indicate that, among models with one input delayed by 1 day, the best were obtained with the k-NN method and the worst with the K method. For models with two input variables and one explanatory variable, the smallest errors were obtained when the model inputs were sewage inflow and rainfall delayed by 1 day; the RF method provided the best fit and the K method the worst. For models with three inputs and two explanatory variables, the SVM gave the best results and the K method the worst. In most modelling runs, the smallest prediction errors were obtained with the SVM method and the largest with the K method. For the simplest model, with one input delayed by 1 day, the k-NN method performed best, and for the models with two inputs the RF method was best in two modelling runs.
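
The lagged-input setup (for example, inflow and rainfall delayed by 1 day) combined with k-NN regression can be sketched as follows. The feature layout, toy values and k are hypothetical; in practice the inputs would normally be scaled first, because rainfall and inflow are measured in different units.

```python
import math


def make_lagged(inflow, rain, lag=1):
    """Pair inputs [Q(t-lag), P(t-lag)] with the target Q(t)."""
    X = [[inflow[t - lag], rain[t - lag]] for t in range(lag, len(inflow))]
    y = [inflow[t] for t in range(lag, len(inflow))]
    return X, y


def knn_predict(X_train, y_train, x, k=3):
    """k-NN regression: average target of the k training points closest to x."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical daily inflow and rainfall values
inflow = [100, 120, 110, 130, 125, 140]
rain = [0, 5, 0, 8, 2, 10]
X, y = make_lagged(inflow, rain, lag=1)
print(knn_predict(X[:-1], y[:-1], X[-1], k=2))  # prints 117.5
```

Sorting by a key on the distance keeps ties stable and avoids comparing feature lists directly.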


2011
Vol 84-85
pp. 405-409
Author(s):
Wei He
Jie Xiong

Potential knowledge useful for optimizing traffic management is hidden in huge amounts of data. Previous works used prior data pattern labels to train artificial neural networks to obtain intelligent data mining models, so the performance of these models depends on the experts' experience. To reduce the impact of the human factor, this work proposes a new hybrid intelligent data mining model based on the self-organizing map (SOM) and the support vector machine (SVM). The SOM was first used to capture the clustering information of the database in an unsupervised manner. The identified samples were then used as input to train the SVM. To optimize the SVM model, the particle swarm optimization (PSO) algorithm was employed to tune the SVM parameters, yielding a satisfactory SVM data mining model. The proposed model was validated on 2000 practical data sets from Intelligent Transportation Systems (ITS). The results show that the proposed method can extract the underlying rules of the test data and predict the future traffic state with an accuracy above 97%. Hence, the new SOM-PSO-SVM data mining model has practical applications in ITS.
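
Particle swarm optimization for tuning SVM parameters can be sketched as a generic minimizer over a box of candidate values. The inertia and acceleration constants below are commonly used textbook values, not the paper's settings, and the objective in the usage note is hypothetical.

```python
import random


def pso_minimize(f, bounds, n_particles=20, iters=100,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize f over the box given by bounds [(lo, hi), ...] with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A call such as `pso_minimize(cv_error, [(-2, 3), (-4, 1)])` would search log10(C) in [-2, 3] and log10(gamma) in [-4, 1], where `cv_error` is a hypothetical function that trains an SVM with those parameters and returns its cross-validation error.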


2018
Vol 21 (1)
pp. 92-103
Author(s):
Kiyoumars Roushangar
Roghayeh Ghasempour

Rough bed channels are one of the appurtenances used to dissipate the excess energy of flow through a hydraulic jump. This paper assesses the effects of channel geometry and rough boundary conditions (i.e., rectangular, trapezoidal, and expanding channels with different rough elements) on predicting hydraulic jump energy dissipation, using a support vector machine (SVM) as a meta-model approach. Using different experimental data series, models were developed with and without dimensional analysis. The results confirmed the capability of the SVM model to predict the relative energy dissipation. The models developed for the expanding channel with a central sill performed more successfully than the other cases, and for this case the model with parameters Fr1 and h1/B gave superior performance. For the rectangular and trapezoidal channels, the model with parameters Fr1, (h2−h1)/h1 and W/Z led to better predictions. Of the two types of rough elements, strip and staggered, the strip type led to more accurate results. Models developed on the basis of dimensional analysis yielded better predictions, and the sensitivity analysis showed that the Froude number had the most significant impact on the modeling.
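
For orientation, the Froude number Fr1 and the relative energy dissipation of a classical jump in a smooth, horizontal rectangular channel follow from Fr1 = V1 / sqrt(g h1), the Bélanger sequent-depth equation h2/h1 = (sqrt(1 + 8 Fr1²) − 1)/2, and the head loss ΔE = (h2 − h1)³ / (4 h1 h2). The sketch below assumes this smooth-channel baseline for illustration only; the rough, trapezoidal and expanding channels studied in the paper do not follow these closed forms, and the SVM model is not shown.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def smooth_rectangular_jump(h1, v1):
    """Classical hydraulic jump in a smooth, horizontal rectangular channel.

    Returns (Fr1, h2, dE, dE / E1) for upstream depth h1 (m) and velocity v1 (m/s).
    """
    fr1 = v1 / math.sqrt(G * h1)                      # upstream Froude number
    h2 = h1 * (math.sqrt(1 + 8 * fr1 ** 2) - 1) / 2   # Belanger sequent depth
    dE = (h2 - h1) ** 3 / (4 * h1 * h2)               # head loss across the jump
    E1 = h1 + v1 ** 2 / (2 * G)                       # upstream specific energy
    return fr1, h2, dE, dE / E1


fr1, h2, dE, rel = smooth_rectangular_jump(h1=0.05, v1=2.0)
print(f"Fr1={fr1:.2f}, h2={h2:.3f} m, relative dissipation={rel:.1%}")
```

The closed-form head loss agrees with the difference of specific energies E1 − E2 computed from the unit discharge q = V1 h1, which is a convenient check on the sequent depth.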


2020
Vol 39 (6)
pp. 8927-8935
Author(s):
Bing Zheng
Dawei Yun
Yan Liang

In the context of COVID-19, research on behavior recognition is highly needed. In this paper, we combine a self-adaptive coder with a recurrent neural network to recognize behavior patterns. At present, most research on human behavior recognition focuses on video data. Moreover, video image data are complex, and their use can easily violate personal privacy. Internet of Things technology has developed rapidly and has attracted the attention of many experts and scholars. Researchers have tried many machine learning methods, including shallow learning methods such as random forests and support vector machines, which perform well in laboratory environments but are still far from practical application. In this paper, a recurrent neural network algorithm based on long short-term memory (LSTM) is proposed to recognize behavior patterns and thereby improve the accuracy of human activity recognition.
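
The core of an LSTM-based recognizer is the cell update applied at each time step of a sensor sequence. The sketch below implements one standard LSTM step in plain Python with a hypothetical parameter layout (a dictionary of per-gate weights); the self-adaptive coder stage and the final classification layer are omitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lstm_step(x, h, c, params):
    """One LSTM time step.

    params maps each gate name 'i' (input), 'f' (forget), 'o' (output) and
    'g' (candidate) to a tuple (W, U, b): input weights, recurrent weights, bias.
    """
    pre = {gate: [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
           for gate, (W, U, b) in params.items()}
    i = [sigmoid(z) for z in pre["i"]]
    f = [sigmoid(z) for z in pre["f"]]
    o = [sigmoid(z) for z in pre["o"]]
    g = [math.tanh(z) for z in pre["g"]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # keep old memory, add new
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]          # exposed hidden state
    return h_new, c_new


def encode_sequence(sequence, params, hidden_size):
    """Run the LSTM over a sequence of sensor readings; return the final hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

In a recognizer, the final hidden state returned by `encode_sequence` would typically be passed to a softmax layer over the behavior classes; that layer and the training loop are not shown.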


2021
pp. 232102222110243
Author(s):
Mohuya Deb Purkayastha
Joyeeta Deb
Ram Pratap Sinha

The present study estimated the labour-use efficiency of 48 branches of Assam Gramin Vikash Bank at the branch level, covering three districts of Barak Valley, which fall under the Silchar region of the bank, for the period from 2010–2011 to 2017–2018. The study applied data envelopment analysis to estimate labour-use efficiency. In the second stage, it applied censored Tobit regression to determine the impact of several contextual variables on efficiency. The mean labour-use efficiency score of the selected branches, averaged across the sample branches over the observation period, is 76%. The Tobit regression identified cluster 2 and the total business of the branches as significant factors determining efficiency, and the number of employees as a significant variable influencing inefficiency. JEL Classifications: G2, G20, G21, J3
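
In the special case of one input (for example, number of employees) and one output (for example, total business), input-oriented constant-returns-to-scale (CCR) data envelopment analysis reduces to comparing each unit's output/input ratio with the best observed ratio. The sketch below assumes this single-input, single-output setting, which may not match the paper's model specification; the general multi-input, multi-output case requires solving one linear program per branch, and the Tobit stage is not shown. The branch figures are hypothetical.

```python
def crs_efficiency(inputs, outputs):
    """Input-oriented CRS (CCR) DEA scores for one input and one output per unit.

    With a single input and a single output, each unit's score is its
    output/input ratio divided by the best ratio in the sample (best = 1.0).
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical branches: employees (input) and total business (output)
employees = [10, 20, 5]
business = [50, 60, 30]
scores = crs_efficiency(employees, business)
print([round(s, 3) for s in scores], round(sum(scores) / len(scores), 3))
# prints [0.833, 0.5, 1.0] 0.778
```

Averaging such scores over branches and years gives a mean efficiency figure of the kind reported above.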


2021
Vol 11 (2)
pp. 796
Author(s):
Alhanoof Althnian
Duaa AlSaeed
Heyam Al-Baity
Amani Samha
Alanoud Bin Dris
...

Dataset size is considered a major concern in the medical domain, where lack of data is common. This study investigates the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicate that the overall performance of the classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, then RF and NN, while the least robust is DT. Finally, an interesting observation is that a model that is robust to limited data does not necessarily provide the best performance compared with other models.
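
The six evaluation measures can be computed as below for a binary task. The positive-class convention (label 1) and the function names are assumptions; AUC is computed with the rank (Mann-Whitney) formulation, which gives the same value as the area under the empirical ROC curve when ties are counted as one half.

```python
def binary_metrics(y_true, y_pred):
    """Threshold-based metrics for a binary task where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Running these on each reduced training set (and each classifier) yields one row of a comparison table per scenario.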


2020
Vol 0 (0)
Author(s):
Tawfik Yahya
Nur Azah Hamzaid
Sadeeq Ali
Farahiyah Jasni
Hanie Nadia Shasmin

A transfemoral prosthesis is required to assist amputees in performing activities of daily living (ADL). Passive prostheses have drawbacks such as high metabolic energy consumption. In contrast, active prostheses consume less metabolic energy and offer better performance. However, recent active prostheses use surface electromyography as their sensory system, which produces weak, microvolt-level signals and requires substantial computation to extract features. This paper focuses on recognizing different phases of sitting and standing of a transfemoral amputee using in-socket piezoelectric-based sensors. Fifteen piezoelectric film sensors were embedded in the inner socket wall adjacent to the most active regions of the agonist and antagonist knee extensor and flexor muscles, i.e., the regions with the highest level of muscle contraction of the quadriceps and hamstring. A male transfemoral amputee wore the instrumented socket and was instructed to perform several sitting and standing phases using an armless chair. Data were collected from the 15 embedded sensors and passed through signal conditioning circuits. The overlapping analysis window technique was used to segment the data with different window lengths. Fifteen time-domain and frequency-domain features were extracted, and new feature sets were formed based on feature performance. Eight common multiclass pattern recognition classifiers were evaluated and compared. Regression analysis was used to investigate the impact of the number of features and the window length on classifier accuracy, and analysis of variance (ANOVA) was used to test for significant differences in classifier performance. Classification accuracy was calculated using k-fold cross-validation, and 20% of the data set was held out for testing the optimal classifier.
The results showed that feature set FS-5, consisting of the root mean square (RMS) and the number of peaks (NP), achieved the highest classification accuracy in five classifiers. A support vector machine (SVM) with a cubic kernel proved to be the optimal classifier, achieving a classification accuracy of 98.33% on the test data set. Obtaining high classification accuracy with only two time-domain features would significantly reduce the processing time for controlling a prosthesis and eliminate substantial delay. The proposed in-socket sensors for detecting sit-to-stand and stand-to-sit movements could be further integrated with an active knee joint actuation system to provide powered assistance during energy-demanding activities such as sit-to-stand and stair climbing. In the future, the system could also be used to accurately predict the intended movement from the residual limb's muscle and mechanical behaviour as detected by the in-socket sensory system.
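
Overlapping-window segmentation and the two FS-5 features can be sketched as follows. The window length, step, and the definition of a peak (a sample strictly larger than both neighbours) are assumptions, since the abstract does not define them; with 15 sensors, each window would yield 30 features.

```python
import math


def sliding_windows(signal, length, step):
    """Overlapping analysis windows: consecutive windows share length - step samples."""
    return [signal[s:s + length] for s in range(0, len(signal) - length + 1, step)]


def rms(window):
    """Root mean square of a window."""
    return math.sqrt(sum(v * v for v in window) / len(window))


def num_peaks(window):
    """Count strict local maxima (a sample greater than both neighbours)."""
    return sum(1 for k in range(1, len(window) - 1)
               if window[k - 1] < window[k] > window[k + 1])


def fs5_features(channels, length, step):
    """Per-window feature vector [RMS, NP] for every sensor channel, concatenated."""
    per_channel = [sliding_windows(ch, length, step) for ch in channels]
    n_windows = min(len(w) for w in per_channel)
    return [[f(w[i]) for w in per_channel for f in (rms, num_peaks)]
            for i in range(n_windows)]
```

Each row returned by `fs5_features` would be one labelled sample (sitting or standing phase) for the classifiers.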


2021
Vol 11 (13)
pp. 5895
Author(s):
Kristina Serec
Sanja Dolanski Babić

The double-stranded B-form and A-form have long been considered the two most important native forms of DNA, each with its own distinct biological roles and hence the focus of many areas of study, from cellular functions to cancer diagnostics and drug treatment. Due to the heterogeneity and sensitivity of the secondary structure of DNA, there is a need for tools capable of rapid and reliable quantification of DNA conformation in diverse environments. In this work, the second paper in a series addressing conformational transitions in DNA thin films using FTIR spectroscopy, we exploit popular chemometric methods, namely principal component analysis (PCA), the support vector machine (SVM) learning algorithm, and principal component regression (PCR), to quantify and categorize DNA conformation in thin films in different hydration states. By complementing the FTIR technique with multivariate statistical methods, we demonstrate that our sample preparation and automated spectral analysis protocol can rapidly and efficiently determine conformation in DNA thin films from the vibrational signatures in the 1800–935 cm⁻¹ range. Furthermore, we assess the impact of small hydration-related changes in FTIR spectra on automated DNA conformation detection and show how careful sampling avoids discrepancies.
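
The PCA step can be illustrated by extracting the first principal component of a set of spectra with power iteration on the sample covariance matrix. Rows would be spectra and columns absorbance values at the analysed wavenumbers; the function name and toy data are hypothetical, and the SVM and PCR stages are not shown.

```python
import math
import random


def first_principal_component(X, iters=200, seed=0):
    """Leading principal axis and sample scores via power iteration on the covariance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]          # mean-centre
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]  # sample covariance
           for a in range(p)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]                     # random start vector
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left: nothing to extract
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]          # projections on the axis
    return v, scores
```

The sign of a principal axis is arbitrary, so scores should be compared up to sign; further components would be obtained by deflating the covariance matrix and repeating.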


Author(s):  
Jia-Bin Zhou
Yan-Qin Bai
Yan-Ru Guo
Hai-Xiang Lin

In general, data contain noise arising from faulty instruments, flawed measurements or faulty communication. Learning from data in classification or regression is inevitably affected by this noise. To remove or greatly reduce its impact, we combine the ideas of fuzzy membership functions and the Laplacian twin support vector machine (Lap-TSVM). A formulation of the linear intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) is presented, and the linear IFLap-TSVM is extended to the nonlinear case by means of a kernel function. The proposed IFLap-TSVM addresses the negative impact of noise and outliers by using fuzzy membership functions, and it is a more accurate and reasonable classifier because it uses the geometric distribution information of labeled and unlabeled data through manifold regularization. Experiments on constructed artificial datasets, several UCI benchmark datasets and the MNIST dataset show that the IFLap-TSVM achieves better classification accuracy than the state-of-the-art twin support vector machine (TSVM), intuitionistic fuzzy twin support vector machine (IFTSVM) and Lap-TSVM.
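
The noise-handling idea can be sketched with one intuitionistic fuzzy scoring scheme of the kind used in intuitionistic fuzzy twin SVMs: membership decreases with distance from the point's class centre, non-membership grows with the share of other-class points in its neighbourhood, and the two are combined into a score in [0, 1]. The exact membership functions, neighbourhood radius `alpha`, and constant `delta` used in this paper may differ; this is an assumed formulation for illustration, and the Laplacian twin SVM optimisation itself is not shown.

```python
import math


def intuitionistic_fuzzy_scores(X, y, alpha, delta=1e-4):
    """Score each training point in [0, 1]; noisy or outlying points get low scores."""
    centre, radius = {}, {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centre[label] = [sum(col) / len(pts) for col in zip(*pts)]
        radius[label] = max(math.dist(p, centre[label]) for p in pts)
    scores = []
    for i, (x, t) in enumerate(zip(X, y)):
        # Membership: decreases with distance from the point's own class centre
        mu = 1 - math.dist(x, centre[t]) / (radius[t] + delta)
        # Non-membership: share of other-class points among neighbours within alpha
        near = [y[j] for j in range(len(X)) if j != i and math.dist(x, X[j]) <= alpha]
        rho = sum(1 for u in near if u != t) / len(near) if near else 0.0
        nu = (1 - mu) * rho
        if nu == 0:
            scores.append(mu)
        elif mu <= nu:
            scores.append(0.0)   # point looks like it belongs to the other class
        else:
            scores.append((1 - nu) / (2 - mu - nu))
    return scores
```

Such scores would typically weight each sample's slack term in the twin SVM objective, so that suspected noise contributes little to the fitted hyperplanes.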

