Data Mining for Discovering Effective Time-Series Transition of Learning Strategies on Mutual Viewing-Based Learning

We aim to develop a system that gives real-time feedback on learning strategies during lessons in order to improve academic achievement. Mutual viewing-based learning is known to be an effective educational method. However, even within this effective lesson style, individual learners use both effective and ineffective learning strategies. Learning strategies are generally evaluated by questionnaire survey, which cannot measure them in real time, so it is difficult to detect students who are using ineffective strategies while the lesson is in progress. Recently, a system that can measure learning strategies in real time has been developed. Using this system, it is possible to detect students who use ineffective learning strategies during mutual viewing-based lessons. We therefore aim to develop a system that recommends learning strategies to teachers and students in real time to achieve a high educational effect. For this purpose, we must identify the features of effective and ineffective learning strategies using the measurement system. In this paper, we report the discovery of such features through a data-mining approach using the k-means method, transition diagrams, and random forest. We classified the time-series learning strategies over 40 min into 216 strategies and surveyed the probability of improvement in academic achievement via a random-forest-based classification model. By embedding our results into the system, we may be able to automatically detect students who use ineffective learning strategies and recommend effective ones.
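
The transition-diagram step mentioned in the abstract can be illustrated with a small sketch. It assumes that each learner's lesson has already been reduced to a sequence of cluster labels (one per time window); the labels, the sample sequences, and the function name `transition_probabilities` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities between strategy labels."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each label with its successor within the same learner's sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        probs[current] = {nxt: n / total for nxt, n in successors.items()}
    return probs


# Hypothetical cluster labels per time window for three learners.
sequences = [
    ["A", "A", "B", "C"],
    ["A", "B", "B", "C"],
    ["B", "C", "C", "C"],
]
probs = transition_probabilities(sequences)
```

Each row of `probs` corresponds to the outgoing edges of one node in a transition diagram; edge weights of this kind could then serve as features for a downstream classifier.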

Improving Medication Regimen Recommendation for Parkinson’s Disease Using Sensor Technology

Sensors ◽

10.3390/s21103553 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3553

Author(s):

Jeremy Watts ◽

Anahita Khojandi ◽

Rama Vasudevan ◽

Fatta B. Nahab ◽

Ritesh A. Ramdhani

Keyword(s):

Time Series ◽

Parkinson's Disease ◽

Random Forest ◽

Treatment Planning ◽

Time Series Data ◽

Classification Model ◽

Series Data ◽

Demographic Information ◽

Subjective Data

Parkinson’s disease medication treatment planning is generally based on subjective data obtained through clinical, physician-patient interactions. The Personal KinetiGraph™ (PKG) and similar wearable sensors have shown promise in enabling objective, continuous remote health monitoring for Parkinson’s patients. In this proof-of-concept study, we propose to use objective sensor data from the PKG and apply machine learning to cluster patients based on levodopa regimens and response. The resulting clusters are then used to enhance treatment planning by providing improved initial treatment estimates to supplement a physician’s initial assessment. We apply k-means clustering to a dataset of within-subject Parkinson’s medication changes—clinically assessed by the MDS-Unified Parkinson’s Disease Rating Scale-III (MDS-UPDRS-III) and the PKG sensor for movement staging. A random forest classification model was then used to predict patients’ cluster allocation based on their respective demographic information, MDS-UPDRS-III scores, and PKG time-series data. Clinically relevant clusters were partitioned by levodopa dose, medication administration frequency, and total levodopa equivalent daily dose—with the PKG providing similar symptomatic assessments to physician MDS-UPDRS-III scores. A random forest classifier trained on demographic information, MDS-UPDRS-III scores, and PKG time-series data was able to accurately classify subjects of the two most demographically similar clusters with an accuracy of 86.9%, an F1 score of 90.7%, and an AUC of 0.871. A model that relied solely on demographic information and PKG time-series data provided the next best performance with an accuracy of 83.8%, an F1 score of 88.5%, and an AUC of 0.831, hence further enabling fully remote assessments. 
These computational methods demonstrate the feasibility of using sensor-based data to cluster patients based on their medication responses with further potential to assist with medication recommendations.
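
As a rough illustration of the clustering step, the sketch below implements plain k-means (Lloyd's algorithm) in standard-library Python. The two-dimensional points stand in for normalized regimen features and are invented for the example; the initial centroids are chosen by hand, and nothing here reproduces the study's actual features or preprocessing.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical, already-normalized feature pairs for six patients.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

In a pipeline like the one described, the resulting `labels` would become the target that a separate classifier (a random forest in the study) learns to predict from demographic and sensor data.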

Demand Supply Oriented Taxi Suggestion System for Vehicular Social Networks with Fuel Charging Mechanism

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit19515 ◽

2019 ◽

pp. 38-44

Author(s):

Selvi C ◽

Keerthana D

Keyword(s):

Data Mining ◽

Real Time ◽

Large Scale ◽

Recommendation System ◽

Important Research ◽

Taxi Drivers ◽

Charging Station ◽

GPS Trajectories ◽

Charging Mechanism ◽

Supply Level

Data mining based on large-scale taxi traces is an important research topic. A vital direction in analyzing taxi GPS datasets is to suggest cruising areas for taxi drivers. The project first investigates the real-time demand-supply level for taxis, and then makes an adaptive tradeoff between the utilities of drivers and passengers for different hotspots. It constructs a recommendation system by jointly considering the profits of both drivers and passengers. Finally, the qualified candidates are suggested to drivers based on this analysis. The project also provides a real-time charging station recommendation system for EV taxis via large-scale GPS data mining. By combining each EV taxi’s historical recharging actions and real-time GPS trajectories, the current operational state of each taxi is predicted. Based on this information, the system recommends, to an EV taxi requesting a recommendation, the charging station that leads to the minimal total time before its recharging starts.
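
The selection rule in the last sentence can be sketched as follows. The sketch assumes that the time before recharging starts is simply travel time plus expected queueing time, and that both estimates are already available; the `Station` fields and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    travel_min: float      # estimated driving time from the taxi to the station
    queue_wait_min: float  # estimated wait for a free charger on arrival


def recommend_station(stations):
    """Pick the station with the smallest total time until recharging starts."""
    return min(stations, key=lambda s: s.travel_min + s.queue_wait_min)


stations = [
    Station("north", travel_min=5, queue_wait_min=30),
    Station("east", travel_min=12, queue_wait_min=4),
    Station("south", travel_min=20, queue_wait_min=0),
]
best = recommend_station(stations)
```

The nearest station is not necessarily the best choice here: a short drive to a congested station can cost more total time than a longer drive to an idle one.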

Garment Categorization Using Data Mining Techniques

Symmetry ◽

10.3390/sym12060984 ◽

2020 ◽

Vol 12 (6) ◽

pp. 984

Author(s):

Sheenam Jain ◽

Vijay Kumar

Keyword(s):

Data Mining ◽

Supply Chain ◽

Random Forest ◽

Good Deal ◽

Classification Model ◽

Upper Body ◽

Whole Body ◽

Product Data ◽

Soft Classification ◽

Using Data

The apparel industry houses a huge amount and variety of data. At every step of the supply chain, data is collected and stored by each supply chain actor. This data, when used intelligently, can help with solving a good deal of problems for the industry. In this regard, this article is devoted to the application of data mining on the industry’s product data, i.e., data related to a garment, such as fabric, trim, print, shape, and form. The purpose of this article is to use data mining and symmetry-based learning techniques on product data to create a classification model that consists of two subsystems: (1) for predicting the garment category and (2) for predicting the garment sub-category. Classification techniques, such as Decision Trees, Naïve Bayes, Random Forest, and Bayesian Forest were applied to the ‘Deep Fashion’ open-source database. The data contain three garment categories, 50 garment sub-categories, and 1000 garment attributes. The two subsystems were first trained individually and then integrated using soft classification. It was observed that the performance of the random forest classifier was comparatively better, with an accuracy of 86%, 73%, 82%, and 90%, respectively, for the garment category, and sub-categories of upper body garment, lower body garment, and whole-body garment.
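
The abstract does not spell out how the two subsystems are integrated. One plausible reading of "soft classification", offered here only as an assumption, is to multiply the category probability by the conditional sub-category probability and pick the highest joint score. The category names, probabilities, and the function `combine_soft` below are hypothetical.

```python
def combine_soft(category_probs, subcategory_probs):
    """Joint score for each (category, sub-category) pair: P(cat) * P(sub | cat)."""
    joint = {}
    for cat, p_cat in category_probs.items():
        for sub, p_sub in subcategory_probs[cat].items():
            joint[(cat, sub)] = p_cat * p_sub
    return joint


# Hypothetical outputs of the two subsystems for one garment.
category_probs = {"upper": 0.5, "lower": 0.4, "whole": 0.1}
subcategory_probs = {
    "upper": {"tee": 0.5, "blouse": 0.5},
    "lower": {"jeans": 0.9, "skirt": 0.1},
    "whole": {"dress": 1.0},
}
joint = combine_soft(category_probs, subcategory_probs)
best = max(joint, key=joint.get)
```

In this toy case the most likely category on its own is "upper", but the joint score favors a "lower" sub-category, which is the kind of correction a soft combination allows and a hard two-stage cascade does not.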

Application of Classification Techniques in a Recommendation System Using a Genetic Algorithm (Penerapan Teknik Klasifikasi pada Sistem Rekomendasi Menggunakan Algoritma Genetika)

Jurnal Ilmiah Teknologi Infomasi Terapan ◽

10.33197/jitter.vol2.iss3.2016.108 ◽

2016 ◽

Vol 2 (3) ◽

Author(s):

Rita Rismala ◽

Mahmud Dwi Sulistiyo

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Recommendation System ◽

Classification Model ◽

Mutation Probability ◽

Linear Classifier ◽

Average Accuracy ◽

Crossover Probability ◽

Best Parameter

This study developed a recommendation system that recommends the single best item to a user. From a data mining perspective, this can be viewed as the problem of building a classifier model that assigns data to one particular class. The model used was a linear classifier. A Genetic Algorithm (GA) was used to find the optimal configuration of the classifier model. The GA's performance in optimizing the linear classification model was acceptable. On the case-study dataset, the best parameter combination (population size 50, crossover probability 0.7, and mutation probability 0.1) yielded an average accuracy of 72.80% with an average processing time of 6.04 seconds. Classification using a GA can therefore be an alternative approach to building a recommender system, provided the parameter values are set to suit the problem at hand. Keywords: recommendation system, classification, Genetic Algorithm
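
To make the approach concrete, here is a minimal sketch of a genetic algorithm that evolves the weights of a linear classifier, using the parameter values reported above (population size 50, crossover probability 0.7, mutation probability 0.1). Everything else is an assumption made for illustration: the toy dataset, tournament selection, one-point crossover, Gaussian mutation, elitism, and the number of generations are not described in the abstract.

```python
import random


def accuracy(weights, data):
    """Fraction of samples whose sign(w . x + b) matches the label (+1 / -1)."""
    *w, b = weights
    correct = 0
    for x, label in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        predicted = 1 if score >= 0 else -1
        correct += predicted == label
    return correct / len(data)


def evolve(data, n_features, pop_size=50, p_cross=0.7, p_mut=0.1,
           generations=40, seed=0):
    rng = random.Random(seed)
    n_genes = n_features + 1  # one gene per weight, plus the bias
    population = [[rng.uniform(-1, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: accuracy(ind, data),
                        reverse=True)
        next_gen = [ranked[0][:]]  # elitism (assumed): keep the best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (assumed): best of three random individuals.
            p1 = max(rng.sample(population, 3), key=lambda ind: accuracy(ind, data))
            p2 = max(rng.sample(population, 3), key=lambda ind: accuracy(ind, data))
            child = p1[:]
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_genes)  # one-point crossover (assumed)
                child = p1[:cut] + p2[cut:]
            # Gaussian mutation (assumed), applied gene by gene.
            child = [g + rng.gauss(0, 0.3) if rng.random() < p_mut else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda ind: accuracy(ind, data))


# Hypothetical toy data: label is +1 when x0 + x1 > 1, otherwise -1.
data_rng = random.Random(1)
data = []
for _ in range(60):
    x = (data_rng.uniform(0, 1), data_rng.uniform(0, 1))
    data.append((x, 1 if x[0] + x[1] > 1 else -1))

best = evolve(data, n_features=2)
best_accuracy = accuracy(best, data)
```

Because the best individual is carried over unchanged each generation, the best fitness in the population never decreases; how quickly it improves depends on the parameter settings, which is the sensitivity the abstract cautions about.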

Differential privacy based classification model for mining medical data stream using adaptive random forest

Acta Universitatis Sapientiae Informatica ◽

10.2478/ausi-2021-0001 ◽

2021 ◽

Vol 13 (1) ◽

pp. 1-20

Author(s):

Hayder K. Fatlawi ◽

Attila Kiss

Keyword(s):

Data Mining ◽

Random Forest ◽

Data Stream ◽

Differential Privacy ◽

Medical Data ◽

The Other ◽

Classification Model ◽

Mining Operations ◽

Typical Data ◽

Stable Performance

Most typical data mining techniques are developed for training on batch data, which makes mining data streams a significant challenge. In addition, providing a mechanism to perform data mining operations without revealing the patient’s identity is increasingly important in the data mining field. In this work, a classification model with differential privacy is proposed for mining medical data streams using Adaptive Random Forest (ARF). The experimental results of applying the proposed model to four medical datasets show that ARF mostly has more stable performance than the other six techniques.

Automatic Identification of Rock Formation Type While Drilling Using Machine Learning Based Data-Driven Models

10.2118/201020-ms ◽

2021 ◽

Author(s):

Enrique Z. Losoya ◽

Narendra Vishnumolakala ◽

Samuel F. Noynaert ◽

Zenon Medina-Cetina ◽

Satish Bukkapatnam ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Time ◽

Prediction Accuracy ◽

Classification Model ◽

Data Driven ◽

Classification Algorithms ◽

Rock Formation ◽

Mechanical Specific Energy ◽

Formation Type

The objective of this study is to present a novel rock formation identification model using a data-driven modeling approach. This study explores the use of real-time drilling data to train and validate a classification model to improve the efficiency of the drilling process by reducing Mechanical Specific Energy (MSE). In this study, we demonstrate the feasibility of layer-based determination and change detection of the properties of the rock formation currently being drilled, as accurately and quickly as possible. Data for this study was collected from a custom-built lab-scale drilling rig equipped with multiple sensors. The experiment was conducted by drilling through an arrangement of different rock formations of varying rock strength properties. Data was recorded and stored at a frequency of 2 kHz, then filtered, processed, and downsampled to extract relevant features. This dataset was used to train an Artificial Neural Network and other machine learning classification algorithms. Two feature sets were used: the first with the ten most notable features found by Random Forest, and the second with derived measurements and downsampled dynamic features from the sensors. The classification analysis was divided into two steps: extraction of the best predictors/features and classification model building. The models were trained using multiple classification algorithms, namely logistic regression, linear discriminant analysis (LDA), Support Vector Machines (SVM), Random Forest (RF), and Artificial Neural Networks (ANN). It was found that random forest and ANN performed the best, with prediction accuracy of 99.48% and 99.58%, respectively, for the data set with the ten most prominent features. The high prediction accuracy for the most prominent predictors suggests that if the high-frequency data can be processed in real time, predicting the formation being drilled is achievable in near real time. This can lead to significant savings for drilling companies, as optimal drilling parameters can be computed and, in turn, optimized Mechanical Specific Energy can be obtained in real time. Since rock formation identification is time-consuming, we also describe an alternative approach using slightly less accurate but equally powerful dynamic predictors. In this case, we show that our dynamic predictor models with RF and ANN yielded prediction accuracy of 96.30% and 95.61%, respectively. Both the prominent feature and dynamic predictor approaches are described in detail in this paper. Our results suggest that accurately predicting rock formation type in real time while drilling is feasible with lower computational cost and complexity. This study provides the building blocks for the development of a completely autonomous downhole device and Electronic Device Recorders (EDR) that reduce the need for highly sophisticated sensors or data transmission processes downhole.
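
The abstract refers to Mechanical Specific Energy without defining it. The sketch below uses the commonly cited Teale formulation, MSE = WOB/A + 2πNT/(A·ROP), in consistent SI units; that this is the definition the authors used is an assumption, and the function name and sample values are hypothetical.

```python
import math


def mechanical_specific_energy(wob_n, torque_nm, rev_per_s, rop_m_per_s,
                               bit_area_m2):
    """Teale-style MSE in pascals, using consistent SI units.

    wob_n: weight on bit (N); torque_nm: torque at the bit (N*m);
    rev_per_s: rotary speed (rev/s); rop_m_per_s: rate of penetration (m/s);
    bit_area_m2: bit cross-sectional area (m^2).
    """
    if rop_m_per_s <= 0 or bit_area_m2 <= 0:
        raise ValueError("rate of penetration and bit area must be positive")
    thrust_term = wob_n / bit_area_m2
    rotary_term = (2 * math.pi * rev_per_s * torque_nm) / (bit_area_m2 * rop_m_per_s)
    return thrust_term + rotary_term


# Hypothetical lab-scale values, not taken from the study.
mse = mechanical_specific_energy(
    wob_n=1000.0, torque_nm=50.0, rev_per_s=2.0,
    rop_m_per_s=0.001, bit_area_m2=0.01,
)
```

The rotary term usually dominates, so for a given formation MSE falls as rate of penetration rises at fixed torque and speed; that is why knowing the formation type helps in choosing drilling parameters.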

Time Series Data Mining in Real Time Surface Runoff Forecasting through Support Vector Machine

International Journal of Computer Applications ◽

10.5120/17163-7223 ◽

2014 ◽

Vol 98 (3) ◽

pp. 23-28 ◽

Cited By ~ 1

Author(s):

Vinayak Choubey ◽

Satanand Mishra ◽

S. K. Pandey

Keyword(s):

Data Mining ◽

Time Series ◽

Support Vector Machine ◽

Real Time ◽

Surface Runoff ◽

Time Series Data ◽

Series Data ◽

Support Vector ◽

Time Series Data Mining ◽

Runoff Forecasting

Pattern Classification Model Design and Performance Comparison for Data Mining of Time Series Data

Journal of Korean institute of intelligent systems ◽

10.5391/jkiis.2011.21.6.730 ◽

2011 ◽

Vol 21 (6) ◽

pp. 730-736 ◽

Cited By ~ 2

Author(s):

Soo-Yong Lee ◽

Kyoung-Joung Lee

Keyword(s):

Data Mining ◽

Time Series ◽

Pattern Classification ◽

Time Series Data ◽

Performance Comparison ◽

Classification Model ◽

Series Data ◽

Model Design ◽

And Performance

A Multi-Feature Ensemble Learning Classification Method for Ship Classification with Space-Based AIS Data

Applied Sciences ◽

10.3390/app112110336 ◽

2021 ◽

Vol 11 (21) ◽

pp. 10336

Author(s):

Yitao Wang ◽

Lei Yang ◽

Xin Song ◽

Quan Chen ◽

Zhenguo Yan

Keyword(s):

Time Series ◽

Random Forest ◽

Ensemble Learning ◽

Classification Model ◽

Gradient Boosting ◽

Identification System ◽

Dynamic Feature ◽

Passenger Ships ◽

Extreme Gradient Boosting ◽

Ship Classification

AIS (Automatic Identification System) is an effective navigation aid designed for ship monitoring and collision avoidance. Space-based AIS data, which are received by satellites, have become a popular and promising source of ship information around the world. To recognize ship types from massive space-based AIS data, we propose a multi-feature ensemble learning classification model (MFELCM). The method consists of three steps. First, the static and dynamic information in the original data is preprocessed and features are extracted to obtain static feature samples, dynamic feature distribution samples, time-series samples, and time-series feature samples. Second, four base classifiers, namely Random Forest, 1D-CNN (one-dimensional convolutional neural network), Bi-GRU (bidirectional gated recurrent unit), and XGBoost (extreme gradient boosting), are trained on these four types of samples, respectively. Finally, the base classifiers are integrated by another Random Forest, which outputs the final ship classification. In this paper, we use global space-based AIS data of passenger ships, cargo ships, fishing boats, and tankers. The model achieves an overall accuracy of 0.9010 and an F1 score of 0.9019. The experiments show that MFELCM outperforms the base classifiers. In addition, MFELCM can achieve near real-time online classification, which has important applications in ship behavior anomaly detection and maritime supervision.

Proximate Breast Cancer Factors Using Data Mining Classification Techniques

International Journal of Big Data and Analytics in Healthcare ◽

10.4018/ijbdah.2019010104 ◽

2019 ◽

Vol 4 (1) ◽

pp. 47-56

Author(s):

Alice Constance Mensah ◽

Isaac Ofori Asare

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Random Forest ◽

Cancer Patients ◽

Classification Model ◽

Tree Model ◽

Cancer Data ◽

Data Mining Approach ◽

Learning Techniques ◽

Using Data

Breast cancer is the most common of all cancers and is the leading cause of cancer deaths in women worldwide. The classification of breast cancer data can be useful to predict the outcome of some diseases or discover the genetic behavior of tumors. Data mining technology helps in classifying cancer patients and in identifying potential cancer patients by analyzing the data. This study examines the determinant factors of breast cancer and uses breast cancer patient data to build a useful classification model with a data mining approach. In this study of 2397 women, 1022 (42.64%) were diagnosed with breast cancer. Four main learning techniques were used: Random Forest, Naive Bayes, Classification and Regression Trees (CART), and a Boosted Tree model. The Random Forest technique had the best accuracy, 0.9892 (95% CI, 0.9832-0.9935), and a sensitivity of about 92%. This suggests that, of the four, the Random Forest model is best suited to classifying and predicting breast cancer based on the associated factors.
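
Accuracy and sensitivity, the two figures reported above, are both derived from the confusion matrix. The following sketch shows the arithmetic on a small invented set of labels; the data and the function name `classification_metrics` are hypothetical and unrelated to the study's dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (recall), precision, and F1 from labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


# Hypothetical labels: 1 = diagnosed, 0 = not diagnosed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Accuracy can exceed sensitivity, as in the reported results, when the classifier handles the negative class better than the positive class, so in a screening setting the two are best read together.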
