Building an Expert System based on Data Mining

Author(s):  
Sagar Bhushan Gawde ◽  
Umesh Kulkarni

This paper presents a novel framework for predicting stock trends and making financial trading decisions based on a combination of data mining and text mining techniques. The prediction models of the proposed system extract data from the text content of time-stamped web documents in addition to traditional numerical time-series data, which are also available from the Web. The financial trading system built on the model predictions uses three different trading strategies. The system is simulated and evaluated on real-world series of news stories and stock data using a decision tree induction algorithm. Performance is measured by the predictive accuracy of the induced models and, more importantly, by the profitability of each trading strategy that uses these predictions.

2020 ◽  
Author(s):  
Hsiao-Ko Chang ◽  
Hui-Chih Wang ◽  
Chih-Fen Huang ◽  
Feipei Lai

BACKGROUND In most of Taiwan’s medical institutions, congestion is a serious problem for emergency departments. Due to a lack of beds, patients spend more time in emergency retention zones, which makes it difficult to detect cardiac arrest (CA). OBJECTIVE We seek to develop a pharmaceutical early warning model to predict cardiac arrest in emergency departments via drug classification and medical expert suggestion. METHODS We propose a new early warning score model for detecting cardiac arrest via pharmaceutical classification and a sliding window; we apply learning-based algorithms to time-series data to build a Pharmaceutical Early Warning Scoring Model (PEWSM). By treating pharmaceutical features as a dynamic time-series factor for cardiopulmonary resuscitation (CPR) patients, we increase sensitivity, reduce false alarm rates and mortality, and increase the model’s accuracy. To evaluate the proposed model, we use the area under the receiver operating characteristic curve (AUROC). RESULTS There are four important findings. (1) We identify the most important drug predictors: bits, and replenishers and regulators of water and electrolytes. The best AUROC for bits is 85%; that for replenishers and regulators of water and electrolytes is 86%. These two features are the most influential of the drug features in the task. (2) We verify through feature selection that accounting for drugs improves accuracy: in Task 1, the best AUROC for vital signs alone is 77%, and that for all features is 86%. In Task 2, the best AUROC for all features is 85%, which demonstrates that accounting for the drugs significantly affects prediction. (3) We use a better model: alongside traditional machine learning, this study adds the long short-term memory (LSTM) model, which gives the best time-series accuracy and is comparable to the traditional random forest (RF) model; both AUROC measures are 85%. (4) We determine whether the event can be predicted beforehand: the best classifier is still an RF model, for which the observational starting time is 4 hours before the CPR event. Although accuracy is reduced, the predictive accuracy still reaches 70%. We therefore believe that CPR events can be predicted four hours in advance. CONCLUSIONS This paper uses a sliding window to account for dynamic time-series data consisting of the patient’s vital signs and drug injections. In a comparison with the National Early Warning Score (NEWS), we improve predictive accuracy via feature selection that includes drugs as features. In addition, LSTM yields better performance with time-series data. The proposed PEWSM, which offers 4-hour predictions, performs better than the NEWS reported in the literature. This also confirms that the doctor’s heuristic rules are consistent with the results found by machine learning algorithms.
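The sliding-window step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sliding_windows`, the window width, and the hourly records (heart rate, systolic blood pressure, drug-given flag) are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[Sequence[float]],
                    width: int,
                    step: int = 1) -> List[List[Sequence[float]]]:
    """Return consecutive windows of `width` time steps, advancing by `step`."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [list(series[start:start + width])
            for start in range(0, len(series) - width + 1, step)]


# Hypothetical hourly records: (heart rate, systolic BP, drug-given flag).
records = [
    (88, 120, 0),
    (92, 118, 0),
    (97, 110, 1),
    (105, 101, 1),
    (118, 92, 1),
]

# Each window holds three consecutive hours and could feed one model input.
windows = sliding_windows(records, width=3)
for window in windows:
    print(window)
```

Each window mixes vital signs with the drug flag, which is the sense in which drug features become a dynamic time-series factor; a real pipeline would also attach a label (CPR event or not) to every window.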


2020 ◽  
Author(s):  
Hsiao-Ko Chang ◽  
Hui-Chih Wang ◽  
Chih-Fen Huang ◽  
Feipei Lai

BACKGROUND In most of Taiwan’s medical institutions, congestion is a serious problem for emergency departments. Due to a lack of beds, patients spend more time in emergency retention zones, which makes it difficult to detect cardiac arrest (CA). OBJECTIVE We seek to develop a Drug Early Warning System Model (DEWSM) that uses drug injections and vital signs as its key features. We use it to predict cardiac arrest in emergency departments via drug classification and medical expert suggestion. METHODS We propose this new model for detecting cardiac arrest via drug classification and a sliding window; we apply learning-based algorithms to time-series data to build the DEWSM. By treating drug features as a dynamic time-series factor for cardiopulmonary resuscitation (CPR) patients, we increase sensitivity, reduce false alarm rates and mortality, and increase the model’s accuracy. To evaluate the proposed model, we use the area under the receiver operating characteristic curve (AUROC). RESULTS There are four important findings. (1) We identify the most important drug predictors: bits (intravenous therapy), and replenishers and regulators of water and electrolytes (fluid and electrolyte supplement). The best AUROC for bits is 85%: this drug feature, suggested by the medical experts, affects the vital signs, and the model that uses it reaches an AUROC of 85% in classifying patients with CPR. The best AUROC for replenishers and regulators of water and electrolytes is 86%. These two features are the most influential of the drug features in the task. (2) We verify through feature selection that accounting for drugs improves accuracy: in Task 1, the best AUROC for vital signs alone is 77%, and that for all features is 86%. In Task 2, the best AUROC for all features is 85%, which demonstrates that accounting for the drugs significantly affects prediction. (3) We use a better model: alongside traditional machine learning, this study adds the long short-term memory (LSTM) model, which gives the best time-series accuracy and is comparable to the traditional random forest (RF) model; both AUROC measures are 85%. The LSTM results are currently comparable to the accuracy of the commonly used RF model, and the LSTM model can be tuned in the future to obtain better results. (4) We determine whether the event can be predicted beforehand: the best classifier is still an RF model, for which the observational starting time is 4 hours before the CPR event. Although accuracy is reduced, the predictive accuracy still reaches 70%. We therefore believe that CPR events can be predicted four hours in advance. CONCLUSIONS This paper uses a sliding window to account for dynamic time-series data consisting of the patient’s vital signs and drug injections. The National Early Warning Score (NEWS) focuses only on vital-sign scores and does not include factors related to drug injections. In this study, the experimental results obtained by adding drug injections are better than those obtained with vital signs alone. In a comparison with NEWS, we improve predictive accuracy via feature selection that includes drugs as features. In addition, we use traditional machine learning methods and deep learning (with LSTM as the main method for processing time-series data) as the basis for comparison in this research. The proposed DEWSM, which offers 4-hour predictions, performs better than the NEWS reported in the literature. This also confirms that the doctor’s heuristic rules are consistent with the results found by machine learning algorithms.
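Because both versions of this abstract report results as AUROC, a small worked example of the metric may help. The sketch below uses the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The labels and scores are invented for illustration and are unrelated to the study's data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores: 1 = CPR event, 0 = no event.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auroc(labels, scores))  # 11 of the 12 pairs are ordered correctly
```

The quadratic loop is fine for a demonstration; for large datasets a rank-based computation or a library routine would be the usual choice.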


Agromet ◽  
2007 ◽  
Vol 21 (2) ◽  
pp. 46 ◽  
Author(s):  
W. Estiningtyas ◽  
F. Ramadhani ◽  
E. Aldrian

A significant decrease in rainfall caused by extreme climate events has had a significant impact on the agricultural sector, especially food crop production. This is one of the reasons driving the development of rainfall prediction models as a way of anticipating extreme climate events. Rainfall prediction models were initially based only on time-series data but now also account for climate anomalies, as in the rainfall prediction model based on the Kalman filtering method. One global parameter that can be used as a climate anomaly indicator is sea surface temperature (SST). Various studies indicate that there is a relationship between SST and rainfall. The relationship between Indonesian rainfall and global SST is well known, but its relationship with SST over Indonesian waters has received little attention, especially for rainfall over smaller areas such as a district. Research is therefore needed on the relationship between district-level rainfall and Indonesian SST and on its application to rainfall prediction. Indonesian SST time-series data from January 1982 to May 2006 show a zone with temperatures above 27.6 °C that is dominant in January-May and moves in a distinct pattern. The spatial correlation between rainfall in Cilacap district and Indonesian SST shows highest positive correlations of 0.30 to 0.50, with varying SST zones. The highest positive correlations occurred in March and July. Negative correlations ranged from -0.30 to -0.70, with the highest negative correlations in May and June. Validation of the rainfall prediction model yielded a correlation coefficient of 85.73%, a model fit of 20.74%, r² of 73.49%, RMSE of 20.5% and a standard deviation of 37.96. The monthly rainfall prediction for January-December 2007 indicates a rainfall pattern close to the 19-year average (1988-2006), with rainfall of less than 100 mm/month. The results indicate that Indonesian SST can be used as an indicator of rainfall conditions at the district level; that is, district rainfall can be predicted from changes in SST in the zones with the highest correlation in each month.


Algorithms ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 299
Author(s):  
Jianguo Zheng ◽  
Yilin Wang ◽  
Shihan Li ◽  
Hancong Chen

Accurate stock market prediction models can provide investors with convenient tools to make better data-based decisions and judgments. Moreover, retail investors and institutional investors could reduce their investment risk by selecting the optimal stock index with the help of these models. Predicting stock index prices is one of the most effective tools for risk management and portfolio diversification. Continuous improvement in the accuracy of stock index price forecasts can promote the improvement and maturity of China’s capital market supervision and investment. It is also an important guarantee for China to further accelerate structural reforms and manufacturing transformation and upgrading. In response to this problem, this paper introduces the bat algorithm to optimize the three free parameters of the SVR machine learning model, constructs the BA-SVR hybrid model, and forecasts the closing prices of 18 stock indexes in the Chinese stock market. The total sample runs from 15 January 2016 (the 10th trading day in 2016) to 31 December 2020. We select the last 20, 60, and 250 days of the whole sample as test sets for short-term, mid-term, and long-term forecasts, respectively. The empirical results show that the BA-SVR model outperforms the polynomial kernel SVR model and the sigmoid kernel SVR model without optimized initial parameters. In the robustness test, we use the stationary time-series data obtained after first-order differencing of six selected characteristics to re-predict. Compared with the random forest model and the ANN model, the advantage of the BA-SVR model in prediction performance remains significant. This paper also provides a new perspective on the methods of stock index forecasting and the application of bat algorithms in the financial field.
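The evaluation design in this abstract, where the last 20, 60, and 250 days serve as short-, mid-, and long-term test sets, can be sketched as follows. The helper `tail_split` and the synthetic price series are assumptions made for illustration; the sketch does not implement the bat algorithm or SVR.

```python
from typing import Dict, List, Sequence, Tuple


def tail_split(series: Sequence[float],
               test_days: int) -> Tuple[List[float], List[float]]:
    """Hold out the last `test_days` observations; train on everything before."""
    if not 0 < test_days < len(series):
        raise ValueError("test_days must be between 1 and len(series) - 1")
    return list(series[:-test_days]), list(series[-test_days:])


# Synthetic closing prices standing in for one stock index (hypothetical data).
closing_prices = [3000.0 + 0.5 * day for day in range(1200)]

# Horizons taken from the abstract: last 20, 60, and 250 trading days.
horizons: Dict[str, int] = {"short": 20, "mid": 60, "long": 250}
splits = {name: tail_split(closing_prices, days)
          for name, days in horizons.items()}

for name, (train, test) in splits.items():
    print(name, len(train), len(test))
```

Splitting at the tail rather than at random keeps every test observation later in time than every training observation, which avoids leaking future prices into the fitted model.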


2021 ◽  
Vol 23 (2) ◽  
pp. 194-199
Author(s):  
K.ELANGO ◽  
S. JEYARAJAN NELSON ◽  
P.DINESHKUMAR

The rugose spiraling whitefly (RSW), Aleurodicus rugioperculatus Martin, is a new invasive pest that has occurred on several crops, including coconut, since 2016 in the Indian states of Tamil Nadu, Karnataka, Kerala and Andhra Pradesh. A study of the population dynamics of A. rugioperculatus indicated that RSW was found throughout the year on coconut. Weekly observations show that the population escalated from the first week of July 2018 (130.8 nymphs/leaf/frond), reached a maximum during the first week of October (161.0 nymphs/leaf/frond), and subsequently dwindled to a minimum during April. Owing to variation in the agro-climatic conditions of different regions, arthropods show varying trends in their incidence and in the nature and extent of damage to the crop. Information on the influence of weather parameters on RSW incidence, which is essential for developing management strategies, is lacking. A forecasting model to predict RSW incidence in coconut was developed using an ARIMAX model of weekly cases and weather factors. In exploring different prediction models by fitting covariates to the time-series data, ARIMA (0,2,1) with maximum temperature was found to be the best model for predicting RSW incidence; all covariates except maximum temperature were non-significant predictors.
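In ARIMA(p, d, q) notation, the middle term d is the number of times the series is differenced before the autoregressive (p) and moving-average (q) parts are fitted, so ARIMA (0,2,1) implies differencing twice. The sketch below shows only that differencing step; the function name and the weekly counts are hypothetical, and fitting the model itself is outside its scope.

```python
from typing import List, Sequence


def difference(series: Sequence[float], order: int = 1) -> List[float]:
    """Apply first differencing `order` times (the d in ARIMA(p, d, q))."""
    if order < 0:
        raise ValueError("order must be non-negative")
    result = list(series)
    for _ in range(order):
        result = [later - earlier
                  for earlier, later in zip(result, result[1:])]
    return result


# Hypothetical weekly counts with a quadratic trend.
weekly_counts = [3, 4, 7, 12, 19, 28]

print(difference(weekly_counts, order=1))  # linear trend remains
print(difference(weekly_counts, order=2))  # constant: trend removed
```

Each pass shortens the series by one observation, so a series of n weekly values leaves n - 2 values for a model with d = 2.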


Computers ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 99
Author(s):  
Sultan Daud Khan ◽  
Louai Alarabi ◽  
Saleh Basalamah

COVID-19 caused the largest economic recession in history by placing more than one third of the world’s population in lockdown. The prolonged restrictions on economic and business activities caused huge economic turmoil that significantly affected the financial markets. To ease the growing pressure on the economy, scientists proposed intermittent lockdowns, commonly known as “smart lockdowns”. Under a smart lockdown, areas that contain infected clusters of population, namely hotspots, are placed on lockdown, while economic activities are allowed to operate in uninfected areas. In this study, we propose a novel deep learning framework for the accurate prediction of hotspots. We exploit the benefits of two deep learning models, i.e., the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), and propose a hybrid framework that can extract multi time-scale features from the convolutional layers of the CNN. The multi time-scale features are then concatenated and provided as input to a two-layer LSTM model. The LSTM model identifies short-, medium- and long-term dependencies by learning the representation of time-series data. We perform a series of experiments and compare the proposed framework with other state-of-the-art statistical and machine learning based prediction models. The experimental results demonstrate that the proposed framework outperforms existing methods by a clear margin.


Author(s):  
Hesham A. Ali ◽  
Shiraz D. Tayabji

Previous studies have shown that the performance of in-service pavements may deviate significantly from that predicted by laboratory-calibrated performance models. Calibration of performance prediction models with data from in-service pavements is therefore important. Calibration of mechanistic rutting models by use of transverse profile data is explored. A well-known family of mechanistic rutting prediction models uses plastic deformation parameters [slope of elastic or plastic strain (or both) and load hardening factor] to quantify the amount of permanent deformation resulting from each load application. Two traditional methods have been used to obtain these parameters: repeated load testing in the laboratory and calibration by use of time-series data from in-service pavements. The first suffers from a lack of compatibility between laboratory-predicted and actual performance, whereas the second requires collection of field data for an extended period of time (years of monitoring) and may be interrupted by rehabilitation activities. The transverse profile contains valuable information that can be used for determining the contribution of each pavement layer to the observed rutting and the plastic deformation parameters. Transverse profile data were used for calibration of rutting prediction models. The stability and sensitivity of the computed parameters were also investigated.


2016 ◽  
Vol 23 (3) ◽  
pp. 302-322 ◽  
Author(s):  
Ka Chi Lam ◽  
Olalekan Shamsideen Oshodi

Purpose – Fluctuations in construction output have an adverse effect on the construction industry and the economy because of the strong linkage between them. Developing reliable and accurate predictive models is vital to implementing effective response strategies to mitigate the impact of such fluctuations. The purpose of this paper is to compare the accuracy of two univariate forecast models, i.e. Box-Jenkins (autoregressive integrated moving average (ARIMA)) and Neural Network Autoregressive (NNAR). Design/methodology/approach – Four quarterly time series on the construction output of Hong Kong were collected (1983Q1-2014Q4). The collected data were divided into two parts. The first part was used to fit the models, while the other was used to evaluate the predictive accuracy of the developed models. Findings – The NNAR model can provide reliable and accurate forecasts of total, private and “others” construction output for the medium term. In addition, the NNAR model outperforms the ARIMA model in terms of accuracy. Research limitations/implications – The applicability of the NNAR model to the construction industry of other countries could be further explored. The main limitation of artificial intelligence models is the lack of explanatory capability. Practical implications – The NNAR model could be used as a tool for accurately predicting future patterns in construction output. This is vital for the sustained growth of the construction industry and the economy. Originality/value – This is the first study to apply the NNAR model to construction output forecasting research.


2021 ◽  
Vol 14 (1) ◽  
pp. 140
Author(s):  
Johann Desloires ◽  
Dino Ienco ◽  
Antoine Botrel ◽  
Nicolas Ranc

Applications in which researchers aim to extract a single land type from remotely sensed data are quite common in practical scenarios: extracting the urban footprint to make connections with socio-economic factors; mapping the forest extent to subsequently retrieve biophysical variables; and detecting a particular crop type to subsequently calibrate and deploy yield prediction models. In this scenario, the (positive) targeted class is well defined, while the negative class is difficult to describe. This one-class classification setting is also referred to as positive unlabelled learning (PUL) in the general field of machine learning. To deal with this challenging setting when satellite image time series data are available, we propose a new framework named positive and unlabelled learning of satellite image time series (PUL-SITS). PUL-SITS involves two stages. In the first, a recurrent neural network autoencoder is trained to reconstruct only positive samples, with the aim of highlighting reliable negative ones. In the second, both labelled and unlabelled samples are exploited in a semi-supervised manner to build the final binary classification model. To assess the quality of our approach, experiments were carried out on a real-world benchmark, namely Haute-Garonne, located in the southwest of France. From this study site, we considered two scenarios: one in which the objective is to map Cereals/Oilseeds cover versus the rest of the land cover classes, and another in which the class of interest is the Forest land cover. The evaluation was carried out by comparing the proposed approach with recent competing methods for the considered positive and unlabelled learning scenarios.


2019 ◽  
Author(s):  
Aaron Jason Fisher ◽  
Peter D. Soyster

The present study sought to apply statistical classification methods to idiographic time series data in order to make accurate future predictions of behavior. We recruited 70 individuals who presented as regular smokers; 52 completed experience sampling method (ESM) data collection and provided sufficient time series data. Time stamps from ESM surveys were used to calculate the time of day, day of the week, and continuous time—where the last datum was, in turn, used to calculate 12-hr and 24-hr cycles. Each individual’s time series was split into sequential training and testing sections, so that trained models could be tested on future observations. Prediction models were trained on the first 75% of the individual’s data and tested on the last 25%. Predictions of future behavior were made on a person by person basis. Two prediction algorithms were employed, elastic net regularization and naïve Bayes classification. Sample-wide area under the curve was nearly 80%, with some models demonstrating perfect prediction accuracies. Sensitivity and specificity were between 0.78 and 0.81 across the two approaches. Importantly, prediction models were based on a lagged data structure. Thus, in addition to supporting the prediction accuracy of our models with out-of-sample tests in time-forward data, the models themselves were time-lagged, such that each prediction was for the subsequent measurement. Such a system could be the basis for mobile, just-in-time interventions for substance use, as models that accurately predict future behavior could ostensibly be used for delivering personalized interventions at empirically-indicated moments of need.
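The data preparation described in this abstract (time features derived from survey time stamps, a lagged structure in which each prediction targets the subsequent measurement, and a sequential 75%/25% split) can be sketched roughly as below. The sine/cosine encoding of the 12-hr and 24-hr cycles is an assumption, as are all function names, the survey schedule, and the outcome values; the abstract does not give these details.

```python
import math
from datetime import datetime, timedelta
from typing import Dict, List, Sequence, Tuple


def time_features(stamp: datetime, start: datetime) -> Dict[str, float]:
    """Time of day, day of week, continuous time, and assumed cycle terms."""
    hours = (stamp - start).total_seconds() / 3600.0  # continuous time
    features = {
        "hour_of_day": float(stamp.hour),
        "day_of_week": float(stamp.weekday()),  # Monday = 0
        "hours_elapsed": hours,
    }
    for period in (12, 24):  # assumed sine/cosine encoding of each cycle
        angle = 2.0 * math.pi * hours / period
        features[f"sin_{period}h"] = math.sin(angle)
        features[f"cos_{period}h"] = math.cos(angle)
    return features


def lag_pairs(features: Sequence[dict],
              outcomes: Sequence[int]) -> List[Tuple[dict, int]]:
    """Pair the features at survey t with the outcome at survey t + 1."""
    return list(zip(features[:-1], outcomes[1:]))


def sequential_split(rows: Sequence, train_fraction: float = 0.75):
    """Train on the earliest rows, test on the most recent ones."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


# Hypothetical schedule: nine surveys, four hours apart, starting on a Monday.
start = datetime(2019, 1, 7, 8, 0)
stamps = [start + timedelta(hours=4 * i) for i in range(9)]
smoked = [0, 1, 0, 0, 1, 1, 0, 1, 0]  # invented outcomes, one per survey

features = [time_features(stamp, start) for stamp in stamps]
pairs = lag_pairs(features, smoked)
train, test = sequential_split(pairs)
print(len(pairs), len(train), len(test))
```

Building the lagged pairs before splitting keeps every test pair later in time than every training pair, which matches the abstract's emphasis on out-of-sample tests in time-forward data.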

