Interrelated Decision-Making Model for Diabetes

2021 ◽  
Vol 10 (02) ◽  
pp. 170-186
Author(s):  
Normadiah Mahiddin ◽  
Zulaiha Ali Othman ◽  
Nur Arzuar Abdul Rahim

Diabetes is one of the growing chronic diseases, and proper treatment is needed for care to be effective. Past studies have proposed an Interrelated Decision-making Model (IDM) as an intelligent decision support system (IDSS) solution for healthcare, a model that can produce accurate results when determining the treatment of a particular patient. The purpose of this study is therefore to develop a diabetes IDM and examine how the IDM concept increases decision-making accuracy. The IDM concept allows the amount of data to grow through the addition of records at the same level of care, and through the addition of records and attributes from the previous or subsequent levels of care; the more data or information available, the more accurate the decision that can be made. Data were developed to make diagnostic predictions for each stage of care in the progression of type 2 diabetes, and the data design for each stage was confirmed by specialists. The experiments, however, were performed using simulated data for two stages of care only. Four data sets of different sizes were prepared to observe changes in prediction accuracy. Each data set combined a primary-care-level and a secondary-care-level set, with the number of attributes varied four times from 25 to 58 and the number of records from 300 to 11,000. The data were developed to predict the level of diabetes as confirmed by specialist doctors. The experimental results showed that, on average, the J48 algorithm produced the best model (99%), followed by Logistic (98%), RandomTree (95%), NaiveBayesUpdateable (93%), BayesNet (84%) and AdaBoostM1 (67%). Ratio analysis also showed that the accuracy of the prediction model increased by up to 49%. The MAPKB model for diabetes care is designed with dynamic data-change criteria and is able to develop up-to-date dynamic prediction models effectively.
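
A minimal sketch of the kind of classifier comparison reported above, using scikit-learn analogues of the Weka algorithms named in the abstract (DecisionTree for J48, GaussianNB for NaiveBayesUpdateable, AdaBoostClassifier for AdaBoostM1); synthetic data stands in for the simulated care-level records, so the numbers are illustrative only.

```python
# Hedged sketch: 10-fold cross-validated comparison of classifiers, using
# scikit-learn analogues of the Weka algorithms cited in the study.
# Synthetic data is a placeholder for the simulated care-level records.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

X, y = make_classification(n_samples=3000, n_features=25, n_informative=10, random_state=0)

models = {
    "DecisionTree (~J48)": DecisionTreeClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB (~NaiveBayesUpdateable)": GaussianNB(),
    "AdaBoost (~AdaBoostM1)": AdaBoostClassifier(random_state=0),
    "RandomForest (~RandomTree)": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {acc.mean():.3f}")
```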

2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to use it to build prediction models (in the form of approximating functional relationships) instead of using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for the parameter is also proposed. Because it has only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase-space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to address them.
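
A hedged sketch of the general idea of extracting a representative subset from a large data set via clustering. It uses k-means and takes the point nearest each cluster centre as a representative; this is a stand-in illustration, not the paper's single-parameter method or SCM.

```python
# Hedged sketch: representative-subset extraction via clustering (k-means as a
# stand-in; the paper's own single-parameter method is not reproduced here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 5))             # placeholder for a lengthy data set

k = 200                                         # desired size of the representative subset
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)

# Take the actual data point closest to each cluster centre as a representative.
idx = pairwise_distances_argmin(km.cluster_centers_, data)
subset = data[idx]
print(subset.shape)                             # (200, 5)
```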


2011 ◽  
Vol 50-51 ◽  
pp. 885-889 ◽  
Author(s):  
Fei Xue Yan ◽  
Jing Xia ◽  
Guan Qun Shen ◽  
Xu Sheng Kang

If crime prediction is not implemented, the hazard rate of society will increase over time. Based on objective factors of offender and victim characteristics, the AHP method can be used to perform a combined quantitative and qualitative analysis for crime prediction. Crime prediction is a strategic and tactical measure for crime prevention. Based on the AHP analysis, two models for predicting the most likely crime locations are put forward: a Standard Deviational Ellipse Model and a Key-Feature-Adjusted Spatial Choice Model, formulated to account for the anticipated position using the factors obtained from the AHP method. These models can be applied in computer simulations of serial-murder scenarios, and applying them to a real case demonstrates how they work. A comparison of the models shows that the Key-Feature-Adjusted Spatial Choice Model is more effective in identifying the location of the crime. In conclusion, the suggested models, including a detailed criminal map, are easy to implement.
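
A hedged sketch of one ingredient named above: computing a standard deviational ellipse (mean centre, principal-axis lengths and orientation) from a handful of hypothetical crime-site coordinates via the covariance matrix; conventions for the exact axis scaling vary, so treat this as illustrative.

```python
# Hedged sketch: standard deviational ellipse from placeholder crime coordinates.
import numpy as np

xy = np.array([[3.1, 7.2], [4.0, 6.8], [2.7, 7.9], [3.6, 8.1], [4.4, 7.5]])  # placeholder points

centre = xy.mean(axis=0)
cov = np.cov((xy - centre).T)

# Eigen-decomposition of the covariance gives the ellipse orientation and the
# standard deviations along its principal axes.
eigvals, eigvecs = np.linalg.eigh(cov)
semi_axes = np.sqrt(eigvals)
angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))  # major-axis orientation

print("mean centre:", centre)
print("semi-axes:", semi_axes)
print("major-axis angle (deg):", angle)
```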


2020 ◽  
Author(s):  
Tianyu Xu ◽  
Yongchuan Yu ◽  
Jianzhuo Yan ◽  
Hongxia Xu

Due to the problems of unbalanced data sets and distribution differences in long-term rainfall prediction, current rainfall prediction models have poor generalization performance and cannot achieve good results in real scenarios. This study uses multiple atmospheric parameters (such as temperature, humidity, and atmospheric pressure) to establish a TabNet-LightGBM rainfall probability prediction model. The research uses feature engineering (such as generating descriptive statistical features and feature fusion) to improve model accuracy, the Borderline-SMOTE algorithm to mitigate data set imbalance, and adversarial validation to address distribution differences. The experiments use five years of precipitation data from 26 stations in the Beijing-Tianjin-Hebei region of China to verify the proposed rainfall prediction model; the test set is the rainfall of each station over one month. The experimental results show that the model performs well, with an AUC larger than 92%. The method proposed in this study further improves the accuracy of rainfall prediction and provides a reference for data mining tasks.
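
A hedged sketch of two pieces of the pipeline described above, Borderline-SMOTE rebalancing followed by a LightGBM classifier evaluated with AUC; the TabNet component and the adversarial-validation step are omitted, and synthetic data stands in for the station-level atmospheric features.

```python
# Hedged sketch: Borderline-SMOTE rebalancing + LightGBM, scored with AUC.
from imblearn.over_sampling import BorderlineSMOTE
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for station-level atmospheric features.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (rain) class in the training split only.
X_res, y_res = BorderlineSMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = LGBMClassifier(random_state=0).fit(X_res, y_res)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"test AUC = {auc:.3f}")
```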


10.29007/rh9l ◽  
2019 ◽  
Author(s):  
Cuauhtémoc López-Martín

Defect density (DD) is a measure of the effectiveness of software processes, defined as the total number of defects divided by the size of the software. Software prediction is an activity of software planning. This study analyzes the attributes of data sets commonly used for building DD prediction models. The software project data sets were selected from the International Software Benchmarking Standards Group (ISBSG) Release 2018, with selection criteria based on attributes such as type of development, development platform, and programming language generation, as suggested by the ISBSG. Because applying these criteria reduces the size of the resulting data sets, it can prevent models from generalizing well. Therefore, in this study a statistical analysis of the data sets was performed with the objective of determining whether they could be pooled rather than used as separate data sets. Results showed that there was no difference among the DD of new projects nor among the DD of enhancement projects, but there was a difference between the DD of new and enhancement projects. The results suggest that prediction models can be constructed separately for new projects and for enhancement projects, but not by pooling new and enhancement projects.
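
A hedged sketch of the kind of statistical check described above: testing whether the defect-density distributions of new and enhancement projects differ before deciding to pool them. The arrays are hypothetical placeholders, not ISBSG values.

```python
# Hedged sketch: nonparametric test of whether DD differs between project types.
from scipy.stats import mannwhitneyu

dd_new = [0.012, 0.030, 0.008, 0.021, 0.017, 0.026]          # DD of new projects (placeholder)
dd_enhancement = [0.041, 0.055, 0.038, 0.060, 0.047, 0.052]  # DD of enhancement projects (placeholder)

stat, p = mannwhitneyu(dd_new, dd_enhancement, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
if p < 0.05:
    print("DD differs between project types: build separate prediction models.")
else:
    print("No evidence of a difference: pooling the data sets may be acceptable.")
```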


Author(s):  
Xiaojie Xu

We examine the short-run forecasting problem in a data set of daily prices from 134 corn buying locations in seven states: Iowa, Illinois, Indiana, Ohio, Minnesota, Nebraska, and Kansas. We ask the question: is there useful forecasting information in the cash bids from nearby markets? We use several criteria, including a Granger causality criterion, to specify forecast models that rely on the recent history of a market, the recent histories of nearby markets, and the recent histories of futures prices. For about 65% of the markets studied, the model consisting of futures prices, a market's own history, and the history of nearby markets forecasts better than a model incorporating only futures prices and the market's own history. That is, nearby markets have predictive content, but the magnitude varies with the forecast horizon. For short-run forecasts, the forecast accuracy improvement from including nearby markets is modest. As the forecast horizon increases, however, including nearby prices tends to significantly improve forecasts. We also examine the role played by physical market density in determining the value of incorporating nearby prices into a forecast model.
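
A hedged sketch of the Granger-causality criterion mentioned above: checking whether a nearby market's cash price helps forecast a market's own price. The price series here are simulated placeholders, not the cash-bid data from the study.

```python
# Hedged sketch: does the nearby-market series Granger-cause the own-market series?
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
own = pd.Series(3.5 + rng.normal(0, 0.05, 300)).cumsum()                 # own cash bids (placeholder)
nearby = own.shift(1).fillna(own.iloc[0]) + rng.normal(0, 0.02, 300)     # nearby-market bids (placeholder)

# statsmodels tests whether the SECOND column Granger-causes the FIRST;
# prices are first-differenced to work with (approximately) stationary series.
data = pd.concat([own.diff(), nearby.diff()], axis=1).dropna()
results = grangercausalitytests(data, maxlag=5)
```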


2021 ◽  
Vol 143 (11) ◽  
Author(s):  
Mohsen Faramarzi-Palangar ◽  
Behnam Sedaee ◽  
Mohammad Emami Niri

The correct definition of rock types plays a critical role in reservoir characterization, simulation, and field development planning. In this study, we use the critical pore size (linf) as an approach to reservoir rock typing. Two linf relations were derived separately from two permeability prediction models and then merged to derive a generalized linf relation. The proposed rock typing methodology consists of two main parts: in the first part, we determine an appropriate constant coefficient, and in the second part, we perform reservoir rock typing under two different scenarios. The first scenario forms groups of rocks using statistical analysis, and the second forms groups of rocks with similar capillary pressure curves. The approach was applied to three data sets: two data sets were used to determine the constant coefficient, and one was used to show the applicability of the linf method for rock typing in comparison with the flow zone indicator (FZI).
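
For reference, a hedged sketch of the conventional flow zone indicator (FZI) calculation from core porosity and permeability, the baseline the linf approach is compared against; the paper's own linf relations are not reproduced here, and the core values are placeholders.

```python
# Hedged sketch: conventional FZI from core porosity and permeability.
import numpy as np

k_md = np.array([12.0, 150.0, 3.5, 480.0])    # permeability, mD (placeholder core data)
phi = np.array([0.12, 0.21, 0.08, 0.25])      # fractional porosity (placeholder core data)

rqi = 0.0314 * np.sqrt(k_md / phi)            # reservoir quality index, microns
phi_z = phi / (1.0 - phi)                     # normalized porosity index
fzi = rqi / phi_z                             # flow zone indicator

print(np.round(fzi, 2))
```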


2020 ◽  
Vol 10 (8) ◽  
pp. 2725-2739 ◽  
Author(s):  
Diego Jarquin ◽  
Reka Howard ◽  
Jose Crossa ◽  
Yoseph Beyene ◽  
Manje Gowda ◽  
...  

“Sparse testing” refers to reduced multi-environment breeding trials in which not all genotypes of interest are grown in each environment. Using genomic-enabled prediction and a model embracing genotype × environment interaction (GE), the non-observed genotype-in-environment combinations can be predicted. Consequently, the overall costs can be reduced and the testing capacities can be increased. The accuracy of predicting the unobserved data depends on different factors, including (1) how many genotypes overlap between environments, (2) in how many environments each genotype is grown, and (3) which prediction method is used. In this research, we studied the predictive ability obtained when using a fixed number of plots and different sparse testing designs. The considered designs included the extreme cases of (1) no overlap of genotypes between environments and (2) complete overlap of the genotypes between environments; in the latter case, the prediction set fully consists of genotypes that have not been tested at all. Moreover, we gradually move from one extreme to the other by considering (3) intermediate cases with varying numbers of non-overlapping (NO) and overlapping (O) genotypes. The empirical study is built upon two different maize hybrid data sets consisting of different genotypes crossed to two different testers (T1 and T2), and each data set was analyzed separately. For each set, phenotypic records on yield from three different environments are available. Three different prediction models were implemented: two main-effects models (M1 and M2) and a model (M3) including GE. The results showed that the genome-based model including GE (M3) captured more phenotypic variation than the models that did not include this component. M3 also provided higher prediction accuracy than models M1 and M2 for the different allocation scenarios. Reducing the size of the calibration sets decreased the prediction accuracy under all allocation designs, with M3 being the least affected model; however, with the genome-enabled models (i.e., M2 and M3), predictive ability is recovered when more genotypes are tested across environments. Our results indicate that a substantial part of the testing resources can be saved when using genome-based models including GE for optimizing sparse testing designs.
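
A hedged sketch of how an intermediate sparse-testing allocation could be constructed: a chosen fraction of genotypes is grown in every environment (overlapping, "O") and the remainder is split across environments (non-overlapping, "NO"). The counts are illustrative and do not correspond to the maize data sets or the paper's allocation algorithm.

```python
# Hedged sketch: building one intermediate sparse-testing design.
import numpy as np

rng = np.random.default_rng(1)
genotypes = np.arange(300)        # hypothetical genotype IDs
n_env = 3
overlap_fraction = 0.4            # intermediate design between the two extremes

n_overlap = int(overlap_fraction * len(genotypes))
shuffled = rng.permutation(genotypes)
overlapping = shuffled[:n_overlap]                              # tested in all environments (O)
non_overlapping = np.array_split(shuffled[n_overlap:], n_env)   # each tested in one environment (NO)

allocation = {f"env_{e + 1}": np.concatenate([overlapping, non_overlapping[e]])
              for e in range(n_env)}
for env, genos in allocation.items():
    print(env, "plots:", len(genos))
```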


Author(s):  
Onur Doğan ◽  
Hakan  Aşan ◽  
Ejder Ayç

In today’s competitive world, organizations need to make the right decisions to prolong their existence. The use of non-scientific methods and emotional decision making has given way to scientific methods in the decision-making process in this competitive area. Within this scope, many decision support models are still being developed to assist decision makers and owners of organizations. It is easy for organizations to collect massive amounts of data, but the problem is generally using this data to achieve economic advances. There is a critical need for specialization and automation to transform the data in big data sets into knowledge. Data mining techniques are capable of providing description, estimation, prediction, classification, clustering, and association, and many such techniques have recently been developed to find hidden patterns and relations in big data sets. It is important to obtain new correlations, patterns, and trends that are understandable and useful to decision makers, and many studies and applications have focused on different data mining techniques and methodologies. In this study, we aim to obtain understandable and applicable results from a large volume of records belonging to a firm active in the meat processing industry by using data mining techniques. In the application part, data cleaning and data integration, the first steps of the data mining process, are performed on the data in the database. With the aid of data cleaning and data integration, a data set suitable for data mining was obtained. Then, various association rule algorithms were applied to this data set. The analysis revealed that finding unexplored patterns in the data would be beneficial for the decision makers of the firm. Finally, many association rules that are useful for the decision makers of the local firm were obtained.
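
A hedged sketch of the association-rule step described above, using the Apriori algorithm from mlxtend on a few made-up transactions; the firm's actual records, item names, and the specific algorithms used in the study are not reproduced.

```python
# Hedged sketch: frequent itemsets and association rules with Apriori (mlxtend).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical transactions standing in for the firm's sales records.
transactions = [
    ["minced beef", "sausage", "salami"],
    ["minced beef", "sausage"],
    ["salami", "bacon"],
    ["minced beef", "sausage", "bacon"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```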


2021 ◽  
Author(s):  
Alessandra Toniato ◽  
Philippe Schwaller ◽  
Antonio Cardinale ◽  
Joppe Geluykens ◽  
Teodoro Laino

Existing deep learning models applied to reaction prediction in organic chemistry can reach high levels of accuracy (>90% for Natural Language Processing-based ones). With no chemical knowledge embedded other than the information learnt from reaction data, the quality of the data sets plays a crucial role in the performance of the prediction models. While human curation is prohibitively expensive, unaided approaches to remove chemically incorrect entries from existing data sets are essential to improve the performance of artificial intelligence models in synthetic chemistry tasks. Here we propose a machine learning-based, unassisted approach to remove chemically wrong entries from chemical reaction collections. We applied this method to the Pistachio collection of chemical reactions and to an open data set, both extracted from USPTO (United States Patent Office) patents. Our results show improved prediction quality for models trained on the cleaned and balanced data sets. For the retrosynthetic models, the round-trip accuracy metric grows by 13 percentage points and the value of the cumulative Jensen-Shannon divergence decreases by 30% compared to its original record. The coverage remains high at 97%, and the class diversity is not affected by the cleaning. The proposed strategy is the first unassisted, rule-free technique to address automatic noise reduction in chemical data sets.
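
A hedged sketch of a much simpler kind of noise removal than the paper's machine-learning approach: dropping reaction SMILES that contain unparsable molecules, using RDKit. It only illustrates basic validity filtering of a reaction collection.

```python
# Hedged sketch: drop reaction SMILES whose components do not parse with RDKit.
from rdkit import Chem

def reaction_is_parsable(rxn_smiles: str) -> bool:
    """Return True if every molecule in a reactants>agents>products SMILES parses."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        return False
    for part in parts:
        for smi in filter(None, part.split(".")):
            if Chem.MolFromSmiles(smi) is None:
                return False
    return True

reactions = [
    "CCO.CC(=O)Cl>>CCOC(C)=O",      # valid entry
    "C1CC>>CCO",                    # invalid: unclosed ring in a reactant
]
clean = [r for r in reactions if reaction_is_parsable(r)]
print(clean)
```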


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Dipendra Jha ◽  
Kamal Choudhary ◽  
Francesca Tavazza ◽  
Wei-keng Liao ◽  
Alok Choudhary ◽  
...  

The current predictive modeling techniques applied to Density Functional Theory (DFT) computations have helped accelerate the process of materials discovery by providing significantly faster methods to scan materials candidates, thereby reducing the search space for future DFT computations and experiments. However, in addition to prediction error against DFT-computed properties, such predictive models also inherit the DFT-computation discrepancies against experimentally measured properties. To address this challenge, we demonstrate that using deep transfer learning, existing large DFT-computational data sets (such as the Open Quantum Materials Database (OQMD)) can be leveraged together with other smaller DFT-computed data sets as well as available experimental observations to build robust prediction models. We build a highly accurate model for predicting formation energy of materials from their compositions; using an experimental data set of 1,643 observations, the proposed approach yields a mean absolute error (MAE) of 0.07 eV/atom, which is significantly better than existing machine learning (ML) prediction modeling based on DFT computations and is comparable to the MAE of DFT-computation itself.
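
A hedged sketch of the transfer-learning idea described above: pretrain a small network on a large data set (standing in for DFT-computed records), then fine-tune the same weights on a much smaller "experimental" set with a reduced learning rate. The architecture and random data are illustrative, not the authors' model or the OQMD data.

```python
# Hedged sketch: pretrain on a large synthetic set, fine-tune on a small one.
import torch
import torch.nn as nn

def make_net(n_features: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, 1))

def train(model, X, y, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()                        # MAE, the metric reported in the abstract
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return loss.item()

torch.manual_seed(0)
X_dft, y_dft = torch.randn(20_000, 32), torch.randn(20_000)    # large DFT-like set (placeholder)
X_exp, y_exp = torch.randn(1_600, 32), torch.randn(1_600)      # small experimental set (placeholder)

model = make_net(32)
train(model, X_dft, y_dft, epochs=50, lr=1e-3)   # pretraining on the large set

# Transfer: reuse the pretrained weights and fine-tune on the experimental data.
mae = train(model, X_exp, y_exp, epochs=50, lr=1e-4)
print(f"fine-tuned training MAE (placeholder data): {mae:.3f}")
```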

