Designing a Profit and Loss Prediction Model for Health Companies Using Data Mining

2021 ◽  
Vol 10 (1) ◽  
pp. 94
Author(s):  
Ali Abdolahi ◽  
Vali Nowzari ◽  
Ali Pirzad ◽  
Seyed Ehsan Amirhosseini

Introduction: Health companies need investment to develop. Because their activities are high-risk, attracting investment in this field is very difficult, yet a lack of financial resources leads these companies to fail, so a model for predicting company profits and losses is both important and practical. Materials and Method: In this study, a combination of two algorithms, logistic regression and differential analysis, was used to design a profit and loss forecasting model. Data from 20 health companies were used to evaluate the proposed model: 10 profitable and 10 loss-making companies were selected, and for each company nine independent variables were collected from its financial information. Results: The designed prediction model was implemented on the data in this study. To do this, the data were divided into two sets: training and testing. The prediction model was fitted on the training data and evaluated on the test data, reaching 99.65% sensitivity, 94.75% specificity and 96.28% accuracy. The proposed model was then compared with the C4.5 decision tree, Bayesian, support vector machine, nearest neighbor and multilayer neural network methods and was found to perform better. Conclusion: This study found that risk in health investment can be reduced, because the profit and loss situation of health companies can be predicted with appropriate accuracy. It also found that combining the logistic regression and differential analysis algorithms can increase the accuracy of the prediction model.
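The sensitivity, specificity and accuracy figures reported above are all derived from a binary confusion matrix. The following is a minimal sketch of those three definitions, assuming profitable companies are coded 1 and loss-making companies 0; the function name and the example labels are hypothetical and are not the study's data.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        # Share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all cases that were predicted correctly.
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical test-set labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
report = confusion_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity separately shows whether errors fall mainly on profitable or on loss-making companies, which accuracy alone hides.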

Author(s):  
Bowen Gao ◽  
Dongxiu Ou ◽  
Decun Dong ◽  
Yusen Wu

Accurate prediction of train delay recovery is critical for railway incident management and for providing passengers with accurate journey times. In this paper, a two-stage prediction model is proposed to predict the recovery time of train primary delays, based on real records from a High-Speed Railway (HSR). In Stage 1, two models are built to study the influence of feature space and model framework on the prediction accuracy of buffer time in each section or station. It is found that explicitly inputting the attribute features of stations and sections to the model, instead of simulating them implicitly, effectively improves the prediction accuracy. For validation purposes, the proposed model has been compared with several alternative models, namely Logistic Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Gradient Boosting Tree (GBT). The results show that it outperforms the other schemes. Specifically, when the permitted error is extended to 3 min, the proposed model achieves an accuracy of up to 94.63%, which shows that the method has high value in practical engineering applications. Considering that the delay propagation of trains is a complex process, our future study will focus on building a delay propagation knowledge base and a dispatcher experience knowledge base.
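The 94.63% figure is an accuracy measured under an error allowance of 3 min. A plausible reading, stated here as an assumption because the abstract does not define the metric, is that a prediction counts as correct when its absolute error does not exceed the tolerance. A minimal sketch with hypothetical recovery times:

```python
def accuracy_within_tolerance(actual_min, predicted_min, tolerance_min=3.0):
    """Fraction of predictions whose absolute error is within the tolerance.

    This definition is an assumption about how the abstract's accuracy
    is computed; the paper may define it differently.
    """
    if len(actual_min) != len(predicted_min):
        raise ValueError("actual and predicted lists must have the same length")
    if not actual_min:
        raise ValueError("at least one observation is required")
    hits = sum(
        1 for a, p in zip(actual_min, predicted_min) if abs(a - p) <= tolerance_min
    )
    return hits / len(actual_min)


# Hypothetical delay recovery times in minutes, for illustration only.
actual = [5.0, 12.0, 0.0, 8.0]
predicted = [6.5, 8.0, 1.0, 11.0]
loose = accuracy_within_tolerance(actual, predicted, tolerance_min=3.0)
strict = accuracy_within_tolerance(actual, predicted, tolerance_min=1.0)
```

Widening the tolerance can only keep the score the same or raise it, so a reported accuracy is meaningful only together with its tolerance.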


Electronics ◽  
2019 ◽  
Vol 8 (7) ◽  
pp. 743 ◽  
Author(s):  
Alice Stazio ◽  
Juan G. Victores ◽  
David Estevez ◽  
Carlos Balaguer

Examining Personal Protective Equipment (PPE) to assure the complete integrity of health personnel in contact with infected patients is one of the most necessary tasks when treating patients affected by infectious diseases such as Ebola. This work studies machine vision techniques for detecting possible defects on the PPE that could arise after contact with such patients. A preliminary study is presented on the use of image classification algorithms to identify blood stains on PPE after treatment of an infected patient. To produce training data for these algorithms, a synthetic dataset was generated from a simulated model of a PPE suit with blood stains. The study then used images of the PPE with a physical emulation of blood stains, taken by a real prototype. The dataset is highly imbalanced between positive and negative samples; therefore, all the selected classification algorithms are able to handle this kind of data. Classifiers range from Logistic Regression and Support Vector Machines to bagging and boosting techniques such as Random Forest, Adaptive Boosting, Gradient Boosting and eXtreme Gradient Boosting. All these algorithms were evaluated on accuracy, precision, recall and F1 score, and execution times were also considered. The results are promising for all the classifiers; in particular, Logistic Regression proved to be the most suitable classification algorithm in terms of F1 score and execution time on both datasets.
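With a strongly imbalanced dataset, accuracy can look good even for a classifier that never finds a stain, which is one reason to report precision, recall and F1 as the abstract does. The sketch below illustrates this with a hypothetical 95/5 class split; the split and the function name are assumptions for illustration, not values from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical imbalanced labels: 95 clean patches (0), 5 stained patches (1).
y_true = [0] * 95 + [1] * 5
always_clean = [0] * 100  # a trivial classifier that never reports a stain

accuracy = sum(1 for t, p in zip(y_true, always_clean) if t == p) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, always_clean)
```

Here the trivial classifier is right on 95% of samples yet has an F1 of zero, because it detects none of the positive samples.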


Author(s):  
M. Zhou ◽  
C. R. Li ◽  
L. Ma ◽  
H. C. Guan

In this study, a land cover classification method based on multi-class Support Vector Machines (SVM) is presented to predict the types of land cover in the Miyun area. The obtained backscattered full waveforms were processed following a workflow of waveform pre-processing, waveform decomposition and feature extraction. The extracted features, which consist of distance, intensity, Full Width at Half Maximum (FWHM) and backscattering cross-section, were corrected and used as attributes of the training data to generate the SVM prediction model. The SVM prediction model was applied to classify the land cover in the Miyun area as ground, trees, buildings or farmland. The classification results for these four types of land cover were assessed against ground truth information derived from CCD image data of the Miyun area. The proposed classification algorithm achieved an overall classification accuracy of 90.63%. To better explain the SVM classification results, they were compared with those of an Artificial Neural Networks (ANNs) method, and the SVM method achieved better classification results.
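FWHM is one of the features extracted from each decomposed waveform component. If the components are modeled as Gaussian pulses, which is a common choice in waveform decomposition but is an assumption here because the abstract does not name the pulse model, the FWHM follows directly from the fitted standard deviation. A minimal sketch with hypothetical function names:

```python
import math

# For a Gaussian pulse, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.3548 * sigma).
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def gaussian(t, amplitude, center, sigma):
    """Value of a Gaussian pulse at time t."""
    return amplitude * math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2))


def fwhm_from_sigma(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return FWHM_FACTOR * sigma


# Hypothetical component: amplitude 4.0, centered at t = 10, sigma = 1.5.
width = fwhm_from_sigma(1.5)
half_height = gaussian(10.0 + width / 2.0, 4.0, 10.0, 1.5)  # equals amplitude / 2
```

A wider echo generally indicates a surface that spreads the return in range, which is why pulse width can help separate classes such as trees from flat ground.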


2020 ◽  
Vol 32 (2) ◽  
Author(s):  
Oluwafemi Oriola ◽  
Eduan Kotzé

Semi-supervised learning is a potential solution for improving training data in low-resourced abusive language detection contexts such as South African abusive language detection on Twitter. However, the existing semi-supervised learning methods have been skewed towards small amounts of labelled data with a small feature space. This paper, therefore, presents a semi-supervised learning technique that improves the distribution of training data by assigning labels to unlabelled data based on majority voting over different feature sets of labelled and unlabelled data clusters. The technique is applied to South African English corpora consisting of labelled and unlabelled abusive tweets. The proposed technique is compared with state-of-the-art self-learning and active learning techniques based on syntactic and semantic features. The performance of these techniques with Logistic Regression, Support Vector Machine and Neural Networks is evaluated. The proposed technique, with accuracy and F1-score of 0.97 and 0.95, respectively, outperforms existing semi-supervised learning techniques. The learning curves show that the proposed technique used the training data more efficiently than existing techniques. Overall, n-gram syntactic features with a Logistic Regression classifier record the highest performance. The paper concludes that the proposed semi-supervised learning technique effectively detected implicit and explicit South African abusive language on Twitter.
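The labelling step described above can be sketched as a two-level vote. This is one possible reading of the abstract, not the authors' algorithm: within each feature set, an unlabelled tweet inherits the majority label of the labelled tweets in its cluster, and the final label is the majority across feature sets. The function names, feature-set names and cluster assignments below are all hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most frequent non-None label, or None when there is no label or a tie."""
    counts = Counter(label for label in labels if label is not None).most_common(2)
    if not counts:
        return None
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def pseudo_label(assignments, known):
    """Assign labels to unlabelled items by voting across feature sets.

    assignments: {feature_set: {item_id: cluster_id}}
    known:       {item_id: label} for the labelled items
    """
    all_items = {item for clusters in assignments.values() for item in clusters}
    result = {}
    for item in sorted(all_items - set(known)):
        proposals = []
        for clusters in assignments.values():
            if item not in clusters:
                continue
            # Labels of the labelled items sharing this item's cluster.
            peers = [known[i] for i, c in clusters.items()
                     if c == clusters[item] and i in known]
            proposals.append(majority(peers))
        label = majority(proposals)
        if label is not None:  # leave the item unlabelled on ties
            result[item] = label
    return result


# Hypothetical cluster assignments for four labelled and two unlabelled tweets.
known = {"t1": "abusive", "t2": "abusive", "t3": "clean", "t4": "clean"}
assignments = {
    "ngram": {"t1": 0, "t2": 0, "t3": 1, "t4": 1, "u1": 0, "u2": 1},
    "embedding": {"t1": 0, "t2": 0, "t3": 1, "t4": 1, "u1": 0, "u2": 0},
    "lexicon": {"t1": 0, "t2": 1, "t3": 1, "t4": 1, "u1": 0, "u2": 1},
}
new_labels = pseudo_label(assignments, known)
```

Leaving tied items unlabelled is a conservative design choice in this sketch: it keeps uncertain pseudo-labels out of the training data.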


Flooding is a major problem globally, and especially in Surat Thani province, Thailand. Population density is high along the lower Tapee River in Surat Thani, so an early warning system can benefit people living along its banks. In this study, our aim was to build a flood prediction model using an artificial neural network (ANN) that utilizes water and stream levels along the lower Tapee River to predict floods. The model was used to predict floods from a dataset of rainfall and stream levels measured at local stations. The developed flood prediction model had 4 input variables, namely the rainfall amounts and stream levels at stations located in the Phra Seang district (X.37A), the Khian Sa district (X.217) and the Phunphin district (X.5C). Model performance was evaluated using input data spanning eight years (2011–2018) and compared with that of a support vector machine (SVM); the ANN had better accuracy, 97.91% versus 97.54% for the SVM. Furthermore, the recall (42.78%) and f-measure (52.24%) were better for our model, although its precision was lower. Therefore, the designed flood prediction model can estimate the likelihood of floods around the lower Tapee River region.
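The abstract reports recall and f-measure for the ANN but not its precision. If the f-measure is the standard F1 score (the harmonic mean of precision and recall), which is an assumption because the abstract does not define it, the precision can be recovered by rearranging F = 2PR / (P + R) into P = FR / (2R - F). A minimal sketch:

```python
def precision_from_recall_and_f1(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, given R and F1 as fractions."""
    denominator = 2.0 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision exists for these values")
    return f1 * recall / denominator


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Values reported in the abstract for the ANN model, as fractions.
recall = 0.4278
f_measure = 0.5224
implied_precision = precision_from_recall_and_f1(recall, f_measure)
```

Under that assumption the implied precision is roughly 67%. The recall below 50% alongside an accuracy near 98% suggests that flood events are rare in the eight-year record.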


2021 ◽  
Author(s):  
Jincheng Yang

BACKGROUND Diabetes mellitus and cancer are among the leading causes of death worldwide; hyperglycemia plays a major contributory role in the risk of neoplastic transformation. Support Vector Machine (SVM) is a supervised learning method that analyzes data and recognizes patterns, and is mainly used for statistical classification and regression. OBJECTIVE Using adverse events reported for PD-1 or PD-L1 (programmed death 1 or ligand 1) inhibitors in post-marketing monitoring, we aimed to construct an effective machine learning algorithm that efficiently and rapidly predicts the probability of a hyperglycemic adverse reaction in patients treated with PD-1/PD-L1 inhibitors. METHODS Raw data were downloaded from the US Food and Drug Administration Adverse Event Reporting System (FDA FAERS). Signals of the relationship between drug and adverse reaction were based on disproportionality analysis and Bayesian analysis. A multivariate pattern classification with SVM was used to construct a classifier to separate patients with adverse hyperglycemic reactions. A 10-fold-3-time cross validation for model setup within the training data (80% of the data) output the best parameter values for the SVM in R software. The model was validated on each testing data set (20% of the data) and on two total drug data, with exactly predictor parameter variables: gamma and nu. RESULTS In total, 95918 case files were downloaded for 7 relevant drugs (cemiplimab, avelumab, durvalumab, atezolizumab, pembrolizumab, ipilimumab, nivolumab). The number-type/number-optimization method was selected to optimize the model. Both gamma and nu values correlated with case number showed high adjusted r2 in curve regressions (both r2 >0.95). Indexes of accuracy, F1 score, kappa and sensitivity were greatly improved by the prediction model in the training data and the two total drug data. CONCLUSIONS The SVM prediction model established here can non-invasively and precisely predict the occurrence of hyperglycemic adverse drug reaction (ADR) in patients treated with PD-1/PD-L1 inhibitors. Such information is vital to overcome ADR and to improve outcomes by distinguishing patients at high risk of hyperglycemia, and this machine learning algorithm can eventually add value to clinical decision making. CLINICALTRIAL N/A
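The validation scheme in METHODS can be sketched as an 80/20 hold-out followed by repeated k-fold cross validation inside the training portion. Reading "10-fold-3-time" as 10-fold cross validation repeated 3 times is an assumption, and the study itself used R; the Python below only illustrates how the index splits could be generated, with hypothetical function names and a hypothetical case count.

```python
import random


def holdout_split(n_cases, test_fraction=0.2, seed=0):
    """Shuffle case indices and split them into training and testing parts."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_cases * test_fraction))
    return indices[n_test:], indices[:n_test]


def repeated_kfold(indices, k=10, repeats=3, seed=0):
    """Yield (training, validation) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # a fresh partition for every repeat
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            training = [i for f, fold in enumerate(folds)
                        if f != held_out for i in fold]
            yield training, validation


# Hypothetical data set of 100 cases.
train_idx, test_idx = holdout_split(100)
splits = list(repeated_kfold(train_idx))  # 3 repeats x 10 folds = 30 splits
```

Each candidate pair of gamma and nu values would be scored on all 30 validation folds, and the held-out 20% would be used only once, for the final check.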


2020 ◽  
Author(s):  
Jincheng Yang ◽  
Weilong Lin ◽  
Liming Shi ◽  
Ming Deng ◽  
Wenjing Yang

Abstract Background: Diabetes mellitus and cancer are among the leading causes of death worldwide; hyperglycemia plays a major contributory role in the risk of neoplastic transformation. Using adverse events reported for PD-1 or PD-L1 (programmed death 1 or ligand 1) inhibitors in post-marketing monitoring, we aimed to construct an effective machine learning algorithm that efficiently and rapidly predicts the probability of a hyperglycemic adverse reaction in patients treated with PD-1/PD-L1 inhibitors. Methods: Raw data were downloaded from the US Food and Drug Administration Adverse Event Reporting System (FDA FAERS). Signals of the relationship between drug and adverse reaction were based on disproportionality analysis and Bayesian analysis. A multivariate pattern classification with Support Vector Machine (SVM) was used to construct a classifier to separate patients with adverse hyperglycemic reactions. A 10-fold-3-time cross validation for model setup within the training data (80% of the data) output the best parameter values for the SVM in R software. The model was validated on each testing data set (20% of the data) and on two total drug data, with exactly predictor parameter variables: gamma and nu. Results: In total, 95918 case files were downloaded for 7 relevant drugs (cemiplimab, avelumab, durvalumab, atezolizumab, pembrolizumab, ipilimumab, nivolumab). The number-type/number-optimization method was selected to optimize the model. Both gamma and nu values correlated with case number showed high adjusted r2 in curve regressions (both r2 >0.95). Indexes of accuracy, F1 score, kappa and sensitivity were greatly improved by the prediction model in the training data and the two total drug data. Conclusions: The SVM prediction model established here can non-invasively and precisely predict the occurrence of hyperglycemic adverse drug reaction (ADR) in patients treated with PD-1/PD-L1 inhibitors. Such information is vital to overcome ADR and to improve outcomes by distinguishing patients at high risk of hyperglycemia, and this machine learning algorithm can eventually add value to clinical decision making.
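Disproportionality analysis compares how often an event is reported for one drug against all other drugs in the database. The abstract does not say which measure was used; the reporting odds ratio (ROR) shown below is one widely used option and is included only as an illustration. The counts are hypothetical and are not FAERS data.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """ROR with an approximate 95% confidence interval from a 2x2 table.

    a: target drug, target event      b: target drug, other events
    c: other drugs, target event      d: other drugs, other events
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) for a 2x2 contingency table.
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper


# Hypothetical report counts, for illustration only.
ror, lower, upper = reporting_odds_ratio(a=40, b=960, c=200, d=18800)
signal = lower > 1.0  # a common, though not universal, signal criterion
```

A lower confidence bound above 1 means the event is reported disproportionately often for the drug, which flags a signal for review and does not by itself establish causation.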


2012 ◽  
Vol 542-543 ◽  
pp. 507-512 ◽  
Author(s):  
Xiaoping Zhang ◽  
Jun Zhao

Predicting the output of blast furnace gas (BFG), which is influenced by many complex production factors, is an important and difficult problem for byproduct gas balance in the steel industry. This paper proposes a new online least squares support vector machine (LSSVM) prediction model in which the training data are filtered by improved empirical mode decomposition threshold filtering (IEMDTF). The model is solved by an online learning algorithm and optimized by online Bayesian parameter optimization. Experimental results using practical BFG output data from BaoSteel Co. Ltd., China show that the proposed model is effective and able to support reasonable gas balance scheduling by operators.
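What distinguishes an LSSVM from a standard SVM is that training reduces to solving one linear system, which makes incremental updates practical. The sketch below shows the standard batch LSSVM regression with an RBF kernel on hypothetical one-dimensional data. It does not reproduce the paper's online learning algorithm, IEMDTF filtering or Bayesian parameter optimization, and the names `reg_gamma` and `width` are illustrative.

```python
import math


def rbf(a, b, width):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lssvm_fit(xs, ys, reg_gamma=100.0, width=1.0):
    """Solve [[0, 1^T], [1, K + I/reg_gamma]] [b; alpha] = [0; y]."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            ridge = 1.0 / reg_gamma if i == j else 0.0
            row.append(rbf(xs[i], xs[j], width) + ridge)
        system.append(row)
    solution = solve(system, [0.0] + list(ys))
    return solution[0], solution[1:]  # bias b, dual weights alpha


def lssvm_predict(x, xs, b, alphas, width=1.0):
    """Prediction f(x) = b + sum_i alpha_i * K(x, x_i)."""
    return b + sum(a * rbf(x, xi, width) for a, xi in zip(alphas, xs))


# Hypothetical gas output readings at five time steps.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 2.5, 3.1, 2.8, 3.6]
b, alphas = lssvm_fit(xs, ys, reg_gamma=1e6)
fitted = [lssvm_predict(x, xs, b, alphas) for x in xs]
```

A larger `reg_gamma` fits the training data more closely, while a smaller value smooths the fit, which matters when the gas output data are noisy.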

