A Model for Recognizing Key Factors and Applications Thereof to Engineering

This paper presents an approach to recognizing key factors in data classification. The model uses collinearity diagnostics to delete factors that carry repeated information, and logistic regression significance discrimination to select the factors that can effectively distinguish the two kinds of samples. The proposed model is demonstrated on 2044 observations in financial engineering. The experimental results show that 13 indicators, such as “marital status,” “net income of borrower,” and “Engel's coefficient,” are the key factors that distinguish good customers from bad customers. Analysis of the experimental results verifies the performance of the proposed model. Moreover, the proposed method is simple and easy to implement.
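The collinearity step can be illustrated with a minimal sketch. It assumes a simplified pairwise check, in which the variance inflation factor of one factor regressed on a single other factor is 1 / (1 - r^2), rather than the full multi-factor diagnostic the paper may use. The factor names, the sample values, and the threshold of 10 are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_pair(x, y):
    """Pairwise variance inflation factor: 1 / (1 - r^2)."""
    r = pearson(x, y)
    denom = 1.0 - r * r
    # Treat (numerically) perfect collinearity as an infinite VIF.
    return math.inf if denom <= 1e-12 else 1.0 / denom


def drop_collinear(factors, vif_threshold=10.0):
    """Keep a factor only if its pairwise VIF against every kept factor is below the threshold."""
    kept = []
    for name, values in factors.items():
        if all(vif_pair(values, factors[k]) < vif_threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data: gross_income is nearly a multiple of net_income.
factors = {
    "net_income": [3.0, 5.0, 2.0, 8.0, 6.0, 4.0],
    "gross_income": [3.3, 5.4, 2.2, 8.9, 6.5, 4.4],
    "engel_coefficient": [0.5, 0.3, 0.6, 0.4, 0.2, 0.45],
}
kept_factors = drop_collinear(factors)
print(kept_factors)
```

In this sketch the factor that duplicates information already carried by an earlier factor is dropped; the surviving factors would then go on to the logistic regression significance step.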


The Application of Multiple Classifier System for Environmental Audio Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.462-463.225 ◽

2013 ◽

Vol 462-463 ◽

pp. 225-229 ◽

Cited By ~ 3

Author(s):

Yan Zhang ◽

Dan Jv Lv ◽

Hong Song Wang

Keyword(s):

Random Forest ◽

Data Classification ◽

Experimental Results ◽

Audio Classification ◽

Key Factors ◽

Multiple Classifier System ◽

Classifier System ◽

Multiple Classifiers ◽

Multiple Classifier ◽

Audio Data

A multiple classifier system trains different classifiers and combines their predictions to improve classification accuracy. This paper explains the popular algorithms and strategies in multiple classifier systems and points out the key factors that affect their performance in applications. Experiments are carried out on given environmental audio data to compare single-classifier methods with multiple classifier systems such as Random Forest and MCS, as well as Bagging and AdaBoost. The experimental results show that the multiple-classifier techniques outperform the single classifier and obtain better performance in environmental audio data classification. This provides an effective way to guarantee the performance and generalization of classification.
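As a minimal sketch of how predictions can be combined, the following uses plurality voting, one of several possible combination strategies and not necessarily the one used in the paper. The labels and the three classifier outputs are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label predictions from several classifiers by plurality vote per sample."""
    combined = []
    for sample_votes in zip(*predictions_per_classifier):
        # Ties go to the label that appears first among the votes.
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical environmental audio labels and classifier outputs.
truth = ["rain", "bird", "wind", "bird", "rain", "wind"]
clf_a = ["rain", "bird", "bird", "bird", "rain", "rain"]
clf_b = ["rain", "wind", "wind", "bird", "wind", "wind"]
clf_c = ["bird", "bird", "wind", "bird", "rain", "wind"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
for name, pred in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("vote", ensemble)]:
    print(name, accuracy(pred, truth))
```

Because the three hypothetical classifiers make their mistakes on different samples, the vote corrects each individual error, which is the intuition behind combining diverse classifiers.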


A Novel Imbalanced Data Classification Approach Based on Logistic Regression and Fisher Discriminant

Mathematical Problems in Engineering ◽

10.1155/2015/945359 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 9

Author(s):

Baofeng Shi ◽

Jing Wang ◽

Junyan Qi ◽

Yanqiu Cheng

Keyword(s):

Logistic Regression ◽

Imbalanced Data ◽

Data Classification ◽

Classification Approach ◽

Fisher Discriminant ◽

Proposed Model ◽

Imbalanced Data Classification ◽

Customer Classification ◽

Key Indicators ◽

Rating Model

We introduce an imbalanced data classification approach based on logistic regression significance discrimination and the Fisher discriminant. First, a key indicator extraction model based on logistic regression significance discrimination and correlation analysis is derived to extract features for customer classification. Second, a customer scoring model is established by linear weighting with the Fisher discriminant. Third, a customer rating model is constructed in which the number of customers across all ratings follows a normal distribution. The performance of the proposed model and of the classical SVM classification method is evaluated in terms of their ability to correctly classify consumers as default or nondefault customers. Empirical results using data on 2157 customers in financial engineering suggest that the proposed approach performs better than the SVM model in dealing with imbalanced data classification. Moreover, our approach helps banks and bond investors locate qualified customers.
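The scoring step can be sketched with the textbook two-class Fisher discriminant, where the weight vector is the inverse within-class scatter matrix applied to the difference of class means. This is an illustrative assumption about the form of the linear weighting, restricted to two indicators so the matrix inverse can be written out by hand; the customer data are hypothetical.

```python
def mean_vec(rows):
    """Column means of a list of 2-element rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """2x2 scatter matrix of rows around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_weights(good, bad):
    """Fisher direction w = Sw^{-1} (mean_good - mean_bad) for two indicators."""
    mg, mb = mean_vec(good), mean_vec(bad)
    sg, sb = scatter(good, mg), scatter(bad, mb)
    a, b = sg[0][0] + sb[0][0], sg[0][1] + sb[0][1]
    c, d = sg[1][0] + sb[1][0], sg[1][1] + sb[1][1]
    det = a * d - b * c  # assumed non-zero (indicators not perfectly collinear)
    dm = [mg[0] - mb[0], mg[1] - mb[1]]
    return [(d * dm[0] - b * dm[1]) / det, (-c * dm[0] + a * dm[1]) / det]


def score(w, x):
    """Linear weighted score of one customer."""
    return w[0] * x[0] + w[1] * x[1]


# Hypothetical (indicator_1, indicator_2) values per customer.
good = [(5, 1), (6, 2), (7, 1), (6, 0)]
bad = [(2, 3), (3, 4), (1, 3), (2, 2)]

w = fisher_weights(good, bad)
print(w, [score(w, x) for x in good], [score(w, x) for x in bad])
```

With these weights, higher scores correspond to the nondefault group, so the scores could then be binned into ratings.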


Connected-Tube MPP Model for Unsupervised 3D Fiber Detection

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.14.coimg-305 ◽

2020 ◽

Vol 2020 (14) ◽

pp. 305-1-305-6

Author(s):

Tianyu Li ◽

Camilo G. Aguilar ◽

Ronald F. Agyei ◽

Imad A. Hanhan ◽

Michael D. Sangid ◽

...

Keyword(s):

Composite Material ◽

Point Process ◽

Marked Point ◽

Experimental Results ◽

Marked Point Process ◽

Fiber Reinforced Composite ◽

Reinforced Composite ◽

Proposed Model ◽

Cylinder Model ◽

Reinforced Composite Material

In this paper, we extend our previous 2D connected-tube marked point process (MPP) model to a 3D connected-tube MPP model for fiber detection. In the 3D case, a tube is represented by a cylinder model with two spherical areas at its ends. The spherical area is used to define connection priors that encourage connection of tubes that belong to the same fiber. Since each long fiber can be fitted by a series of connected short tubes, the proposed model is capable of detecting curved long tubes. We present experimental results on fiber-reinforced composite material images to show the performance of our method.


Regression model of assessment of customer solvency and banking risks in the process of lending

Socio-Economic Problems of the Modern Period of Ukraine ◽

10.36818/2071-4653-2019-4-11 ◽

2019 ◽

pp. 69-73

Author(s):

Zoryna Yurynets ◽

Rostyslav Yurynets ◽

Nataliya Kunanets ◽

Ivanna Myshchyshyn

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Regression Model ◽

Likelihood Ratio ◽

Credit Scoring ◽

Bank Loans ◽

Management Decisions ◽

Computer Data ◽

Proposed Model ◽

Loan Risk

In the current conditions of economic development, it is important to study the main types of banking risks and effective methods for their evaluation, monitoring, and analysis. One of the main approaches to quantitatively assessing the creditworthiness of borrowers is credit scoring. The objective of credit scoring is to optimize management decisions on whether to provide bank loans. The article proposes scientific and methodological provisions for forming a regression model that assesses bank risks in the process of granting loans to borrowers. The proposed model is based on logistic regression tools and discriminant analysis with the use of expert evaluation. During the formation of the regression model, the relationship between risk factors and the probable magnitude of loan risk was established, and the coefficient of the individual's solvency was calculated. Data preparation, including the calculation of the indicators selected in the process of discriminant analysis, was carried out in Excel, followed by import into the STATISTICA package for analysis in the “Logistic regression” sub-module of the “Nonlinear evaluation” module. The adequacy of the constructed model was determined using McFadden's likelihood ratio index, and the calculated value of the index indicates that the model is adequate. The ability to issue loans to new clients was evaluated using the regression model. The calculations show that a loan can be granted only to the second and third clients.
The proposed method makes it possible to assess a client's solvency and prevent risk at different stages of lending. It supports independent, informed decisions on the credit servicing of clients and the management of a loan portfolio, and helps optimize management decisions in banks. For the model to continue to perform its functions, it must be periodically adjusted.
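McFadden's likelihood ratio index compares the log-likelihood of the fitted model with that of an intercept-only model: index = 1 - LL_model / LL_null. The sketch below computes it from predicted default probabilities. The outcomes and probabilities are hypothetical and do not come from the article.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_index(y, p_model):
    """McFadden's likelihood ratio index: 1 - LL_model / LL_null.

    The null model predicts the overall event rate for every observation,
    so both outcomes must be present in y.
    """
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null


# Hypothetical repayment outcomes (1 = repaid) and model probabilities.
y = [1, 0, 1, 1, 0, 0, 1, 0]
p = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]

index = mcfadden_index(y, p)
print(round(index, 3))
```

A value of 0 means the model is no better than the intercept-only model, and values closer to 1 indicate a better fit.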


AN EFFICIENT MACHINE LEARNING MODEL FOR PREDICTION OF ACUTE MYOCARDIAL INFARCTION

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work aims to develop an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in the missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset. The training dataset (70%) is given as input to four popular classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the testing dataset (30%) is used to evaluate the output of the machine learning model using the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve as performance metrics. Results: The outputs of the classifiers are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves in Figures 4 and 5 show that logistic regression exhibits a good AUC-ROC score, around 70%, compared with the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, which helps in deciding when to start lifestyle modification and medical treatment to prevent heart disease.
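Two of the steps named in the Methods, mean imputation and evaluation from the confusion matrix, can be sketched in plain Python. The values below are hypothetical and are not taken from the Framingham data; the PCA and classifier training steps are omitted.

```python
def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall), and F1 from the 2x2 confusion matrix.

    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical blood-pressure column with one missing reading.
imputed = mean_impute([120, None, 140, 130])

# Hypothetical test-set labels (1 = MI) and classifier predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(imputed, metrics)
```

Sensitivity matters most in this setting because a false negative corresponds to a missed infarction.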


New Key Factors Discovery to Enhance Dengue Fever Forecasting Model

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.931-932.1457 ◽

2014 ◽

Vol 931-932 ◽

pp. 1457-1461 ◽

Cited By ~ 1

Author(s):

Phatsavee Ongruk ◽

Padet Siriyasatien ◽

Kraisak Kesorn

Keyword(s):

Wind Speed ◽

Dengue Fever ◽

Forecast Model ◽

Key Factors ◽

Dengue Virus Infection ◽

Conventional Model ◽

Forecasting Error ◽

Proposed Model ◽

Basic Set ◽

Almost All

Several factors can be used to predict a dengue fever outbreak. Almost all existing research approaches, however, rely on a basic set of core attributes to forecast an outbreak, e.g., temperature, humidity, wind speed, and rainfall. In contrast, this research identifies new attributes to improve the prediction accuracy of the outbreak. The experimental results are analyzed using correlation analysis and demonstrate that the dengue virus infection rate in female mosquitoes and the season are strongly correlated with a dengue fever outbreak. In addition, the research constructs a forecast model using Poisson regression analysis. The results show that the proposed model obtains a significantly lower forecasting error rate than the conventional model, which uses only the temperature, humidity, wind speed, and rainfall parameters.


Multimedia Quality Integration Using Piecewise Function

Advanced Engineering Forum ◽

10.4028/www.scientific.net/aef.1.375 ◽

2011 ◽

Vol 1 ◽

pp. 375-380

Author(s):

Shu Ai Wan ◽

Kai Fang Yang ◽

Hai Yong Zhou

Keyword(s):

Visual Quality ◽

Experimental Results ◽

Integration Model ◽

Quality Performance ◽

Piecewise Function ◽

Proposed Model ◽

Constant Coefficients ◽

Theoretical Analyses ◽

Different Levels

This paper addresses the important issue of multimedia quality evaluation, given the unimodal quality of audio and video. First, the quality integration model recommended in G.1070 is evaluated using experimental results. Theoretical analyses and empirical observations suggest that the constant coefficients used in the G.1070 model should actually be piecewise adjusted for different levels of audio and visual quality. A piecewise function is then proposed to perform multimedia quality integration under different levels of audio and visual quality. The performance gain observed in the experimental results substantiates the effectiveness of the proposed model.
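The idea of piecewise-adjusted coefficients can be sketched as follows. The sketch assumes a bilinear integration form, Q = a + b*A + c*V + d*A*V, with a separate coefficient set for each combination of audio and video quality level. The threshold, the coefficient values, and the two-level split are hypothetical and are not taken from G.1070 or from the paper.

```python
# Hypothetical coefficient sets (a, b, c, d) for Q = a + b*A + c*V + d*A*V,
# keyed by (audio level, video level). The values are illustrative only.
COEFFS = {
    ("low", "low"): (1.0, 0.10, 0.10, 0.15),
    ("low", "high"): (0.9, 0.10, 0.15, 0.14),
    ("high", "low"): (0.9, 0.15, 0.10, 0.14),
    ("high", "high"): (0.5, 0.10, 0.20, 0.12),
}
THRESHOLD = 3.0  # assumed split point on a 1-5 quality scale


def level(q):
    """Classify a unimodal quality score as 'low' or 'high'."""
    return "low" if q < THRESHOLD else "high"


def integrate_quality(audio_q, video_q):
    """Piecewise bilinear integration of audio and video quality, clamped to [1, 5]."""
    a, b, c, d = COEFFS[(level(audio_q), level(video_q))]
    q = a + b * audio_q + c * video_q + d * audio_q * video_q
    return min(5.0, max(1.0, q))


print(integrate_quality(2.0, 2.0), integrate_quality(4.0, 4.0))
```

A single constant coefficient set corresponds to using the same tuple in every entry of the table; the piecewise variant lets the weighting of audio and video shift with the quality level.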


DEPRESSION IN OLDER ADULTS- A STUDY ON PATIENTS VISITING A TERTIARY CARE CENTER IN NORTH INDIA

International Journal of Medical and Biomedical Studies ◽

10.32553/ijmbs.v3i5.238 ◽

2019 ◽

Vol 3 (5) ◽

Author(s):

Indrajeet Singh Gambhir ◽

Amit Raj Sharma ◽

Sankha Shubhra Chakrabarti ◽

Upinder Kaur ◽

Bindu Prakash

Keyword(s):

Logistic Regression ◽

Marital Status ◽

Short Form ◽

Care Center ◽

Univariate Analysis ◽

Tertiary Care ◽

Depression Score ◽

North India ◽

Geriatric Depression ◽

Depression Prevalence

Background: Depression is the commonest psychiatric disorder in the elderly. We attempted to analyze the prevalence and correlates of depression in the north Indian elderly. Methods: An observational study was carried out taking cases from patients attending the geriatric clinic for the first time. Depression was diagnosed by the Geriatric Depression Score short form (≥5). Various epidemiological parameters were assessed in 504 subjects (M = 304, F = 200; mean age = 66.47±13.71 years). Results: Depression prevalence was 45%. A significant correlation was found between depression prevalence and gender (F>M, p=0.011), level of education (p=0.002), marital status (p<0.001) and insomnia (p<0.001) on univariate analysis. On binomial logistic regression analysis, marital status (widowed > married, p=0.008) and insomnia (present > absent, p<0.001) showed significant correlation with depression prevalence. Conclusion: Our study highlights certain epidemiological aspects of depression in the aged Indian population presenting to the tertiary hospital. Spousal loss and insomnia are documented as possible depression risks but longitudinal studies are needed to confirm the same. Keywords: Geriatrics, Depression, Epidemiology, Geriatric Depression Score, Prevalence, Logistic Regression


PROBABILITAS ANGKATAN KERJA TERDIDIK YANG TIDAK TERSERAP PADA PASAR KERJA [The Probability of the Educated Labor Force Not Being Absorbed into the Labor Market]

Jurnal REP (Riset Ekonomi Pembangunan) ◽

10.31002/rep.v5i2.3431 ◽

2020 ◽

Vol 5 (2) ◽

pp. 193-203

Author(s):

Muhit Hidayah ◽

Joko Triyanto

Keyword(s):

Human Capital ◽

Logistic Regression ◽

Regression Analysis ◽

Population Growth ◽

Marital Status ◽

Demographic Transition ◽

Economic Factors ◽

Demand Side ◽

Long Run ◽

Household Members

The demographic transition has, in the long run, an impact on the population explosion in the productive age group, and the population trend shows continued growth in that group. It is feared that people of productive age who are not absorbed into employment will eventually become unemployed. Unemployment among people of productive age will in turn affect the level of educated unemployment. This study analyzes the demographic, human capital, and economic factors behind educated unemployment in Sragen Regency in 2019, from the supply and demand sides. The data used are the raw data from the National Labor Force Survey (SAKERNAS) of August 2019 from the Statistics of Sragen Regency (BPS), with a sample of 602 respondents. The method used is logistic regression analysis. The results show that the variables age, number of household members, gender, relationship with the head of the household, marital status, Diploma I/II, Diploma III, Diploma IV/S1, and S2 affect the probability of the educated workforce being unemployed. Meanwhile, the domicile variable does not significantly affect the probability of the educated workforce being unemployed.


Ensembling Coalesce of Logistic Regression Classifier for Heart Disease Prediction using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3473.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 127-133

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Heart Diseases ◽

Experimental Results ◽

Disease Prediction ◽

Feature Importance ◽

The World ◽

Feature Scaling ◽

Logistic Regression Classifier

In today’s world, the population is affected by some kind of heart disease. Despite vast knowledge and advances in applications, the analysis and identification of heart disease remains a challenging issue. Owing to a lack of awareness of patient symptoms, the prediction of heart disease is a difficult task. The World Health Organization has reported that 33% of the population died of heart disease. With this background, we used the Heart Disease Prediction dataset from the UCI Machine Learning Repository to analyze and predict heart disease by integrating ensembling methods. The prediction of heart disease classes is carried out in four steps. First, the important features are extracted by various ensembling methods: Extra Trees regressor, AdaBoost regressor, Gradient Boosting regressor, Random Forest regressor, and AdaBoost classifier. Second, the most important features from each ensembling method are filtered from the dataset and fitted to a logistic regression classifier to analyze the performance. Third, the same important features from each ensembling method are subjected to feature scaling and then fitted with logistic regression to analyze the performance. Fourth, performance analysis is done with metrics such as mean squared error (MSE), mean absolute error (MAE), R2 score, explained variance score (EVS), and mean squared log error (MSLE). The implementation is done in Python on the Spyder platform with Anaconda Navigator. Experimental results show that, before feature scaling, the feature importance extracted from the AdaBoost classifier is found to be effective compared with the other ensembling methods, with an MSE of 0.04, MAE of 0.07, R2 score of 92%, EVS of 0.86, and MSLE of 0.16.
Experimental results show that, after feature scaling, the feature importance extracted from the AdaBoost classifier is found to be effective compared with the other ensembling methods, with an MSE of 0.09, MAE of 0.13, R2 score of 91%, EVS of 0.93, and MSLE of 0.18.
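Feature scaling and three of the reported metrics can be sketched with their standard definitions. The sketch assumes standardization (zero mean, unit variance) as the scaling method, which the abstract does not specify, and uses hypothetical values rather than the UCI data.

```python
import math


def standardize(column):
    """Scale a feature to zero mean and unit variance (population standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical feature column, labels, and predicted scores.
scaled = standardize([2, 4, 4, 4, 5, 5, 7, 9])
y_true = [1, 0, 1, 1, 0]
y_pred = [0.9, 0.2, 0.8, 0.6, 0.1]
print(scaled)
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower MSE and MAE and a higher R2 score indicate a better fit, which is how the scaled and unscaled results in the abstract can be compared.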
