Efficient Pre-Processing Techniques for Improving Classifiers Performance

Author(s):  
S. Nickolas ◽  
K. Shobha

Data pre-processing plays a vital role in the life cycle of data mining for accomplishing quality outcomes. This paper shows experimentally the importance of data pre-processing for achieving highly accurate classifier outcomes: missing values are imputed using a novel imputation method, CLUSTPRO; highly correlated features are selected using Correlation-based Variable Selection (CVS); and imbalanced data are handled using the Synthetic Minority Over-sampling Technique (SMOTE). The proposed CLUSTPRO method makes use of Random Forest (RF) and Expectation Maximization (EM) algorithms to impute missing values. The imputed results are evaluated using standard evaluation metrics. The CLUSTPRO imputation method outperforms existing, state-of-the-art imputation methods. The combined approach of imputation, feature selection, and imbalanced data handling contributes significantly to an improvement of 40%–50% in classification accuracy (AUC) in comparison with results obtained without any pre-processing.
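As a rough illustration of the feature-selection step, the sketch below keeps features whose absolute Pearson correlation with the class label meets a threshold. It is a simplified stand-in, not the CVS procedure from the paper: the function names, the 0.5 threshold, and the toy columns are all assumptions made for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_by_correlation(columns, target, threshold=0.5):
    """Keep features whose |correlation| with the target meets the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Toy data, invented for illustration only.
columns = {
    "f_signal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f_noise": [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],
    "f_constant": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
}
target = [0, 0, 0, 1, 1, 1]
selected = select_by_correlation(columns, target)
print(selected)
```

On this toy data only `f_signal` passes the threshold; the alternating and constant columns are dropped.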

Author(s):  
Neeti Kasliwal ◽  
Jagriti Singh

The banking sector is growing rapidly and playing a vital role in the economic development of the nation. Both private and public sector banks are giving more priority to service quality to satisfy their customers. For this, banks are now emphasizing E-CRM practices to carry out transactions and communicate with their customers. The purpose of this research is to assess the service quality among private and public banks in Rajasthan. A purposive sampling technique has been employed to collect the data from three private banks and three public sector banks. To analyze the data, descriptive statistics, the mean score method and the t test have been used. Results indicate that there is a significant difference in consumers’ perception of service quality dimensions related to E-CRM practices provided by selected private and public sector banks of Rajasthan. The findings of this research will help policy makers of the banking sector to set customer-oriented policies.


2021 ◽  
Vol 12 (8) ◽  
Author(s):  
Dawei Chen ◽  
Zhenguo Zhao ◽  
Lu Chen ◽  
Qinghua Li ◽  
Jixue Zou ◽  
...  

Abstract: Emerging evidence has demonstrated that alternative splicing has a vital role in regulating protein function, but how alternative splicing factors can be regulated remains unclear. We showed that PPM1G, a protein phosphatase, regulated the phosphorylation of SRSF3 in hepatocellular carcinoma (HCC) and contributed to the proliferation, invasion, and metastasis of HCC. PPM1G was highly expressed in HCC tissues compared to adjacent normal tissues, and higher levels of PPM1G were observed in adverse staged HCCs. The higher levels of PPM1G were highly correlated with poor prognosis, which was further validated in the TCGA cohort. The knockdown of PPM1G inhibited the cell growth and invasion of HCC cell lines. Further studies showed that the knockdown of PPM1G inhibited tumor growth in vivo. The mechanistic analysis showed that PPM1G interacted with proteins related to alternative splicing, including SRSF3. Overexpression of PPM1G promoted the dephosphorylation of SRSF3 and changed the alternative splicing patterns of genes related to the cell cycle and transcriptional regulation in HCC cells. In addition, we also demonstrated that the promoter of PPM1G was activated by multiple transcription factors and co-activators, including MYC/MAX, EP300, MED1, and ELF1. Our study highlighted the essential role of PPM1G in HCC and shed new light on unveiling the regulation of alternative splicing in malignant transformation.


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait, aggregated to a daily basis. Logarithm transformation was carried out for all pollutant data, in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the other imputation methods and, thus, can be considered to be appropriate for analyzing air quality data.
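The evaluation protocol described above (hide a known fraction of values, impute, then score with RMSE and MAE) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses mean imputation as a stand-in rather than missForest, it covers only the MCAR mechanism, and the synthetic readings and all function names are invented for the example.

```python
import math
import random

def mask_mcar(values, rate, rng):
    """Hide a fraction `rate` of entries completely at random (MCAR)."""
    n_missing = round(len(values) * rate)
    hidden = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)

def impute_mean(masked):
    """Stand-in imputer (NOT missForest): fill gaps with the observed mean."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]

def rmse_mae(truth, imputed, hidden):
    """RMSE and MAE computed only over the positions that were hidden."""
    errors = [imputed[i] - truth[i] for i in hidden]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

rng = random.Random(0)
# Hypothetical log-transformed pollutant readings, for illustration only.
truth = [math.log(10 + rng.random() * 40) for _ in range(200)]

results = {}
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):  # missingness levels from the abstract
    masked, hidden = mask_mcar(truth, rate, rng)
    results[rate] = rmse_mae(truth, impute_mean(masked), hidden)

for rate, (rmse, mae) in results.items():
    print(f"{rate:.0%} missing: RMSE={rmse:.3f} MAE={mae:.3f}")
```

Scoring only the hidden positions matters: including the untouched entries would dilute the error and make every imputer look better than it is.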


Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Abstract: Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers, in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that can be achieved through the proposed data-driven approach.
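The idea of choosing an imputation method per feature from known values can be sketched as below. This is an illustrative assumption of how such a ranking might work, not the authors' procedure: it uses three hypothetical candidate methods (mean, median, and carrying the previous observation forward) instead of the five in the article, and a leave-one-out mean absolute error as the ranking criterion.

```python
import statistics

def _known_except(series, idx):
    """Known values of the series, leaving out position idx."""
    return [v for i, v in enumerate(series) if v is not None and i != idx]

def estimate_mean(series, idx):
    known = _known_except(series, idx)
    return statistics.mean(known) if known else None

def estimate_median(series, idx):
    known = _known_except(series, idx)
    return statistics.median(known) if known else None

def estimate_previous(series, idx):
    """Carry the most recent earlier known value forward (uses time order)."""
    for i in range(idx - 1, -1, -1):
        if series[i] is not None:
            return series[i]
    return None  # not applicable: no earlier observation exists

# Hypothetical candidate set; the article ranks five methods, not these three.
METHODS = {
    "mean": estimate_mean,
    "median": estimate_median,
    "previous": estimate_previous,
}

def method_error(series, method):
    """Mean absolute error when each known value is hidden and re-estimated."""
    errors = []
    for idx, value in enumerate(series):
        if value is None:
            continue
        guess = method(series, idx)
        if guess is not None:  # skip cases where the method is not applicable
            errors.append(abs(guess - value))
    return sum(errors) / len(errors) if errors else float("inf")

def best_method_per_feature(features):
    """Pick, for each feature, the method with the lowest estimated error."""
    return {
        name: min(METHODS, key=lambda m: method_error(series, METHODS[m]))
        for name, series in features.items()
    }

# Toy longitudinal features (one value per wave), invented for illustration.
features = {
    "trend": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "flat_with_outlier": [5.0, 5.0, 5.0, 50.0, 5.0, None, 5.0],
}
choice = best_method_per_feature(features)
print(choice)
```

On the toy data the steadily increasing feature favours the time-aware method, while the feature with a single outlier favours the median, which is the kind of feature-wise difference the approach is meant to exploit.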


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Liang He ◽  
Haiyan Xu ◽  
Ginger Y. Ke

Purpose: Despite better accessibility and flexibility, peer-to-peer (P2P) lending has suffered from excessive credit risks, which may cause significant losses to the lenders and even lead to the collapse of P2P platforms. The purpose of this research is to construct a hybrid predictive framework that integrates classification, feature selection, and data balance algorithms to cope with the high-dimensional and imbalanced nature of P2P credit data. Design/methodology/approach: An improved synthetic minority over-sampling technique (IMSMOTE) is developed to incorporate the randomness and probability into the traditional synthetic minority over-sampling technique (SMOTE) to enhance the quality of synthetic samples and the controllability of synthetic processes. IMSMOTE is then implemented along with the grey relational clustering (GRC) and the support vector machine (SVM) to facilitate a comprehensive assessment of the P2P credit risks. To enhance the associativity and functionality of the algorithm, a dynamic selection approach is integrated with GRC and then fed in the SVM's process of parameter adaptive adjustment to select the optimal critical value. A quantitative model is constructed to recognize key criteria via multidimensional representativeness. Findings: A series of experiments based on real-world P2P data from Prosper Funding LLC demonstrates that our proposed model outperforms other existing approaches. It is also confirmed that the grey-based GRC approach with dynamic selection succeeds in reducing data dimensions, selecting a critical value, identifying key criteria, and IMSMOTE can efficiently handle the imbalanced data. Originality/value: The grey-based machine-learning framework proposed in this work can be practically implemented by P2P platforms in predicting the borrowers' credit risks. The dynamic selection approach makes the first attempt in the literature to select a critical value and indicate key criteria in a dynamic, visual and quantitative manner.
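For orientation, the traditional SMOTE idea that IMSMOTE builds on can be sketched in a few lines: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a minimal sketch of plain SMOTE only, not IMSMOTE; it picks base samples at random rather than cycling through every minority sample, and the function name, parameters, and toy points are assumptions for illustration.

```python
import math
import random

def smote(minority, n_synthetic, k=3, rng=None):
    """Create synthetic minority samples by neighbour interpolation.

    minority: list of equal-length numeric tuples (minority-class points).
    k: number of nearest minority neighbours considered for each base point.
    """
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy minority-class points, invented for illustration.
minority = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
new_points = smote(minority, n_synthetic=6, k=2, rng=random.Random(42))
print(new_points)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies, which is why the quality of neighbour selection matters for variants such as the one described above.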


2015 ◽  
Vol 21 (1) ◽  
pp. 23
Author(s):  
Zamzam Muhammad Fuad

This research describes the role of the Banyumas Democracy Volunteer (Relawan Demokrasi Banyumas) in increasing public political participation in Banyumas’s 2014 legislative election and its implications for Banyumas’s political resilience. The research used a qualitative research design. Data were collected by in-depth review, observation and documentation. A purposive sampling technique with a stakeholder sampling variant was used to select informants. The research showed that the Banyumas Democracy Volunteer had a positive role in developing political resilience in Banyumas. Their role was to give political education and election education to voters in Banyumas. In other words, the Banyumas Democracy Volunteer had a vital role in developing ideal political resilience in Banyumas.
Keywords: Banyumas Democracy Volunteer, Democracy, Election, Political Resilience of Region.


2021 ◽  
Vol 4 (2) ◽  
pp. 531-540
Author(s):  
Nusrat Nawaz Abbasi ◽  
Masood Ahmad ◽  
Muhammad Javed ◽  
Sabiha Iqbal

The study was designed to analyze teachers’ strategies for motivating students in the classroom. The objectives of the study were: to find out the techniques of motivation for students’ learning achievement; to explore the students’ views about motivation; to evaluate the students’ views regarding teachers’ teaching style; and to find out gender-wise significant differences. The study was designed for Bahawal Nagar district, so Bahawal Nagar was the population of the study. A stratified sampling technique was used to select the sample. One hundred and thirty-two (132) students were selected from the selected schools, of whom 66 were male and 66 female. A self-constructed instrument on a 4-point Likert scale was used to collect data. The major findings of the study were that teachers motivate the students at primary level by adopting different techniques and strategies. The teachers’ behaviour, personality, teaching methodology and school environment are also factors affecting the students’ learning process. Immediate appreciation, rewards, punishment, reinforcement and encouragement play a vital role in motivating the students. It was also found that female teachers used more motivational strategies to motivate the students in the classroom than male teachers.


2020 ◽  
Vol 70 (1) ◽  
pp. 66-71 ◽  
Author(s):  
Manvendra Singh ◽  
Sudhir Khare ◽  
Brajesh Kumar Kaushik

Surveillance of the maritime domain is absolutely vital to ensure an appropriate response against any adverse situation relating to maritime safety or security. The electro-optic search and track (EOST) system plays a vital role by providing independent search and track of potential targets in the marine environment. EOST provides real-time images of objects with the details required to neutralise threats. At long range, the detection and tracking capability of EOST degrades due to uncertainty in target signatures under cluttered scenarios. Image quality can be improved by using suitable sensors and by enhancement using knowledge of the target/background signature. Robust tracking of objects can be achieved by optimising the performance parameters of the tracker. In the present work, improvements in the performance of EOST subsystems such as the sensor, video processor and video tracker are discussed. To improve EOST performance in terms of detection and tracking, sensor selection criteria and various real-time image processing techniques and their selection criteria for maritime applications are also discussed. The resultant improvement in the quality of images recorded in the marine environment is presented.


Author(s):  
Victor Adoma ◽  
Maxwell Adom Darko

The marketing and sale of alcoholic drinks have lately witnessed an irresistible boom, and alcohol-producing firms are enjoying field days. Drinking alcohol has become a significant part of the social lives of most young people, even though the abuse of alcohol is known to be a key problem for young people in many societies. A case study design was employed in the research. This research investigates the impact of alcohol beverage advertisement on the purchasing behaviour of students at Sunyani Technical University. A probability sampling technique was used to select the 300 respondents who participated in the research. Microsoft Excel was used to import data from the Statistical Package for Social Sciences (SPSS). Analysis of the survey data indicates that most respondents were male students and that the 18-35 years age category dominated the study. The survey data indicate that students generally do not take alcohol; most drink it during special occasions, and few take it heavily. The present study explored the impact of alcohol beverage advertisement as a predictor variable on the purchasing behaviour of students at Sunyani Technical University. The results indicate that alcohol beverage advertisement plays a vital role in students' alcohol purchasing behaviour. The study therefore recommends that alcohol manufacturers and marketers integrate these elements into adverts intended to attract their targets; that most advertisements be run on television, radio, music videos, billboards and movies, as these are most effective for introducing products to consumers; and that policymakers and all stakeholders in education and health take these findings into consideration when planning policies to control alcohol consumption.

