Minimizing False Negatives of Measles Prediction Model: An Experimentation of Feature Selection Based On Domain Knowledge and Random Forest Classifier

In the context of a disease prediction model, a false negative error occurs when a patient is wrongly predicted to be free from the disease. Developing a prediction model involves data collection and feature selection, which extracts the relevant features from the dataset. Two commonly employed feature selection approaches, domain knowledge and data-driven selection, each suffer from bias toward past or current knowledge when applied alone. In this research, we studied the development of a measles prediction model that incorporates both the domain knowledge and the data-driven approaches, in particular the Random Forest classifier. A domain expert first selected the important features based on prior knowledge of measles, in order to reduce the number of features. These attributes were then used as input to a Random Forest classifier, and the least important attributes, ranked by Mean Decrease Gini, were excluded to test the effect on the result. We found that removing several attributes after domain knowledge consultation can yield a good model with fewer false negative errors.
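The two quantities named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the feature names (`fever`, `noise`), and the helper functions are hypothetical, and only a single split is scored, whereas Mean Decrease Gini in a Random Forest averages such impurity decreases over all splits and trees that use a feature.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(feature, labels, threshold):
    """Impurity decrease obtained by splitting on feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def false_negatives(y_true, y_pred, positive=1):
    """Count cases that are truly positive but predicted otherwise."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)


# Hypothetical toy data: 1 = measles, 0 = no measles.
labels = [1, 1, 1, 0, 0, 0]
fever = [1, 1, 1, 0, 0, 0]   # separates the classes perfectly
noise = [0, 1, 0, 1, 0, 1]   # nearly unrelated to the label

fever_score = gini_decrease(fever, labels, threshold=0)
noise_score = gini_decrease(noise, labels, threshold=0)
missed_cases = false_negatives([1, 1, 0, 0], [1, 0, 0, 1])
```

In this toy setting the informative feature receives a much larger impurity decrease than the noisy one, which is the kind of ranking used to decide which attributes to drop.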

2020 ◽  
Vol 184 ◽  
pp. 01011
Author(s):  
Sreethi Musunuru ◽  
Mahaalakshmi Mukkamala ◽  
Latha Kunaparaju ◽  
N V Ganapathi Raju

Though banks hold an abundance of data on their customers in general, it is not unusual for them to track the actions of the creditors regularly to improve the services they offer and to understand why many of them choose to leave for other banks. Analyzing customer behavior can be highly beneficial to banks, as they can reach out to their customers on a personal level and develop a business model that improves the pricing structure, communication, advertising, and benefits for their customers and themselves. Features such as the amount a customer credits every month, the customer's annual salary, and the customer's gender are used to classify customers with machine learning algorithms such as the K Neighbors Classifier and the Random Forest Classifier. By classifying the customers, banks can get an idea of who will stay with them and who will leave in the near future. Our study aims to remove features that are independent but have little influence on predicting a customer's future status, without losing accuracy, and to examine whether this also improves the accuracy of the results.


Author(s):  
Mia Huljanah ◽  
Zuherman Rustam ◽  
Suarsih Utama ◽  
Titin Siswantining

2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Zhen-Hao Guo ◽  
Zhu-Hong You ◽  
De-Shuang Huang ◽  
Hai-Cheng Yi ◽  
Zhan-Heng Chen ◽  
...  

Abundant life activities are maintained by various biomolecule relationships in human cells. However, many previous computational models focus only on isolated objects, without considering that the cell is a complete entity with ample functions. Inspired by holism, we constructed a Molecular Associations Network (MAN) including 9 kinds of relationships among 5 types of biomolecules, and a prediction model called MAN-GF. More specifically, biomolecules are represented as vectors by an algorithm called biomarker2vec, which combines two kinds of information: the attribute learned by k-mer, etc., and the behavior learned by Graph Factorization (GF). A Random Forest classifier is then applied for training, validation, and testing. MAN-GF achieved an AUC of 0.9647 and an AUPR of 0.9521 under 5-fold cross-validation. The results imply that MAN-GF, with its overall perspective, can serve as an aid in practice. It also holds promise for providing new insight into the regulatory mechanisms.
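As a rough illustration of the k-mer attribute representation mentioned in this abstract, the sketch below turns a sequence into a normalized k-mer frequency vector. The abstract does not specify the alphabet, the value of k, or the normalization, so the nucleotide alphabet `ACGT`, `k=2`, the function name `kmer_vector`, and the example sequence are all assumptions made for illustration; this is not the biomarker2vec algorithm itself.

```python
from itertools import product


def kmer_vector(sequence, k=3, alphabet="ACGT"):
    """Return the normalized frequency of every possible k-mer in sequence.

    The vector has len(alphabet) ** k entries, ordered lexicographically
    by the order of characters in alphabet.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence.
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing unknown symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical example sequence; windows are AC, CG, GT, TA, AC.
vec = kmer_vector("ACGTAC", k=2)
```

A vector of this kind could then be concatenated with a graph-derived embedding and passed to a classifier, which is the general shape of the pipeline the abstract describes.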

