scholarly journals Machine Learning Based Clinical Diagnosis of Liver Patients with Instance Replacement

Author(s):  
J. V. D. Prasad ◽  
A. Raghuvira Pratap ◽  
Babu Sallagundla

With the rapid increase in number of clinical data and hence the prediction and analysing data becomes very difficult. With the help of various machine learning models, it becomes easy to work on these huge data. A machine learning model faces lots of challenges; one among the challenge is feature selection. In this research work, we propose a novel feature selection method based on statistical procedures to increase the performance of the machine learning model. Furthermore, we have tested the feature selection algorithm in liver disease classification dataset and the results obtained shows the efficiency of the proposed method.

2021 ◽  
Author(s):  
K venkatachalam ◽  
P Prabhu ◽  
B saravana Balaji ◽  
Mohamed Abouhawwash ◽  
R Rajadevi

Abstract In day today life, diabetes illness is increasing in count due to the body not able to metabolize the glucose level. The prediction of the right diabetes patients is an important research area that many researchers are proposing the techniques to predict this disease through data mining and machine learning methods. In prediction, feature selection is one of the key concept in preprocessing so that the features that are relevant to the disease will be used for prediction. This will improve the prediction accuracy. Selecting right features among the whole feature set is a complicated process and many researchers are concentrating on it to produce the predictive model with high accuracy. In this proposed work, the wrapper based feature selection method called Recursive Feature Elimination (RFE) is combined with Ridge regression (L2) to form a hybrid L2 regulated feature selection algorithm to overcome the overfilling problem of the data set. Over fitting is the major problem in feature selection which means that the new data are not fit to the model since the training data is small. Ridge regression is mainly used to overcome the overfitting problem. Once the features are selected using the proposed feature selection method, random forest classifier is used to classify the data based on the selected features. The proposed work is experimented in PIDD data set and the evaluated results are compared with the existing algorithms to prove the accuracy effect of the proposed algorithm. From the results obtained by proposed algorithm, the accuracy of predicting the diabetes disease is high compared to other existing algorithms.


Author(s):  
Wenjie Liu ◽  
Shanshan Wang ◽  
Xin Chen ◽  
He Jiang

In software maintenance process, it is a fairly important activity to predict the severity of bug reports. However, manually identifying the severity of bug reports is a tedious and time-consuming task. So developing automatic judgment methods for predicting the severity of bug reports has become an urgent demand. In general, a bug report contains a lot of descriptive natural language texts, thus resulting in a high-dimensional feature set which poses serious challenges to traditionally automatic methods. Therefore, we attempt to use automatic feature selection methods to improve the performance of the severity prediction of bug reports. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm by combining existing ones. In order to verify the performance of our method, we run experiments over the bug reports of Eclipse and Mozilla and conduct comparisons with eight commonly used feature selection methods. The experiment results show that the ranking-based strategy can effectively improve the performance of the severity prediction of bug reports by up to 54.76% on average in terms of [Formula: see text]-measure, and it also can significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method can get better results than a single feature selection algorithm.


Author(s):  
Jia Luo ◽  
Dongwen Yu ◽  
Zong Dai

It is not quite possible to use manual methods to process the huge amount of structured and semi-structured data. This study aims to solve the problem of processing huge data through machine learning algorithms. We collected the text data of the company’s public opinion through crawlers, and use Latent Dirichlet Allocation (LDA) algorithm to extract the keywords of the text, and uses fuzzy clustering to cluster the keywords to form different topics. The topic keywords will be used as a seed dictionary for new word discovery. In order to verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, andWord2vec were used for comparative testing of new word discovery. The experimental results show that the Word2vec algorithm based on machine learning model has the highest accuracy, recall and F-value indicators.


Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 144 ◽  
Author(s):  
Yan Naung Soe ◽  
Yaokai Feng ◽  
Paulus Insap Santosa ◽  
Rudy Hartanto ◽  
Kouichi Sakurai

The application of a large number of Internet of Things (IoT) devices makes our life more convenient and industries more efficient. However, it also makes cyber-attacks much easier to occur because so many IoT devices are deployed and most of them do not have enough resources (i.e., computation and storage capacity) to carry out ordinary intrusion detection systems (IDSs). In this study, a lightweight machine learning-based IDS using a new feature selection algorithm is designed and implemented on Raspberry Pi, and its performance is verified using a public dataset collected from an IoT environment. To make the system lightweight, we propose a new algorithm for feature selection, called the correlated-set thresholding on gain-ratio (CST-GR) algorithm, to select really necessary features. Because the feature selection is conducted on three specific kinds of cyber-attacks, the number of selected features can be significantly reduced, which makes the classifiers very small and fast. Thus, our detection system is lightweight enough to be implemented and carried out in a Raspberry Pi system. More importantly, as the really necessary features corresponding to each kind of attack are exploited, good detection performance can be expected. The performance of our proposal is examined in detail with different machine learning algorithms, in order to learn which of them is the best option for our system. The experiment results indicate that the new feature selection algorithm can select only very few features for each kind of attack. Thus, the detection system is lightweight enough to be implemented in the Raspberry Pi environment with almost no sacrifice on detection performance.


2013 ◽  
Vol 22 (04) ◽  
pp. 1350027
Author(s):  
JAGANATHAN PALANICHAMY ◽  
KUPPUCHAMY RAMASAMY

Feature selection is essential in data mining and pattern recognition, especially for database classification. During past years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on maximum relevance and minimum redundancy criterion. The mutual information is used to measure the relevancy of each feature with class variable and calculate the redundancy by utilizing the relationship between candidate features, selected features and class variables. The effectiveness is tested with ten benchmarked datasets available in UCI Machine Learning Repository. The experimental results show better performance when compared with some existing algorithms.


Author(s):  
Chunyong Yin ◽  
Luyu Ma ◽  
Lu Feng

Intrusion detection is a kind of security mechanism which is used to detect attacks and intrusion behaviors. Due to the low accuracy and the high false positive rate of the existing clonal selection algorithms applied to intrusion detection, in this paper, we proposed a feature selection method for improved clonal algorithm. The improved method detects the intrusion behavior by selecting the best individual overall and clones them. Experimental results show that the feature selection algorithm is better than the traditional feature selection algorithm on the different classifiers, and it is shown that the final detection results are better than traditional clonal algorithm with 99.6% accuracy and 0.1% false positive rate.


Author(s):  
Hwayoung Park ◽  
Sungtae Shin ◽  
Changhong Youm ◽  
Sang-Myung Cheon ◽  
Myeounggon Lee ◽  
...  

Abstract Background Freezing of gait (FOG) is a sensitive problem, which is caused by motor control deficits and requires greater attention during postural transitions such as turning in people with Parkinson’s disease (PD). However, the turning characteristics have not yet been extensively investigated to distinguish between people with PD with and without FOG (freezers and non-freezers) based on full-body kinematic analysis during the turning task. The objectives of this study were to identify the machine learning model that best classifies people with PD and freezers and reveal the associations between clinical characteristics and turning features based on feature selection through stepwise regression. Methods The study recruited 77 people with PD (31 freezers and 46 non-freezers) and 34 age-matched older adults. The 360° turning task was performed at the preferred speed for the inner step of the more affected limb. All experiments on the people with PD were performed in the “Off” state of medication. The full-body kinematic features during the turning task were extracted using the three-dimensional motion capture system. These features were selected via stepwise regression. Results In feature selection through stepwise regression, five and six features were identified to distinguish between people with PD and controls and between freezers and non-freezers (PD and FOG classification problem), respectively. The machine learning model accuracies revealed that the random forest (RF) model had 98.1% accuracy when using all turning features and 98.0% accuracy when using the five features selected for PD classification. In addition, RF and logistic regression showed accuracies of 79.4% when using all turning features and 72.9% when using the six selected features for FOG classification. Conclusion We suggest that our study leads to understanding of the turning characteristics of people with PD and freezers during the 360° turning task for the inner step of the more affected limb and may help improve the objective classification and clinical assessment by disease progression using turning features.


Diabetes has become a serious problem now a day. So there is a need to take serious precautions to eradicate this. To eradicate, we should know the level of occurrence. In this project we predict the level of occurrence of diabetes. We predict the level of occurrence of diabetes using Random Forest, a Machine Learning Algorithm. Using the patient’s Electronic Health Records (EHR) we can build accurate models that predict the presence of diabetes.


Sign in / Sign up

Export Citation Format

Share Document