Systematic Framework to Predict Early-Stage Liver Carcinoma Using Hybrid of Feature Selection Techniques and Regression Techniques

Complexity ◽  
2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Marium Mehmood ◽  
Nasser Alshammari ◽  
Saad Awadh Alanazi ◽  
Fahad Ahmad

The liver is a vital organ of the human body, but detecting liver disease at an early stage is difficult because its symptoms often remain hidden. Liver disease may cause loss of energy or weakness once irregularities in liver function become visible. Cancer is one of the most common liver diseases and also the most fatal: uncontrolled growth of harmful cells develops inside the liver, and if diagnosed late it may cause death. Treating liver disease at an early stage is therefore an important problem, as is designing a model for early diagnosis. First, the features that play the most significant part in detecting liver cancer at an early stage must be identified, which requires extracting a small set of essential features from thousands of unwanted ones. These features are mined using data mining and soft computing techniques, which give optimized results helpful for diagnosing the disease at an early stage. We use feature selection methods to reduce the dataset's features, including Filter, Wrapper, and Embedded methods. Different regression algorithms are then applied to each of these methods to evaluate the results: Linear Regression, Ridge Regression, LASSO Regression, Support Vector Regression, Decision Tree Regression, Multilayer Perceptron Regression, and Random Forest Regression. We evaluated our results based on the accuracy and error rates generated by these algorithms. The results show that, of all the deployed regression techniques, Random Forest Regression with the Wrapper method performs best, giving the highest R2-score of 0.8923 and the lowest MSE of 0.0618.
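The wrapper-method-plus-Random-Forest pipeline described above can be sketched with scikit-learn, using recursive feature elimination as the wrapper and R2/MSE as the scores. The synthetic data, feature counts, and hyperparameters here are illustrative stand-ins, not the paper's actual dataset or settings.

```python
# Sketch: wrapper feature selection (RFE) followed by Random Forest
# Regression, scored with R2 and MSE, on synthetic data.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Synthetic stand-in: 30 candidate features, 8 of which carry signal.
X, y = make_regression(n_samples=500, n_features=30, n_informative=8,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Wrapper method: recursively drop features using the model's own ranking.
selector = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
               n_features_to_select=8)
selector.fit(X_tr, y_tr)

# Fit the final regressor on the reduced feature set and evaluate.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(selector.transform(X_tr), y_tr)
pred = model.predict(selector.transform(X_te))
print("R2 :", round(r2_score(y_te, pred), 4))
print("MSE:", round(mean_squared_error(y_te, pred), 4))
```

The same loop can be repeated with Ridge, LASSO, SVR, and the other regressors to reproduce the paper's style of comparison.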

Advanced computing techniques and their applications in other engineering disciplines have accelerated many aspects and phases of the engineering process, and many computer-aided methods are now widely used in the civil engineering domain. The mathematical relationship between the ratios of different concrete components, together with other influencing factors, and the resulting compressive strength needs to be analyzed for various engineering needs. This paper aims to develop such a mathematical relationship after analyzing the above factors and to predict the compressive strength of concrete by applying various regression techniques, namely linear regression, support vector regression, decision tree regression, and random forest regression, on an assumed data set. After applying the various regression techniques, the accuracy of random forest regression was found to be considerable.


Author(s):  
Binish Khan ◽  
Piyush Kumar Shukla ◽  
Manish Kumar Ahirwar ◽  
Manish Mishra

Liver diseases impair the normal activity of the liver. Discovering the presence of a liver disorder at an early stage is a complex task for doctors. Predictive analysis of liver disease using classification algorithms is an efficacious approach that can help doctors diagnose the disease within a short time. The main motive of this study is to analyze the parameters of various classification algorithms and compare their predictive accuracies so as to find the best classifier for determining liver disease. This chapter surveys related work by various authors on liver disease; the algorithms were implemented using the Weka tool, a machine learning toolkit written in Java, and the Orange tool was used to compare several classification algorithms in terms of accuracy. In this chapter, random forest, logistic regression, and support vector machine were evaluated with the aim of identifying the best classifier. Based on this study, random forest, with the highest accuracy, outperformed the other algorithms.
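The accuracy comparison described above can be sketched in a few lines of scikit-learn (rather than Weka/Orange) using cross-validation. The synthetic data stands in for the liver-disease records, which are not bundled here.

```python
# Sketch: comparing random forest, logistic regression and SVM by
# cross-validated accuracy on a synthetic tabular dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, random_state=1)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector machine": SVC(),
}

# Mean 5-fold accuracy per classifier; the best one "wins" the comparison.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```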


2017 ◽  
Vol 14 (23) ◽  
pp. 5551-5569 ◽  
Author(s):  
Luke Gregor ◽  
Schalk Kok ◽  
Pedro M. S. Monteiro

Abstract. The Southern Ocean accounts for 40 % of oceanic CO2 uptake, but the estimates are bound by large uncertainties due to a paucity of observations. Gap-filling empirical methods have been used to good effect to approximate pCO2 from satellite-observable variables in other parts of the ocean, but many of these methods do not agree in the Southern Ocean. In this study we propose two additional methods that perform well in the Southern Ocean: support vector regression (SVR) and random forest regression (RFR). The methods are used to estimate ΔpCO2 in the Southern Ocean based on SOCAT v3, achieving trends similar to those of the SOM-FFN method by Landschützer et al. (2014). Results show that the SOM-FFN and RFR approaches have RMSEs of similar magnitude (14.84 and 16.45 µatm, where 1 atm = 101 325 Pa), while the SVR method has a larger RMSE (24.40 µatm). However, the larger errors for SVR and RFR are, in part, due to an increase in coastal observations from SOCAT v2 to v3, where the SOM-FFN method used v2 data. The success of both SOM-FFN and RFR depends on the ability to adapt to different modes of variability: the SOM-FFN achieves this by having independent regression models for each cluster, while this flexibility is intrinsic to the RFR method. Analyses of the estimates show that SVR's sensitivity to outliers and RFR's robustness to them significantly shape the outcome. Further analyses of the methods were performed using a synthetic dataset to assess the following: which method (RFR or SVR) has the best performance? What is the effect of using time, latitude, and longitude as proxy variables on the ΔpCO2 estimates? What is the impact of the sampling bias in the SOCAT v3 dataset on the estimates? We find that while RFR is indeed better than SVR, the ensemble of the two methods outperforms either one, due to the complementary strengths and weaknesses of the methods.
Results also show that for the RFR and SVR implementations, it is better to include coordinates as proxy variables as RMSE scores are lowered and the phasing of the seasonal cycle is more accurate. Lastly, we show that there is only a weak bias due to undersampling. The synthetic data provide a useful framework to test methods in regions of sparse data coverage and show potential as a useful tool to evaluate methods in future studies.
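The two-member RFR+SVR ensemble the study reports can be sketched as a simple mean of the two models' predictions. Note this toy setup (synthetic data, arbitrary hyperparameters, no feature scaling) is only an illustration of the mechanism; the paper's finding that the ensemble beats either member concerns the real SOCAT data and need not hold on this stand-in.

```python
# Sketch: averaging RFR and SVR predictions as a two-member ensemble.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

rfr = RandomForestRegressor(n_estimators=200, random_state=2).fit(X_tr, y_tr)
svr = SVR(C=100.0).fit(X_tr, y_tr)

# Ensemble prediction: unweighted mean of the two members.
pred_ens = (rfr.predict(X_te) + svr.predict(X_te)) / 2.0

rmse_rfr = float(np.sqrt(mean_squared_error(y_te, rfr.predict(X_te))))
rmse_svr = float(np.sqrt(mean_squared_error(y_te, svr.predict(X_te))))
rmse_ens = float(np.sqrt(mean_squared_error(y_te, pred_ens)))
print(f"RMSE  RFR: {rmse_rfr:.2f}  SVR: {rmse_svr:.2f}  ensemble: {rmse_ens:.2f}")
```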


2019 ◽  
Vol 11 (11) ◽  
pp. 3222 ◽  
Author(s):  
Pascal Schirmer ◽  
Iosif Mporas

In this paper, we evaluate several well-known and widely used machine learning regression algorithms for the energy disaggregation task. Specifically, the Non-Intrusive Load Monitoring approach was considered, and the K-Nearest-Neighbours, Support Vector Machines, Deep Neural Networks, and Random Forest algorithms were evaluated across five datasets using seven different sets of statistical and electrical features. The experimental results demonstrated the importance of selecting both appropriate features and regression algorithms. Analysis at the device level showed that linear devices can be disaggregated using statistical features, while for non-linear devices the use of electrical features significantly improves the disaggregation accuracy, as non-linear appliances have non-sinusoidal current draw and thus cannot be well parametrized by their active power consumption alone. The best performance in terms of energy disaggregation accuracy was achieved by the Random Forest regression algorithm.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2673 ◽  
Author(s):  
Daniel Kristiyanto ◽  
Kevin E. Anderson ◽  
Ling-Hong Hung ◽  
Ka Yee Yeung

Prostate cancer is the most common cancer among men in developed countries. Androgen deprivation therapy (ADT) is the standard treatment for prostate cancer. However, approximately one third of all patients with metastatic disease treated with ADT develop resistance to it, a condition called metastatic castrate-resistant prostate cancer (mCRPC). Patients who do not respond to hormone therapy are often treated with a chemotherapy drug called docetaxel. Sub-challenge 2 of the Prostate Cancer DREAM Challenge aims to improve the prediction of whether a patient with mCRPC will discontinue docetaxel treatment due to adverse effects. Specifically, a dataset containing three distinct clinical studies of patients with mCRPC treated with docetaxel was provided. We applied the k-nearest neighbor method for missing data imputation, the hill climbing algorithm and random forest importance for feature selection, and the random forest algorithm for classification. We also empirically studied the performance of many classification algorithms, including support vector machines and neural networks. Additionally, we found that using random forest importance for feature selection provided slightly better results than the more computationally expensive hill climbing method.
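The kNN-imputation and random-forest-importance steps of the pipeline above can be sketched with scikit-learn. The synthetic data, missingness rate, and top-k cutoff are illustrative choices, not those of the challenge submission.

```python
# Sketch: kNN imputation of missing values, then feature selection by
# random forest importance (keep the top-k ranked features).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer

X, y = make_classification(n_samples=300, n_features=12, random_state=3)

# Simulate ~10% missing entries, as in real clinical tables.
rng = np.random.default_rng(3)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

# Step 1: fill gaps with the mean of the 5 nearest complete neighbours.
X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_missing)

# Step 2: rank features by random forest importance and keep the top 5.
rf = RandomForestClassifier(n_estimators=200, random_state=3).fit(X_imputed, y)
top_k = 5
selected = np.argsort(rf.feature_importances_)[::-1][:top_k]
print("selected feature indices:", selected.tolist())
```

A final classifier is then trained on `X_imputed[:, selected]`, mirroring the paper's imputation-selection-classification order.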


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware transmitted over encrypted networks. Traditional malware-detection approaches such as packet content analysis are inefficient on encrypted data. In the absence of actual packet contents, we can use other features such as packet size, arrival time, source and destination addresses, and similar metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms: support vector machine, random forest, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset, which is then split into training and testing sets. The machine learning algorithms are trained on the training set, and the resulting models are evaluated against the testing set to assess their respective performances. We further tune the hyperparameters of the algorithms to achieve better results. The random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, yielding area-under-the-curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems.

Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
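The train/test split and AUC evaluation described above can be sketched as follows. Synthetic features stand in for the flow-metadata dataset, and scikit-learn's GradientBoostingClassifier stands in for extreme gradient boosting (the paper's XGBoost is a separate library); both substitutions are assumptions of this sketch.

```python
# Sketch: training tree ensembles on metadata-style features and
# scoring each with ROC AUC on a held-out test set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for packet-metadata features (size, timing, ...).
X, y = make_classification(n_samples=800, n_features=15, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

aucs = {}
for name, model in [
    ("random forest", RandomForestClassifier(n_estimators=100, random_state=4)),
    ("gradient boosting", GradientBoostingClassifier(random_state=4)),
]:
    model.fit(X_tr, y_tr)
    # AUC is computed from the predicted probability of the positive class.
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.4f}")
```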


Author(s):  
Beaulah Jeyavathana Rajendran ◽  
Kanimozhi K. V.

Tuberculosis is one of the hazardous infectious diseases, characterized by the evolution of tubercles in the tissues. The disease mainly affects the lungs but can also involve other parts of the body, and it can be diagnosed by radiologists. The main objective of this chapter is to obtain the best solution, selected by means of modified particle swarm optimization, which is regarded as the optimal feature descriptor. Five stages are used to detect tuberculosis: pre-processing the image, segmenting the lungs, extracting features, feature selection, and classification; these are the stages used in medical image processing to identify tuberculosis. In feature extraction, the GLCM approach is used to extract features, and from the extracted feature sets the optimal features are selected by random forest. Finally, a support vector machine classifier is used for image classification. Experiments were conducted and intermediate results obtained; the proposed system's classification accuracy is better than that of the existing method.


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Elhadi Adam ◽  
Houtao Deng ◽  
John Odindi ◽  
Elfatih M. Abdel-Rahman ◽  
Onisimo Mutanga

Phaeosphaeria leaf spot (PLS) is considered one of the major diseases threatening the stability of maize production in tropical and subtropical African regions. The objective of the present study was to investigate the use of hyperspectral data in detecting the early stage of PLS in tropical maize. Field data were collected from healthy leaves and leaves at the early stage of PLS over two years (2013 and 2014) using a handheld spectroradiometer. A newly developed guided regularized random forest (GRRF) and a traditional random forest (RF) were used for feature selection and classification, respectively. The 2013 dataset was used to train the model, while the 2014 dataset was used as an independent test set. Results showed statistically significant differences in biochemical concentration between healthy leaves and leaves at an early stage of PLS infestation. The newly developed GRRF was able to reduce the high dimensionality of the hyperspectral data by selecting key wavelengths with less autocorrelation, located at 420 nm, 795 nm, 779 nm, 1543 nm, 1747 nm, and 1010 nm. Using these variables (n=6), a random forest classifier was able to discriminate between healthy maize and maize at an early stage of PLS infestation with an overall accuracy of 88% and a kappa value of 0.75. Overall, our study showed the potential of hyperspectral data, GRRF feature selection, and RF classifiers for detecting the early stage of PLS infestation in tropical maize.
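The study's evaluation design (train on one season, test on an independent later season, report overall accuracy and kappa) can be sketched as below. The six "band" features are simulated here, since the GRRF-selected reflectance data are not available; the class separation in the simulation is an arbitrary assumption.

```python
# Sketch: random forest trained on one year's data and evaluated on an
# independent year's data, scored by accuracy and Cohen's kappa.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(5)

def season(n_samples):
    """Simulate n_samples leaves: class 0 = healthy, 1 = early PLS,
    with six reflectance-band features shifted by class."""
    y = rng.integers(0, 2, n_samples)
    X = rng.normal(0.0, 1.0, (n_samples, 6)) + y[:, None] * 0.8
    return X, y

X_2013, y_2013 = season(200)  # training year
X_2014, y_2014 = season(150)  # independent test year

rf = RandomForestClassifier(n_estimators=200, random_state=5).fit(X_2013, y_2013)
pred = rf.predict(X_2014)
acc = accuracy_score(y_2014, pred)
kappa = cohen_kappa_score(y_2014, pred)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Testing on a season the model never saw guards against the optimistic bias of a random within-season split.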

