Hybrid System based on Rough Sets and Genetic Algorithms for Medical Data Classifications

2013 ◽  
Vol 3 (4) ◽  
pp. 31-46 ◽  
Author(s):  
Hanaa Ismail Elshazly ◽  
Ahmad Taher Azar ◽  
Aboul Ella Hassanien ◽  
Abeer Mohamed Elkorany

Computational intelligence provides significant support to the biomedical domain. The application of machine learning techniques in medicine has evolved from physicians' needs; screening, medical imaging, pattern classification, and prognosis are some examples of health care support systems. Medical data typically have characteristic properties such as large size and many features, with continuous, real-valued attributes describing patients' investigations. Therefore, discretization and feature selection are key issues in improving the knowledge extracted from patients' investigation records. In this paper, a hybrid system that integrates Rough Sets (RS) and a Genetic Algorithm (GA) is presented for the efficient classification of medical data sets of different sizes and dimensionalities. The Genetic Algorithm is applied to reduce the dimensionality of the medical data sets, and RS decision rules are used for efficient classification. Furthermore, the proposed system applies Entropy Gain Information (EI) for the discretization process. Four biomedical data sets were tested with the proposed system (EI-GA-RS), which obtained the highest score on three of them. Other hybrid techniques matched the proposed technique's top accuracy, but the proposed system remains among the best-performing systems on three different data sets. EI as the discretization technique is also a common component of the best results on these data sets, while RS as the evaluator achieved the best results on three different data sets.
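As a hedged illustration of the GA-based dimensionality-reduction stage, the sketch below runs a simple genetic algorithm over binary feature masks on a synthetic tabular data set; since an off-the-shelf rough-set rule inducer is not assumed, a scikit-learn decision tree stands in as the classifier that scores candidate feature subsets, and all population sizes and rates are illustrative rather than taken from the paper.

```python
# Illustrative GA feature selection: a decision tree stands in for
# rough-set rule induction when scoring candidate feature subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=8,
                           random_state=0)  # placeholder for a medical data set

def fitness(mask):
    # Score a binary feature mask by cross-validated accuracy on the kept columns.
    if mask.sum() == 0:
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop_size, n_gen, mut_rate = 20, 25, 0.05
pop = rng.integers(0, 2, size=(pop_size, X.shape[1]))

for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection of parents
    parents = pop[[max(rng.choice(pop_size, 2, replace=False), key=lambda i: scores[i])
                   for _ in range(pop_size)]]
    # Uniform crossover between consecutive parent pairs
    children = parents.copy()
    for i in range(0, pop_size - 1, 2):
        swap = rng.random(X.shape[1]) < 0.5
        children[i, swap], children[i + 1, swap] = parents[i + 1, swap], parents[i, swap]
    # Bit-flip mutation
    flip = rng.random(children.shape) < mut_rate
    children[flip] = 1 - children[flip]
    pop = children

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best), "accuracy:", fitness(best))
```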

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Hemalatha Gunasekaran ◽  
K. Ramalakshmi ◽  
A. Rex Macedo Arokiaraj ◽  
S. Deepa Kanmani ◽  
Chandran Venkatesan ◽  
...  

In a general computational context for biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully for this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19, and also support the study of viral effects and drug design. Nevertheless, feature selection remains the most challenging aspect of the problem: sequences lack explicit features, and the most commonly used representations suffer from high dimensionality. Deep learning (DL) models, in contrast, can automatically extract features from the input. In this work, we employed CNN, CNN-LSTM, and CNN-Bidirectional LSTM architectures using label and K-mer encoding for DNA sequence classification. The models are evaluated on different classification metrics. From the experimental results, the CNN and CNN-Bidirectional LSTM with K-mer encoding offer high accuracy of 93.16% and 93.13%, respectively, on testing data.
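The sketch below, assuming TensorFlow/Keras and toy random sequences, illustrates the general pattern of K-mer tokenisation followed by a small CNN with a bidirectional LSTM head; the vocabulary handling, layer sizes, and training settings are illustrative and not the configurations reported in the paper.

```python
# Illustrative K-mer encoding plus a small CNN + Bidirectional LSTM classifier.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def kmer_tokens(seq, k=3):
    # Slide a window of length k over the DNA sequence.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Toy data: random sequences with made-up binary class labels.
rng = np.random.default_rng(0)
bases = np.array(list("ACGT"))
seqs = ["".join(rng.choice(bases, 60)) for _ in range(200)]
labels = rng.integers(0, 2, size=200)

k = 3
vocab = {kmer: i + 1 for i, kmer in enumerate(
    sorted({t for s in seqs for t in kmer_tokens(s, k)}))}  # 0 reserved for padding
X = np.array([[vocab[t] for t in kmer_tokens(s, k)] for s in seqs])

model = tf.keras.Sequential([
    layers.Embedding(len(vocab) + 1, 32),
    layers.Conv1D(64, 5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=2, batch_size=32, validation_split=0.2, verbose=0)
```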


Author(s):  
Gediminas Adomavicius ◽  
Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
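A minimal sketch of the core idea follows: out-of-fold absolute errors of an outcome model become the training target for a second, "reliability" model that predicts the absolute error of each individual prediction. The model choices (random forest outcome model, gradient boosting reliability model) and the synthetic data are assumptions for illustration only.

```python
# Illustrative reliability estimation: predict the absolute error of an
# outcome model with a second ("reliability") regression model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

outcome_model = RandomForestRegressor(n_estimators=200, random_state=0)

# Out-of-fold predictions on the training set give honest absolute errors
# to use as targets for the reliability model.
oof_pred = cross_val_predict(outcome_model, X_tr, y_tr, cv=5)
abs_err = np.abs(y_tr - oof_pred)

reliability_model = GradientBoostingRegressor(random_state=0)
reliability_model.fit(X_tr, abs_err)

outcome_model.fit(X_tr, y_tr)
pred = outcome_model.predict(X_te)
pred_abs_err = reliability_model.predict(X_te)  # lower = more reliable prediction

# Sanity check: correlation between estimated and actual absolute error.
actual = np.abs(y_te - pred)
print("corr(estimated, actual abs. error):", np.corrcoef(pred_abs_err, actual)[0, 1])
```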


Intrusion is a major threat in which unauthorized access to data or a legitimate network is gained using a legitimate user's identity, or through back doors and vulnerabilities in the network. Intrusion Detection System (IDS) mechanisms are developed to detect intrusions at various levels. The objective of this research work is to improve IDS performance by applying machine learning techniques based on decision trees for the detection and classification of attacks. The adopted methodology processes the data sets in three stages. Experimentation is conducted on the KDDCUP99 data sets with different numbers of features. Three Bayesian modes are analyzed for data sets of different sizes based on the total number of attacks. The time consumed by the classifier to build the model and the resulting accuracy are analyzed.
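A hedged sketch of the basic decision-tree classification step, including the model build-time and accuracy measurements mentioned above, is given below; synthetic data with 41 features stands in for the KDDCUP99 records, and the preprocessing stages of the methodology are not reproduced.

```python
# Illustrative decision-tree intrusion classifier with build-time and
# accuracy measurement; synthetic data stands in for KDDCUP99 records.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=41, n_classes=5,
                           n_informative=10, random_state=0)  # 41 features, as in KDDCUP99
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
start = time.perf_counter()
clf.fit(X_tr, y_tr)                      # time consumed to build the model
build_time = time.perf_counter() - start

acc = accuracy_score(y_te, clf.predict(X_te))
print(f"build time: {build_time:.3f}s, accuracy: {acc:.3f}")
```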


Polymers ◽  
2021 ◽  
Vol 13 (18) ◽  
pp. 3100
Author(s):  
Anusha Mairpady ◽  
Abdel-Hamid I. Mourad ◽  
Mohammad Sayem Mozumder

The selection of nanofillers and compatibilizing agents, and their size and concentration, are always considered crucial in the design of durable nanobiocomposites with maximized mechanical properties (i.e., fracture strength (FS), yield strength (YS), Young's modulus (YM), etc.). Therefore, statistical optimization of the key design factors has become extremely important to minimize the experimental runs and the cost involved. In this study, both statistical techniques (analysis of variance (ANOVA) and response surface methodology (RSM)) and artificial intelligence-based machine learning techniques (artificial neural network (ANN) and genetic algorithm (GA)) were used to optimize the concentrations of nanofillers and compatibilizing agents in the injection-molded HDPE nanocomposites. Initially, through ANOVA, the concentrations of TiO2 and cellulose nanocrystals (CNCs) and their combinations were found to be the major factors in improving the durability of the HDPE nanocomposites. Further, the data were modeled and predicted using RSM, ANN, and their combinations with a genetic algorithm (i.e., RSM-GA and ANN-GA). Later, to minimize the risk of becoming trapped in a local optimum, an ANN-GA hybrid technique was implemented to optimize multiple responses and to develop the nonlinear relationship between the factors (the concentrations of TiO2 and CNCs) and the responses (FS, YS, and YM), with minimum error and regression values above 95%.
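The sketch below gives a hedged illustration of the ANN-GA idea: a small multi-output MLP surrogate (scikit-learn's MLPRegressor) maps the two factors to the three responses, and a simple real-coded genetic algorithm then searches the factor space for a setting that maximizes a weighted combination of the predicted responses. The response surface, factor bounds, and objective weights are invented for the example and are not the paper's values.

```python
# Illustrative ANN-GA hybrid: an MLP surrogate maps factor settings
# (TiO2 wt%, CNC wt%) to responses (FS, YS, YM); a simple real-coded GA
# then searches the factor space for a setting maximizing a weighted sum.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the experimental design data (factors -> responses).
factors = rng.uniform([0.0, 0.0], [5.0, 5.0], size=(60, 2))      # TiO2, CNC concentrations
responses = np.column_stack([
    30 + 4 * factors[:, 0] - 0.6 * factors[:, 0] ** 2 + 2 * factors[:, 1],   # FS
    25 + 3 * factors[:, 1] - 0.5 * factors[:, 1] ** 2 + 1.5 * factors[:, 0], # YS
    900 + 60 * factors[:, 0] + 40 * factors[:, 1],                           # YM
]) + rng.normal(0, 1.0, size=(60, 3))

ann = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
ann.fit(factors, responses)  # multi-output surrogate model

def fitness(x):
    fs, ys, ym = ann.predict(x.reshape(1, -1))[0]
    return fs + ys + 0.01 * ym   # illustrative weighted multi-response objective

# Simple real-coded GA over the factor bounds.
lo, hi = np.array([0.0, 0.0]), np.array([5.0, 5.0])
pop = rng.uniform(lo, hi, size=(30, 2))
for _ in range(40):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:15]]               # truncation selection
    children = (parents[rng.integers(0, 15, 30)] +
                parents[rng.integers(0, 15, 30)]) / 2.0        # arithmetic crossover
    children += rng.normal(0, 0.2, children.shape)             # Gaussian mutation
    pop = np.clip(children, lo, hi)

best = max(pop, key=fitness)
print("suggested TiO2/CNC concentrations:", best, "objective:", fitness(best))
```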


2021 ◽  
Author(s):  
Hugo Abreu Mendes ◽  
João Fausto Lorenzato Oliveira ◽  
Paulo Salgado Gomes Mattos Neto ◽  
Alex Coutinho Pereira ◽  
Eduardo Boudoux Jatoba ◽  
...  

Within the context of clean energy generation, solar radiation forecasting is applied to photovoltaic plants to increase maintainability and reliability. Statistical time series models such as ARIMA and machine learning techniques help to improve the results, and hybrid statistical + ML models are found in all sorts of time series forecasting applications. This work presents a new way to automate SARIMAX modeling by nesting PSO and ACO optimization algorithms; unlike R's AutoARIMA, it searches for the optimal seasonality parameter and the best combination of the available exogenous variables. The work also presents two distinct hybrid models that have MLPs as their main elements, with the architecture optimized by a genetic algorithm. A common methodology was used to obtain the results, which were compared to the LSTM, CLSTM, MMFF, and NARNN-ARMAX topologies found in recent works. The results obtained for the presented models are promising for use in automatic radiation forecasting systems, since they outperformed the compared models on at least two metrics.
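As a hedged illustration of the SARIMAX automation idea, the sketch below fits candidate models over a few seasonal periods and exogenous-variable subsets and keeps the lowest-AIC fit; a small exhaustive search and synthetic data stand in for the nested PSO/ACO search and the real radiation series, and the exogenous variable names are made up for the example.

```python
# Illustrative SARIMAX model search: try candidate seasonal periods and
# exogenous-variable subsets, keeping the fit with the lowest AIC.
from itertools import combinations
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
n = 300
t = np.arange(n)
exog = pd.DataFrame({
    "temperature": 20 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, n),
    "humidity": 60 + rng.normal(0, 5, n),
    "cloud_cover": rng.uniform(0, 1, n),
})
y = 500 + 100 * np.sin(2 * np.pi * t / 24) + 30 * exog["cloud_cover"] + rng.normal(0, 10, n)

best = None
for s in (12, 24):                                   # candidate seasonal periods
    for r in range(exog.shape[1] + 1):
        for cols in combinations(exog.columns, r):   # candidate exogenous subsets
            X = exog[list(cols)] if cols else None
            res = SARIMAX(y, exog=X, order=(1, 0, 1),
                          seasonal_order=(1, 0, 1, s)).fit(disp=False)
            if best is None or res.aic < best[0]:
                best = (res.aic, s, cols)

print("best AIC:", best[0], "seasonal period:", best[1], "exogenous:", best[2])
```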


2019 ◽  
Vol 119 (3) ◽  
pp. 676-696 ◽  
Author(s):  
Zhongyi Hu ◽  
Raymond Chiong ◽  
Ilung Pranata ◽  
Yukun Bao ◽  
Yuqing Lin

Purpose Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e., there are more benign web domains than malicious ones). Design/methodology/approach The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling. Findings By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.
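A hedged sketch of the integrated resampling idea follows: imbalanced-learn's SMOTE oversamples the minority (malicious) class, and a compact binary PSO then selects which majority-class samples to keep, scored by the cross-validated F1 of a simple classifier. Particle counts, the iteration budget, and the synthetic data are illustrative, not the paper's settings.

```python
# Illustrative integrated resampling: SMOTE oversamples the minority class,
# then a compact binary PSO selects a subset of majority-class samples.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1500, n_features=10, weights=[0.9, 0.1],
                           random_state=0)  # imbalanced: 1 = malicious (minority)

# Oversample the minority class part-way with SMOTE.
X_os, y_os = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)
maj_idx = np.flatnonzero(y_os == 0)
min_idx = np.flatnonzero(y_os == 1)

def fitness(mask):
    # F1 of a simple classifier on the selected majority samples + all minority samples.
    if mask.sum() < 10:
        return 0.0
    keep = np.concatenate([maj_idx[mask.astype(bool)], min_idx])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X_os[keep], y_os[keep], cv=3, scoring="f1").mean()

# Binary PSO over inclusion masks for the majority class.
n_particles, n_iter, dim = 10, 15, len(maj_idx)
vel = rng.normal(0, 1, (n_particles, dim))
pos = (rng.random((n_particles, dim)) < 0.5).astype(int)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random((n_particles, dim)) < 1 / (1 + np.exp(-vel))).astype(int)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("kept majority samples:", gbest.sum(), "of", dim, "best F1:", pbest_fit.max())
```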


2018 ◽  
Vol 7 (4) ◽  
pp. 2738
Author(s):  
P. Srinivas Rao ◽  
Jayadev Gyani ◽  
G. Narsimha

In online social networks, phony account detection, i.e., distinguishing genuine user accounts from forged ones, is one of the major tasks. The fundamental objective of a phony account detection framework is to detect fake accounts and remove them from social network sites. This work concentrates on phony account detection based on a rule-based framework, evolutionary algorithms, and fuzzy techniques. Initially, the most essential attributes, including personal attributes, similarity measures, and various real user reviews, tweets, or comments, are extracted. A linear combination of these attributes indicates the significance of each review, tweet, or comment. To compute the similarity measure, a combined strategy based on the artificial bee colony algorithm and a fuzzy technique is used. A second approach is proposed to tune the best weights of the normal user attributes using social network activities/transactions and a genetic algorithm. Finally, a rank logic framework is used to calculate the final score of normal user activities. The decision making of the proposed approach for finding phony accounts is compared with existing user behavioral analysis and machine learning techniques on the crowdflower_sample and genuine_accounts_sample data sets from Facebook and Twitter. The outcomes demonstrate that the proposed strategy outperforms the previously mentioned strategies.
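The abstract combines several components (artificial bee colony, fuzzy logic, and a genetic algorithm); the sketch below illustrates only one of them in a hedged form: tuning the weights of a linear combination of account attributes with a simple genetic algorithm so that the resulting score separates genuine from phony accounts. The features and labels are synthetic stand-ins for the Facebook/Twitter data sets mentioned.

```python
# Illustrative GA-tuned linear scoring of account attributes for fake-account
# detection; synthetic features stand in for profile/activity attributes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)  # 1 = genuine, 0 = phony

def fitness(w):
    # Score accounts with a weighted linear combination of attributes and
    # measure how well the score separates genuine from phony accounts.
    return roc_auc_score(y, X @ w)

pop = rng.normal(0, 1, (30, X.shape[1]))
for _ in range(50):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]          # keep the 10 best weight vectors
    children = (parents[rng.integers(0, 10, 30)] +
                parents[rng.integers(0, 10, 30)]) / 2.0   # arithmetic crossover
    pop = children + rng.normal(0, 0.1, children.shape)   # Gaussian mutation

best_w = max(pop, key=fitness)
print("best AUC of the weighted score:", round(fitness(best_w), 3))
```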


2020 ◽  
Author(s):  
Yosoon Choi ◽  
Jieun Baek ◽  
Jangwon Suh ◽  
Sung-Min Kim

In this study, we proposed a method to utilize a multi-sensor Unmanned Aerial System (UAS) for the exploration of hydrothermal alteration zones. The selected study area (10 m × 20 m) is composed mainly of andesite and located on the coast, with wide outcrops and well-developed structural and mineralization elements. Multi-sensor (visible, multispectral, thermal, magnetic) data were acquired in the study area using the UAS and analyzed using machine learning techniques. For the machine learning, we applied stratified random sampling to obtain 1000 training samples in the hydrothermal zone and 1000 training samples in the non-hydrothermal zone identified through the field survey. The 2000 samples created for supervised learning were first split into 1500 for training and 500 for testing; the 1500 training samples were then split into 1200 for training and 300 for validation. The training and validation data were generated in five sets to enable cross-validation. Five types of machine learning techniques were applied to the training data sets: k-Nearest Neighbors (k-NN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Deep Neural Network (DNN). As a result of the integrated analysis of the multi-sensor data using these five techniques, the RF and SVM techniques showed high classification accuracy of about 90%. Moreover, the integrated analysis using multi-sensor data showed relatively higher classification accuracy for all five machine learning techniques than analyses of magnetic sensing data or single optical sensing data alone.
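A hedged sketch of the classifier comparison is shown below, using scikit-learn implementations of the five model families and stratified 5-fold cross-validation; the synthetic feature table stands in for the stacked visible, multispectral, thermal, and magnetic bands.

```python
# Illustrative comparison of the five classifier families on a multi-sensor
# feature table (synthetic stand-in for visible, multispectral, thermal, magnetic bands).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           random_state=0)   # 1 = hydrothermal zone, 0 = non-hydrothermal

models = {
    "k-NN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
    "DNN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)   # scale the band features before training
    acc = cross_val_score(pipe, X, y, cv=cv).mean()
    print(f"{name}: {acc:.3f}")
```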

