Improvement in Hadoop performance using integrated feature extraction and machine learning algorithms

2019 ◽  
Vol 24 (1) ◽  
pp. 627-636 ◽  
Author(s):  
C. K. Sarumathiy ◽  
K. Geetha ◽  
C. Rajan

Biology ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 365
Author(s):  
Taha ValizadehAslani ◽  
Zhengqiao Zhao ◽  
Bahrad A. Sokhansanj ◽  
Gail L. Rosen

Machine learning algorithms can learn mechanisms of antimicrobial resistance (AMR) from DNA sequence data without any a priori information. Interpreting a trained machine learning model can help validate the model and yield new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. However, these methods trade off interpretability, computational complexity, and accuracy. In this study, we propose a new feature extraction method, counting amino acid k-mers (oligopeptides), which makes the model easier to interpret than counting nucleotide k-mers and reaches the same or better accuracy than the other methods. Additionally, we trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability, and computational complexity. We also built a new feature selection pipeline for extracting important features, so that new AMR determinants can be discovered by analyzing these features. This pipeline allows the construction of models that use only a small number of features and still predict resistance accurately.
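As a rough illustration of the k-mer counting idea described in this abstract, the following minimal sketch counts overlapping k-mers in a sequence string. The function name `count_kmers` and the example protein string are hypothetical; the abstract does not describe the authors' implementation, and steps such as gene calling, translation, and feature selection are not shown.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in `sequence`.

    Works for amino acid strings (oligopeptides) and nucleotide strings alike.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Made-up protein fragment, for illustration only.
protein = "MKTAYIAKQRMKTA"
counts = count_kmers(protein, 3)
```

Each distinct k-mer becomes one feature whose value is its count, so a feature can be read directly as a short peptide. This is one plausible reading of why amino acid k-mers are described as easier to interpret.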


2021 ◽  
pp. 1-15
Author(s):  
Mohammed Ayub ◽  
El-Sayed M. El-Alfy

Web technology has become an indispensable part of human life for almost all activities. At the same time, cyberattacks are on the rise in today’s Web-driven world. Effective countermeasures for the analysis and detection of malicious websites are therefore crucial to combat the rising threats to cyber security. In this paper, we systematically review the state-of-the-art techniques and identify about 230 features of malicious websites, which are classified as internal and external features. We also developed a toolkit for the analysis and modeling of malicious websites. The toolkit implements several types of feature extraction methods and machine learning algorithms, which can be used to analyze and compare different approaches to detecting malicious URLs. In addition, the toolkit incorporates options such as feature selection and imbalanced learning, and it can be extended to include more functionality and generalization capabilities. Finally, some use cases are demonstrated for different datasets.
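To make the idea of URL feature extraction concrete, here is a minimal sketch that derives a few lexical features from a URL string using only the Python standard library. The feature names and the function `url_lexical_features` are assumptions chosen for illustration; they are not taken from the roughly 230 features catalogued in the paper or from the authors' toolkit.

```python
import re
from urllib.parse import parse_qs, urlparse


def url_lexical_features(url: str) -> dict:
    """Return a small, illustrative set of lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Dotted-quad pattern check only; octet ranges are not validated.
        "has_ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "uses_https": parsed.scheme == "https",
        "num_query_params": len(parse_qs(parsed.query)),
    }


# Hypothetical URL, for illustration only.
features = url_lexical_features("http://192.168.0.1/login.php?id=1&x=2")
```

A dictionary like this can be turned into one row of a feature matrix and passed to any classifier. Features that need page content or third-party lookups would require additional code not sketched here.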


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Muhammad Fayaz ◽  
Muhammad Shuaib Qureshi ◽  
Karlygash Kussainova ◽  
Bermet Burkanova ◽  
Ayman Aljarbouh ◽  
...  

In this paper, we propose a novel methodology based on statistical features and different machine learning algorithms. The proposed model is divided into three main stages: preprocessing, feature extraction, and classification. In the preprocessing stage, a median filter is used to remove salt-and-pepper noise, because MRI images are normally affected by this type of noise. The grayscale images are also converted to RGB images in this stage, and histogram equalization is used to enhance the quality of each RGB channel. In the feature extraction stage, the red, green, and blue channels are extracted from the RGB images, and nine statistical measures (mean, variance, skewness, kurtosis, entropy, energy, contrast, homogeneity, and correlation) are calculated for each channel; hence, a total of 27 features, 9 for each channel, are extracted from an RGB image. In the classification stage, different machine learning algorithms, namely artificial neural network, k-nearest neighbors, decision tree, and Naïve Bayes classifiers, are applied to the extracted features. We recorded the results for all of these algorithms and found that the decision tree performs better than the other classification algorithms on these features. Hence, we considered the decision tree for further processing. We also compared the proposed method with some well-known algorithms in terms of simplicity and accuracy and found that the proposed method outperforms the existing methods.
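The feature extraction stage described above can be sketched as follows. The abstract does not give formulas, so this sketch makes several assumptions: entropy and energy are computed from the intensity histogram, while contrast, homogeneity, and correlation are computed from a gray-level co-occurrence matrix (GLCM) built from horizontally adjacent pixel pairs. The function names `channel_features` and `rgb_features` are hypothetical.

```python
import math
from collections import Counter


def channel_features(channel):
    """Nine statistics for one channel, given as a 2-D list of integer intensities."""
    pixels = [v for row in channel for v in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((v - mean) ** 2 for v in pixels) / n
    std = math.sqrt(var)
    # Standardized third and fourth moments; defined as 0 for a constant channel.
    skew = sum((v - mean) ** 3 for v in pixels) / (n * std ** 3) if std else 0.0
    kurt = sum((v - mean) ** 4 for v in pixels) / (n * std ** 4) if std else 0.0

    # Assumption: entropy and energy come from the intensity histogram.
    probs = [c / n for c in Counter(pixels).values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    energy = sum(p * p for p in probs)

    # Assumption: GLCM over horizontally adjacent pixel pairs (offset of one column).
    pairs = Counter((row[j], row[j + 1]) for row in channel for j in range(len(row) - 1))
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("channel must be at least two pixels wide")
    glcm = {ij: c / total for ij, c in pairs.items()}
    contrast = sum(p * (i - j) ** 2 for (i, j), p in glcm.items())
    homogeneity = sum(p / (1 + (i - j) ** 2) for (i, j), p in glcm.items())
    mu_i = sum(p * i for (i, _), p in glcm.items())
    mu_j = sum(p * j for (_, j), p in glcm.items())
    sd_i = math.sqrt(sum(p * (i - mu_i) ** 2 for (i, _), p in glcm.items()))
    sd_j = math.sqrt(sum(p * (j - mu_j) ** 2 for (_, j), p in glcm.items()))
    cov = sum(p * (i - mu_i) * (j - mu_j) for (i, j), p in glcm.items())
    correlation = cov / (sd_i * sd_j) if sd_i and sd_j else 0.0

    return {
        "mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt,
        "entropy": entropy, "energy": energy, "contrast": contrast,
        "homogeneity": homogeneity, "correlation": correlation,
    }


def rgb_features(red, green, blue):
    """Concatenate the nine per-channel statistics into a 27-entry feature dict."""
    out = {}
    for name, channel in (("r", red), ("g", green), ("b", blue)):
        for key, value in channel_features(channel).items():
            out[f"{name}_{key}"] = value
    return out


# Tiny made-up channel, for illustration only.
toy = [[0, 0], [1, 1]]
toy_features = rgb_features(toy, toy, toy)
```

The resulting 27 values form one row of the feature matrix that a classifier such as a decision tree would consume. Other GLCM offsets or definitions of energy and entropy would give different numbers.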

