Improvement in Hadoop performance using integrated feature extraction and machine learning algorithms

Machine learning algorithms can learn mechanisms of antimicrobial resistance (AMR) from DNA sequence data without any a priori information. Interpreting a trained model can serve both to validate it and to obtain new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. However, these methods trade off interpretability, computational complexity and accuracy against one another. In this study, we propose a new feature extraction method, counting amino acid k-mers (oligopeptides), which provides easier model interpretation than counting nucleotide k-mers and reaches the same or better accuracy than the other methods. Additionally, we trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We also built a new feature selection pipeline that extracts important features, so that new AMR determinants can be discovered by analyzing them. This pipeline allows the construction of models that use only a small number of features and still predict resistance accurately.
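The difference between nucleotide k-mers and amino acid k-mers can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `translate` and `count_kmers` are hypothetical, translation is assumed to use the standard genetic code in a single reading frame starting at position 0, and the example sequence is made up.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 0 (assumption); unknown codons map to 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def count_kmers(seq: str, k: int) -> Counter:
    """Count all overlapping substrings of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up example sequence, for illustration only.
dna = "ATGGCTGCTAAATAA"
protein = translate(dna)

nucleotide_features = count_kmers(dna, 3)     # features over the alphabet A, C, G, T
amino_acid_features = count_kmers(protein, 2)  # features over the amino acid alphabet

print(protein)
print(amino_acid_features)
```

Each amino acid k-mer corresponds to a short peptide, which is one plausible reason such features are easier to relate to protein-level resistance mechanisms than raw nucleotide k-mers.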

Research on machine learning algorithms and feature extraction for time series

2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC) ◽

10.1109/pimrc.2017.8292668 ◽

2017 ◽

Cited By ~ 5

Author(s):

Lei Li ◽

Yabin Wu ◽

Yihang Ou ◽

Qi Li ◽

Yanquan Zhou ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Feature Extraction ◽

Learning Algorithms ◽

Machine Learning Algorithms

Speaker Accent Recognition Using MFCC Feature Extraction and Machine Learning Algorithms

International Journal of Advances in Engineering and Pure Sciences ◽

10.7240/jeps.896427 ◽

2021 ◽

Author(s):

Ahmet Aytuğ AYRANCI ◽

Sergen ATAY ◽

Tülay YILDIRIM

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Accent Recognition

Classification of Common and Uncommon Tones by P300 Feature Extraction and Identification of Accurate P300 Wave by Machine Learning Algorithms

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2020.0111080 ◽

2020 ◽

Vol 11 (10) ◽

Author(s):

Rafia Akhter ◽

Kehinde Lawal ◽

Md. Tanvir ◽

Shamim Ahmed

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

P300 Wave

Inertial Sensor Based Modelling of Human Activity Classes: Feature Extraction and Multi-sensor Data Fusion Using Machine Learning Algorithms

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - eHealth 360° ◽

10.1007/978-3-319-49655-9_38 ◽

2016 ◽

pp. 306-314 ◽

Cited By ~ 6

Author(s):

Tahmina Zebin ◽

Patricia J. Scully ◽

Krikor B. Ozanyan

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Data Fusion ◽

Human Activity ◽

Inertial Sensor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Sensor Data ◽

Sensor Data Fusion ◽

Multi Sensor Data Fusion

URLCam: Toolkit for malicious URL analysis and modeling

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189874 ◽

2021 ◽

pp. 1-15

Author(s):

Mohammed Ayub ◽

El-Sayed M. El-Alfy

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Feature Selection ◽

State Of The Art ◽

Learning Algorithms ◽

Extraction Methods ◽

Machine Learning Algorithms ◽

Imbalanced Learning

Web technology has become an indispensable part of human life for almost all activities. On the other hand, cyberattacks are on the rise in today’s Web-driven world. Therefore, effective countermeasures for the analysis and detection of malicious websites are crucial to combat the rising threats to cyber security. In this paper, we systematically reviewed the state-of-the-art techniques and identified a total of about 230 features of malicious websites, which are classified as internal and external features. We also developed a toolkit for the analysis and modeling of malicious websites. The toolkit implements several types of feature extraction methods and machine learning algorithms, which can be used to analyze and compare different approaches to detecting malicious URLs. It also incorporates options such as feature selection and imbalanced learning, and it can be extended with more functionality and generalization capabilities. Finally, some use cases are demonstrated on different datasets.
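As a rough illustration of what feature extraction from a URL can look like, the sketch below computes a handful of lexical features with Python's standard library. It does not reproduce the URLCam toolkit or its roughly 230 features: the function name `lexical_url_features`, the chosen features, and the example URL are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse


def lexical_url_features(url: str) -> dict:
    """Compute a few simple lexical features of a URL (illustrative choice only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special_chars": len(re.findall(r"[@\-_%=&?]", url)),
        "uses_https": parsed.scheme == "https",
        # A raw IPv4 address in place of a domain name.
        "host_is_ipv4": re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None,
        # Number of non-empty path segments.
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


# Made-up URL, for illustration only.
features = lexical_url_features("http://192.168.0.1/login/update?id=42")
for name, value in features.items():
    print(f"{name}: {value}")
```

A dictionary of named features like this can be turned into a numeric vector and passed to any classifier, which is what makes it straightforward to compare different feature sets and learning algorithms on the same data.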

An Improved Brain MRI Classification Methodology Based on Statistical Features and Machine Learning Algorithms

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/8608305 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Muhammad Fayaz ◽

Muhammad Shuaib Qureshi ◽

Karlygash Kussainova ◽

Bermet Burkanova ◽

Ayman Aljarbouh ◽

...

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Decision Tree ◽

Median Filter ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Statistical Features ◽

Statistical Measures ◽

Rgb Images ◽

Extraction Stage

In this paper, we propose a novel methodology based on statistical features and different machine learning algorithms. The proposed model can be divided into three main stages, namely, preprocessing, feature extraction, and classification. In the preprocessing stage, a median filter is used to remove salt-and-pepper noise, because MRI images are normally affected by this type of noise. The grayscale images are also converted to RGB images in this stage, and histogram equalization is used to enhance the quality of each RGB channel. In the feature extraction stage, the three channels, namely, red, green, and blue, are extracted from the RGB images, and nine statistical measures, namely, mean, variance, skewness, kurtosis, entropy, energy, contrast, homogeneity, and correlation, are calculated for each channel; hence, a total of 27 features, 9 for each channel, are extracted from an RGB image. In the classification stage, different machine learning algorithms, such as artificial neural network, k-nearest neighbors, decision tree, and Naïve Bayes classifiers, are applied to the extracted features. We recorded the results with all these algorithms and found that the decision tree gives better results than the other classification algorithms applied to these features. Hence, we selected the decision tree for further processing. We also compared the proposed method with some well-known algorithms in terms of simplicity and accuracy and found that the proposed method outperforms the existing methods.
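The feature extraction stage described above (9 statistics per channel, 27 per RGB image) can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation. The abstract does not say how the measures are defined, so several choices here are assumptions: entropy and energy are computed from the 256-bin intensity histogram, and contrast, homogeneity, and correlation are computed from a symmetric gray-level co-occurrence matrix (GLCM) for horizontally adjacent pixels after quantizing to 8 levels. All function names are hypothetical.

```python
import numpy as np


def first_order_stats(channel: np.ndarray) -> list:
    """Mean, variance, skewness, kurtosis, entropy, energy of a uint8 channel."""
    x = channel.astype(np.float64).ravel()
    mean, var = x.mean(), x.var()
    std = np.sqrt(var)
    # Standardized values; defined as 0 for a constant channel (convention here).
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skewness, kurtosis = np.mean(z ** 3), np.mean(z ** 4)
    # Assumption: entropy and energy come from the intensity histogram.
    p = np.bincount(channel.ravel(), minlength=256) / x.size
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    energy = np.sum(p ** 2)
    return [mean, var, skewness, kurtosis, entropy, energy]


def glcm_stats(channel: np.ndarray, levels: int = 8) -> list:
    """Contrast, homogeneity, correlation from a horizontal-neighbor GLCM."""
    q = (channel.astype(np.int64) * levels) // 256  # quantize to [0, levels)
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left, right), 1.0)  # count co-occurring level pairs
    glcm = glcm + glcm.T                 # make the matrix symmetric
    p = glcm / glcm.sum()
    i, j = np.indices((levels, levels))
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i * sd_j > 0:
        correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        correlation = 1.0  # convention used here for a constant channel
    return [contrast, homogeneity, correlation]


def rgb_statistical_features(image: np.ndarray) -> np.ndarray:
    """Return 27 features (9 per channel) for an H x W x 3 uint8 image."""
    features = []
    for c in range(3):
        channel = image[:, :, c]
        features.extend(first_order_stats(channel))
        features.extend(glcm_stats(channel))
    return np.array(features, dtype=np.float64)


# Random stand-in image; a real pipeline would load a preprocessed MRI slice.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(rgb_statistical_features(image).shape)
```

The resulting 27-element vector is the kind of input that could then be passed to a classifier such as a decision tree; other definitions of the texture measures (for example, different GLCM offsets or gray-level counts) would change the feature values but not the overall structure.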
