Efficiency Enhancement of Machine Learning Approaches through the Impact of Preprocessing Techniques

Author(s):  
Vineeta Gulati ◽  
Neeraj Raheja


Cancers ◽
2021 ◽  
Vol 13 (11) ◽  
pp. 2764
Author(s):  
Xin Yu Liew ◽  
Nazia Hameed ◽  
Jeremie Clos

A computer-aided diagnosis (CAD) expert system is a powerful tool for efficiently assisting a pathologist in reaching an early diagnosis of breast cancer. The process identifies the presence of cancer in breast tissue samples and the distinct cancer stage. In a standard CAD system, the main stages are image pre-processing, segmentation, feature extraction, feature selection, classification, and performance evaluation. In this review paper, we survey the existing state-of-the-art machine learning approaches applied at each stage, covering both conventional and deep learning methods, compare the methods, and provide technical details together with their advantages and disadvantages. The aims are to investigate the impact of CAD systems using histopathology images, to identify deep learning methods that outperform conventional methods, and to provide a summary for future researchers to analyse and improve the existing techniques. Lastly, we discuss the research gaps in existing machine learning approaches and propose future direction guidelines for upcoming researchers.
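The staged CAD workflow described above can be sketched, purely illustratively, as a scikit-learn pipeline over synthetic feature vectors (the data, feature counts, and classifier choice below are assumptions for demonstration, not taken from any reviewed system):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Stand-in for features already extracted from segmented histopathology images
X = rng.normal(size=(200, 50))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic benign/malignant label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Pre-processing -> feature selection -> classification, mirroring the CAD stages
cad = Pipeline([
    ("scale", StandardScaler()),                # pre-processing stand-in
    ("select", SelectKBest(f_classif, k=10)),   # feature selection
    ("clf", SVC(kernel="rbf", random_state=0)), # classification
])
cad.fit(X_tr, y_tr)
acc = accuracy_score(y_te, cad.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

In a real CAD system the input matrix would come from the segmentation and feature-extraction stages, which are imaging-specific and omitted here.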


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Alireza Davoudi ◽  
Mohsen Ahmadi ◽  
Abbas Sharifi ◽  
Roshina Hassantabar ◽  
Narges Najafi ◽  
...  

Statins may help in the treatment of COVID-19 patients because of their involvement in angiotensin-converting enzyme 2. The main objective of this study is to evaluate the impact of statins on COVID-19 severity in people who had been taking statins before COVID-19 infection. The examined patients include people who had taken one of three statins: Atorvastatin, Simvastatin, or Rosuvastatin. The case study includes 561 patients admitted to the Razi Hospital in Ghaemshahr, Iran, during February and March 2020. Illness severity was encoded based on respiratory rate, oxygen saturation, systolic pressure, and diastolic pressure into five categories: mild, medium, severe, critical, and death. Since 69.23% of the Simvastatin takers were in the mild severity category, the results suggest a positive effect of Simvastatin on COVID-19 severity for people who took it before being infected by the COVID-19 virus. The average systolic pressure in this group is 137.31 mmHg, which is higher than that of the total patient population. Another result of this study is that Simvastatin takers have an average O2Sat of 95.77%, whereas the average O2Sat across the entire case study is 92.42%, which corresponds to medium severity. In the rest of this paper, we use machine learning approaches to predict COVID-19 patients’ severity from clinical features. Results indicate that the decision tree method can predict patients’ illness severity with 87.9% accuracy. Other methods, including the K-nearest neighbors (KNN) algorithm, support vector machine (SVM), Naïve Bayes classifier, and discriminant analysis, achieved accuracy levels of 80%, 68.8%, 61.1%, and 85.1%, respectively.
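As a rough illustration of the severity-classification experiment, the sketch below compares a few of the named classifiers on synthetic clinical features (the data, label construction, and class counts are fabricated stand-ins, not the hospital data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
# Synthetic stand-ins for respiratory rate, O2 saturation, systolic/diastolic pressure
X = rng.normal(size=(500, 4))
# Ordinal "severity" label driven mainly by the first two synthetic features
y = np.digitize(X[:, 0] - X[:, 1], bins=[-1.0, 0.0, 1.0])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

scores = {}
for name, model in [
    ("decision tree", DecisionTreeClassifier(random_state=1)),
    ("KNN", KNeighborsClassifier(n_neighbors=5)),
    ("naive Bayes", GaussianNB()),
]:
    scores[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {scores[name]:.2f}")
```

The relative ranking on real clinical data would of course differ; the loop simply shows how the accuracy comparison across classifiers is typically run.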


2021 ◽  
Author(s):  
Thiago Abdo ◽  
Fabiano Silva

The purpose of this paper is to analyze the use of different machine learning approaches and algorithms to be integrated as automated assistance in a tool that aids the creation of new annotated datasets. We evaluate how they scale in an environment without dedicated machine learning hardware. In particular, we study the impact on a dataset with few examples and on one that is still being constructed. We experiment with a deep learning algorithm (BERT) and with classical learning algorithms that have a lower computational cost (W2V and GloVe combined with RF and SVM). Our experiments show that deep learning algorithms have a performance advantage over classical techniques. However, deep learning algorithms also have a high computational cost, making them ill-suited to environments with limited hardware resources. We conduct simulations using active and iterative machine learning techniques to assist the creation of new datasets, using the classical learning algorithms because of their lower computational cost. The knowledge gathered from our experimental evaluation aims to support the creation of a tool for building new text datasets.
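A minimal sketch of the kind of active learning loop described, using uncertainty (least-confidence) sampling with a random forest on synthetic data (the query budget, seed-set size, and dataset are assumptions, not the authors' setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Pool-based active learning: start from a small labeled seed set and
# repeatedly "annotate" the example the model is least confident about.
X, y = make_classification(n_samples=400, n_features=20, random_state=2)
labeled = list(range(10))                       # seed set of 10 labels
pool = [i for i in range(len(X)) if i not in labeled]

clf = RandomForestClassifier(n_estimators=50, random_state=2)
for _ in range(20):                             # 20 annotation rounds
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    # least-confident sampling: smallest top-class probability
    pick = pool[int(np.argmin(proba.max(axis=1)))]
    labeled.append(pick)                        # annotator supplies the label
    pool.remove(pick)

final_acc = clf.fit(X[labeled], y[labeled]).score(X[pool], y[pool])
print(f"accuracy after 20 queried labels: {final_acc:.2f}")
```

In a real annotation tool, `y[pick]` would come from a human annotator rather than a pre-existing label array; the loop structure is the same.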


2018 ◽  
Vol 7 (2) ◽  
pp. 917
Author(s):  
S Venkata Suryanarayana ◽  
G N. Balaji ◽  
G Venkateswara Rao

With the extensive use of credit cards, fraud has become a major issue in the credit card business. It is hard to obtain reliable figures on the impact of fraud, since companies and banks are reluctant to disclose the amount of losses due to fraud. At the same time, public data are scarce because of confidentiality issues, leaving many questions about the best strategy unanswered. Another problem in estimating credit-card fraud losses is that we can measure only the losses from frauds that have been detected; it is not possible to assess the size of unreported or undetected frauds. Fraud patterns are changing rapidly, so fraud detection needs to shift from a reactive to a proactive approach. In recent years, machine learning has gained a lot of popularity in image analysis, natural language processing, and speech recognition. In this regard, implementing efficient fraud detection algorithms using machine-learning techniques is key to reducing these losses and assisting fraud investigators. In this paper, a logistic regression based machine learning approach is used to detect credit card fraud. The results show that the logistic regression based approach achieves the highest accuracy and can be used effectively to assist fraud investigators.
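An illustrative sketch of logistic-regression fraud detection on a synthetic, highly imbalanced transaction set (the features, the 2% fraud rate, and the `class_weight` choice are assumptions for demonstration; real transaction data is confidential, as the paper notes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(3)
# Synthetic transactions: roughly 2% are labeled "fraud"
n = 2000
X = rng.normal(size=(n, 6))
fraud_score = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
y = (fraud_score > np.quantile(fraud_score, 0.98)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

# class_weight="balanced" counters the heavy skew toward legitimate transactions
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
recall = recall_score(y_te, clf.predict(X_te))
print(f"fraud recall: {recall:.2f}")
```

Recall on the minority class matters more than raw accuracy here: with 98% legitimate transactions, a classifier that never flags fraud is already 98% "accurate".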


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
Julie Chih-yu Chen ◽  
Andrea D. Tyler

Abstract Background The advent of metagenomic sequencing provides microbial abundance patterns that can be leveraged for sample origin prediction. Supervised machine learning classification approaches have been reported to predict sample origin accurately when the origin has been previously sampled. Using metagenomic datasets provided by the 2019 CAMDA challenge, we evaluated the influence of varying the technical, analytical, and machine learning approaches on result interpretation and novel source prediction. Results Comparison between 16S rRNA amplicon and shotgun sequencing approaches, as well as between metagenomic analytical tools, showed differences in normalized microbial abundance, especially for organisms present at low abundance. Shotgun sequence data analyzed using Kraken2 and Bracken for taxonomic annotation had higher detection sensitivity. As classification models are limited to labeling pre-trained origins, we took an alternative approach using Lasso-regularized multivariate regression to predict geographic coordinates for comparison. In both models, the prediction errors were much higher in leave-one-city-out than in 10-fold cross-validation, with the former realistically reflecting the increased difficulty of accurately predicting samples from new origins. This challenge was further confirmed when applying the models to a set of samples obtained from new origins. Overall, the prediction performance of the regression and classification models, as measured by mean squared error, was comparable on mystery samples. Because of higher prediction error rates for samples from new origins, we provide an additional strategy based on prediction ambiguity to infer whether a sample is from a new origin. Lastly, we report increased prediction error when data from different sequencing protocols were included as training data.
Conclusions Herein, we highlight the capacity of predicting sample origin accurately with pre-trained origins and the challenge of predicting new origins through both regression and classification models. Overall, this work provides a summary of the impact of sequencing technique, protocol, taxonomic analytical approaches, and machine learning approaches on the use of metagenomics for prediction of sample origin.
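The Lasso-regularized multivariate regression with leave-one-city-out evaluation can be sketched as follows on synthetic abundance profiles (the city count, taxa count, and regularization strength are placeholder assumptions, not the CAMDA setup):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
# Synthetic "microbial abundance" profiles: 5 cities, 20 samples each, 30 taxa
n_cities, per_city, n_taxa = 5, 20, 30
coords = rng.uniform(-50, 50, size=(n_cities, 2))   # (lat, lon) per city
city = np.repeat(np.arange(n_cities), per_city)
signal = rng.normal(size=(n_taxa, 2))
X = coords[city] @ signal.T + rng.normal(scale=5.0, size=(n_cities * per_city, n_taxa))
Y = coords[city]                                    # targets: both coordinates at once

# Leave-one-city-out: every fold predicts a city never seen during training
errors = []
for tr, te in LeaveOneGroupOut().split(X, Y, groups=city):
    model = Lasso(alpha=1.0).fit(X[tr], Y[tr])      # multivariate (2-target) Lasso
    errors.append(mean_squared_error(Y[te], model.predict(X[te])))
print(f"mean leave-one-city-out MSE: {np.mean(errors):.1f}")
```

Grouping the folds by city is what distinguishes this from ordinary 10-fold cross-validation: no sample from the held-out city leaks into training, matching the "new origin" scenario.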


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7075 ◽  
Author(s):  
Carlos Fernandez-Lozano ◽  
Adrian Carballal ◽  
Penousal Machado ◽  
Antonino Santos ◽  
Juan Romero

Humans’ perception of visual complexity is often regarded as one of the key principles of aesthetic order, and is intimately related to the physiological, neurological and, possibly, psychological characteristics of the human mind. For these reasons, creating accurate computational models of visual complexity is a demanding task. Building upon previous work in the field (Forsythe et al., 2011; Machado et al., 2015), we explore the use of Machine Learning techniques to create computational models of visual complexity. For that purpose, we use a dataset composed of 800 visual stimuli divided into five categories, describing each stimulus by 329 features based on edge detection, compression error, and Zipf’s law. In an initial stage, a comparative analysis of representative state-of-the-art Machine Learning approaches is performed. Subsequently, we conduct an exhaustive outlier analysis. We analyze the impact of removing the extreme outliers, concluding that Feature Selection Multiple Kernel Learning obtains the best results, yielding an average correlation to humans’ perception of complexity of 0.71 with only twenty-two features. These results outperform the current state-of-the-art, showing the potential of this technique for regression.
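Since scikit-learn has no Multiple Kernel Learning implementation, the sketch below approximates the feature-selection-plus-kernel-regression idea with `SelectKBest` and an RBF SVR on synthetic stimuli (a hedged stand-in, not the authors' FS-MKL method; the 800-stimuli and 22-feature numbers mirror the abstract, everything else is assumed):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(5)
# 800 synthetic "stimuli" x 329 features (edge density, compression error, ...)
X = rng.normal(size=(800, 329))
# Synthetic "human complexity ratings" driven by a small subset of features
ratings = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=1.0, size=800)

model = Pipeline([
    ("select", SelectKBest(f_regression, k=22)),  # keep 22 features, as in the paper
    ("svr", SVR(kernel="rbf", C=10.0)),           # kernel regression stand-in for MKL
])
pred = cross_val_predict(model, X, ratings, cv=5)
corr = np.corrcoef(pred, ratings)[0, 1]
print(f"cross-validated correlation with ratings: {corr:.2f}")
```

Correlation between cross-validated predictions and the ratings is the same evaluation style as the paper's 0.71 figure, though the number itself here is meaningless because the data is synthetic.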


Author(s):  
Marko Pregeljc ◽  
Erik Štrumbelj ◽  
Miran Mihelcic ◽  
Igor Kononenko

In this chapter, the authors employ traditional and novel machine learning approaches to gain insight into the connections between the organizational quality of enterprises, as a type of formal social unit, and the enterprises’ economic performance. The analyzed data set contains the economic results of 72 Slovenian enterprises across four years, together with indicators of their organizational quality. The authors hypothesize that a causal relationship exists between the latter and the former. In the first part of a two-part process, they use several classification algorithms to study these relationships and to evaluate how accurately they predict the target economic results. However, the most successful models were often very complex and difficult to interpret, especially for non-technical users. Therefore, in the second part, the authors take advantage of a novel general explanation method that explains the influence of individual features on the model’s prediction. Results show that traditional machine-learning approaches successfully model the dependency relationship. Furthermore, explaining the influence of the input features on the predicted economic results provides insights with a meaningful economic interpretation.
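As an illustration of explaining feature influence on a model's predictions, the sketch below uses permutation importance (a generic explanation technique, not the authors' novel method) on synthetic enterprise-style data; the 72-enterprise size mirrors the chapter, while the indicators and outcome are fabricated:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(6)
# Synthetic stand-in: "organizational quality" indicators -> economic outcome
X = rng.normal(size=(72, 8))                    # 72 enterprises, 8 indicators
y = (X[:, 2] + 0.8 * X[:, 5] > 0).astype(int)   # outcome driven by indicators 2 and 5

clf = RandomForestClassifier(n_estimators=200, random_state=6).fit(X, y)
# Shuffle each feature in turn and measure how much the score degrades
imp = permutation_importance(clf, X, y, n_repeats=20, random_state=6)
ranked = np.argsort(imp.importances_mean)[::-1]
print("indicators ranked by influence:", ranked)
```

The point, as in the chapter, is that a complex model's predictions become interpretable once each input's contribution is quantified; here, the two indicators that actually drive the synthetic outcome surface at the top of the ranking.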

