Email Spam Detection using Ensemble Methods

2019 ◽  
Vol 8 (3) ◽  
pp. 4148-4153

The rapid growth of spam email has escalated the need to upgrade existing spam detection and filtering methods. Several machine learning methods exist for the classification and detection of email spam, but they fall short in some cases. In this research work, ensemble methods are adopted to detect email spam. The Multinomial Naïve Bayes and J48 Decision Tree algorithms are considered as base classifiers and ensembled. The ensemble methods considered are bagging and boosting. The experiments are conducted on the CSDMC2010 Spam corpus. The results for this dataset are evaluated using the individual classifiers and the bagging and boosting ensemble approaches. System performance is assessed in terms of precision, recall, F-measure, and accuracy. The experimental outcomes indicate notable results for the detection of email spam using ensemble methods.
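
The four evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function name `classification_metrics`, the label encoding (1 for spam, 0 for non-spam), and the choice to return 0.0 when a denominator is zero are all assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F-measure (F1), and accuracy for binary labels.

    Assumption for this sketch: `positive` marks the spam class, and any
    ratio with a zero denominator is reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure here is the harmonic mean of precision and recall (F1).
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }
```

Calling `classification_metrics(true_labels, predicted_labels)` once per classifier (individual, bagged, boosted) would give directly comparable figures on the same test split.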

2019 ◽  
Author(s):  
Leila Mirsadeghi ◽  
Ali Mohammad Banaei-Moghaddam ◽  
Seyed Reza Beh-Afarin ◽  
Reza Haji Hosseini ◽  
Kaveh Kavousi

Abstract Background: Ensemble methods are supervised learning approaches that integrate different types of data or multiple individual classifiers. It has been shown that these methods can improve professional performance. Methods: This study provides an in-depth review of the 45 most relevant articles and introduces 42 ensemble classifier (EC) machine learning methods used for the detection of 18 different types of cancer. Compared to other types of cancer, breast cancer, together with the 22 ensemble methods introduced for its identification, is investigated extensively. The purpose of this study is to identify, map, and analyze the current academic discourse on EC machine learning methods in order to: 1. identify overarching themes emerging from empirical studies regarding EC methods, 2. determine their input data and decision-making strategies, and 3. evaluate relevant statistical procedures. Results: By comparing various approaches, we can introduce a Relevance Vector Machine (RVM)-based ensemble learning method that can provide optimal solutions for problems such as the curse of dimensionality and the high dimensionality of the feature space, without missing data values. Conclusions: To obtain robust performance and achieve better results, it is suggested to use multi-omics data integration, which has been shown to identify cancers and their subtypes more efficiently.


2019 ◽  
Author(s):  
Kaveh Kavousi ◽  
Leila Mirsadeghi ◽  
Reza Haji Hosseini ◽  
Ali Mohammad Banaei-Moghaddam ◽  
Seyed Reza Beh-Afarin

Abstract Background: Ensemble methods are supervised learning approaches that integrate different types of data or multiple individual classifiers. It has been shown that these methods can improve professional performance. Methods: This study provides an in-depth review of the 45 most relevant articles and introduces 42 ensemble classifier (EC) machine learning methods used for the detection of 18 different types of cancer. Compared to other types of cancer, breast cancer, together with the 22 ensemble methods introduced for its identification, is investigated extensively. The purpose of this study was to identify, map, and analyze the current academic discourse on EC machine learning methods in order to: 1. identify overarching themes emerging from empirical studies regarding EC methods, 2. determine their input data and decision-making strategies, and 3. evaluate relevant statistical procedures. Results: By comparing various approaches, we can introduce a Relevance Vector Machine (RVM)-based ensemble learning method that can provide optimal solutions for problems such as the curse of dimensionality and the high dimensionality of the feature space, without missing data values. Conclusions: To obtain robust performance and achieve better results, it is suggested to use multi-omics data integration, which has been shown to identify cancers and their subtypes more efficiently.


2020 ◽  
Vol 13 (11) ◽  
pp. 265
Author(s):  
Hector F. Calvo-Pardo ◽  
Tullio Mancini ◽  
Jose Olmo

This paper presents an overview of the procedures that are involved in prediction with machine learning models with special emphasis on deep learning. We study suitable objective functions for prediction in high-dimensional settings and discuss the role of regularization methods in order to alleviate the problem of overfitting. We also review other features of machine learning methods, such as the selection of hyperparameters, the role of the architecture of a deep neural network for model prediction, or the importance of using different optimization routines for model selection. The review also considers the issue of model uncertainty and presents state-of-the-art methods for constructing prediction intervals using ensemble methods, such as bootstrap and Monte Carlo dropout. These methods are illustrated in an out-of-sample empirical forecasting exercise that compares the performance of machine learning methods against conventional time series models for different financial indices. These results are confirmed in an asset allocation context.
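
The idea of building an interval from a bootstrap ensemble can be sketched in a few lines. This is a simplified illustration, not the procedure from the paper: it swaps the deep network for a one-variable least-squares line, and the names `fit_line` and `bootstrap_interval`, the default of 200 resamples, and the percentile rule are assumptions. The resulting band reflects only model (estimation) uncertainty; a full prediction interval would also need an estimate of the noise variance.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b).

    Falls back to a constant (the mean of ys) if all x values coincide.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def bootstrap_interval(xs, ys, x_new, n_models=200, alpha=0.1, seed=0):
    """Percentile interval for the ensemble prediction at x_new.

    Each ensemble member is fitted on a resample drawn with replacement
    from the training pairs. Returns (lower, ensemble mean, upper).
    """
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    preds.sort()
    lower = preds[int((alpha / 2) * (n_models - 1))]
    upper = preds[int((1 - alpha / 2) * (n_models - 1))]
    return lower, sum(preds) / n_models, upper
```

Monte Carlo dropout follows the same pattern at prediction time: instead of refitting on resampled data, the same network is evaluated repeatedly with random units dropped, and the spread of those outputs is summarized in the same way.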


Nowadays, product ratings are essential for products sold online, because customers can view a product's actual rating before they buy it. Ratings are the primary source of information about a product, and they also help manufacturers and retailers improve product quality in terms of production and sales. A rating makes it easy for consumers to judge how much they are likely to enjoy the product. For newly arrived products that no customer has used yet, however, no ratings are available online. In this research work, we try to estimate ratings for newly arrived products by assessing the quality of the product, which assists customers before they buy it. We also examine different methods for predicting the rating of a newly arrived product from the product features, description, and other information available on e-commerce platforms such as Amazon and Flipkart. To achieve this goal, we work with existing data on products that are already on the market and have already been used by customers. The main objective of this research is to help customers who are about to purchase newly arrived products. This is done by comparing different existing machine learning methods on the existing data set. We identify the best-performing method among them and apply it to predict the rating of a newly arrived product based on the available features.


2019 ◽  
Vol 19 (292) ◽  
Author(s):  
Nan Hu ◽  
Jian Li ◽  
Alexis Meyer-Cirkel

We compare the predictive performance of a series of machine learning and traditional methods for monthly CDS spreads, using firms' accounting-based, market-based and macroeconomic variables over the period 2006 to 2016. We find that ensemble machine learning methods (Bagging, Gradient Boosting and Random Forest) strongly outperform other estimators, and Bagging particularly stands out in terms of accuracy. Traditional credit risk models using OLS techniques have the lowest out-of-sample prediction accuracy. The results suggest that non-linear machine learning methods, especially the ensemble methods, add considerable value to existing credit risk prediction accuracy and enable CDS shadow pricing for companies missing those securities.


Symmetry ◽  
2019 ◽  
Vol 11 (8) ◽  
pp. 957 ◽  
Author(s):  
Qingqing Zhou ◽  
Guo Chen ◽  
Wenjun Jiang ◽  
Kenli Li ◽  
Keqin Li

Excavators are one of the most frequently used pieces of equipment in large-scale construction projects. They are closely related to the construction speed and total cost of the entire project. Therefore, it is very important to effectively monitor their operating status and detect abnormal conditions. Previous research work was mainly based on expert systems and traditional statistical models to detect excavator anomalies. However, these methods are not particularly suitable for modern sophisticated excavators. In this paper, we take the first step and explore the use of machine learning methods to automatically detect excavator anomalies by mining their working condition data collected from multiple sensors. The excavators we studied are from Sany Group, the largest construction machinery manufacturer in China. We have collected 40 days of working condition data for 107 excavators from Sany. In addition, we worked with six excavator operators and engineers for more than a month to clean the original data and mark the anomalous samples. Based on the processed data, we have designed three anomaly detection schemes based on machine learning methods, using support vector machine (SVM), back propagation (BP) neural network and decision tree algorithms, respectively. Based on the real excavator data, we have carried out a comprehensive evaluation. The results show that the anomaly detection accuracy is as high as 99.88%, which is clearly superior to the previous methods based on expert systems and traditional statistical models.


Author(s):  
Mazhar Ali ◽  
Asim Imdad Wagan

A linguistic corpus of the Sindhi language is significant for computational linguistics, machine learning, language feature identification and analysis, semantic and sentiment analysis, information retrieval, and so on. Little computational linguistics work has been done on Sindhi text, whereas English, Arabic, Urdu and some other languages are fully resourced computationally. The grammar and morphemes of these languages have been analyzed properly using different machine learning methods. Development and research work on computational linguistics for the Sindhi language is currently in progress. This study aims to develop a Sindhi annotated corpus using the universal POS (Part of Speech) tag set and a Sindhi POS tag set for the purpose of language feature and variation analysis. The features are extracted using the TF-IDF (Term Frequency and Inverse Document Frequency) technique. A supervised machine learning model is developed to assess the annotated corpus and to learn the grammatical annotation of the Sindhi language. The model is trained on 80% of the annotated corpus and tested on the remaining 20%. Cross-validation with 10 folds is used to evaluate and validate the model. The results show good model performance and confirm that the Sindhi corpus is properly annotated. This study also describes a number of research gaps for further work on topic modeling, language variation, and sentiment and semantic analysis of the Sindhi language.
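
The feature extraction and evaluation protocol mentioned in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: the TF-IDF variant (term count divided by document length, multiplied by the natural log of N over document frequency), the function names `tf_idf` and `k_fold_indices`, and the shuffling seed are all assumptions chosen for the example, and real toolkits often use smoothed or normalized variants.

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    `docs` is a list of token lists. Variant assumed in this sketch:
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds; each fold serves
    as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

With `k=10`, every sample is used for testing once and for training nine times; with `k=5`, each iteration is an 80%/20% train/test split like the one the abstract describes.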


2021 ◽  
Author(s):  
Bo Shi ◽  
Hui Su ◽  
Xu Du ◽  
Bao Jiao ◽  
Lin Wang

With the rapid development of underground engineering in China, more metro tunnels are being constructed, the mileage of subway tunnels is increasing, and the corresponding problems of tunnel structure diseases are becoming more prominent. At present, the treatment of tunnel structural diseases mainly relies on manual inspection and identification, and research on defect prediction is still lacking. Because of the complexity of the factors affecting tunnel structure diseases, it is difficult to analyze the causes and development trends of the diseases comprehensively by manual analysis. Fortunately, machine learning methods have gained popularity in classification and regression tasks in recent decades. Many algorithms, such as decision tree algorithms, the random forest algorithm, and XGBoost, have been applied in fields including finance, engineering, and transportation. This study aimed to analyze the predictive performance of machine learning models using data on 68,055 segment lining rings from six subway lines in one city. Based on the disease records from 2014 to 2016 and the corresponding convergence and characteristic data, defect conditions in 2017 were predicted and compared with the real defect conditions in 2017. The accuracy rates and F1 values of the predicted results were all above 80%. The prediction results can help tunnel maintenance departments and relevant government regulators make auxiliary decisions to control tunnel structure diseases, and can help them focus on tunnel intervals with severe diseases and clarify the development trend of tunnel disease.

