A proposed data science approach for email spam classification using machine learning techniques

Author(s):  
Aakash Atul Alurkar ◽  
Sourabh Bharat Ranade ◽  
Shreeya Vijay Joshi ◽  
Siddhesh Sanjay Ranade ◽  
Piyush A. Sonewar ◽  
...  
Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.


Author(s):  
P. Priakanth ◽  
S. Gopikrishnan

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models in order to enable algorithms to learn continuously with the help of available data. Since IoT will be among the major sources of new data, data science will make a great contribution to make IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning) or the data is not known beforehand (unguided learning) or the learning is the result of interaction between a model and the environment (reinforcement learning). This chapter answers the questions: How could machine learning algorithms be applied to IoT smart data? What is the taxonomy of machine learning algorithms that can be adopted in IoT? And what are IoT data characteristics in real-world which requires data analytics?


Author(s):  
P. Priakanth ◽  
S. Gopikrishnan

The idea of an intelligent, independent learning machine has fascinated humans for decades. The philosophy behind machine learning is to automate the creation of analytical models in order to enable algorithms to learn continuously with the help of available data. Since IoT will be among the major sources of new data, data science will make a great contribution to make IoT applications more intelligent. Machine learning can be applied in cases where the desired outcome is known (guided learning) or the data is not known beforehand (unguided learning) or the learning is the result of interaction between a model and the environment (reinforcement learning). This chapter answers the questions: How could machine learning algorithms be applied to IoT smart data? What is the taxonomy of machine learning algorithms that can be adopted in IoT? And what are IoT data characteristics in real-world which requires data analytics?


Author(s):  
RajKishore Sahni

The upsurge in the volume of unwanted emails called spam has created an intense need for the development of more dependable and robust antispam filters. Machine learning methods of recent are being used to successfully detect and filter spam emails. We present a systematic review of some of the popular machine learning based email spam filtering approaches. Our review covers survey of the important concepts, attempts, efficiency, and the research trend in spam filtering. The preliminary discussion in the study background examines the applications of machine learning techniques to the email spam filtering process of the leading internet service providers (ISPs) like Gmail, Yahoo and Outlook emails spam filters. Discussion on general email spam filtering process, and the various efforts by different researchers in combating spam through the use machine learning techniques was done. Our review compares the strengths and drawbacks of existing machine learning approaches and the open research problems in spam filtering. We recommended deep learning and deep adversarial learning as the future techniques that can effectively handle the menace of spam emails


Author(s):  
Jonathan M. Gumley ◽  
Hayden Marcollo ◽  
Stuart Wales ◽  
Andrew E. Potts ◽  
Christopher J. Carra

Abstract There is growing importance in the offshore floating production sector to develop reliable and robust means of continuously monitoring the integrity of mooring systems for FPSOs and FPUs, particularly in light of the upcoming introduction of API-RP-2MIM. Here, the limitations of the current range of monitoring techniques are discussed, including well established technologies such as load cells, sonar, or visual inspection, within the context of the growing mainstream acceptance of data science and machine learning. Due to the large fleet of floating production platforms currently in service, there is a need for a readily deployable solution that can be retrofitted to existing platforms to passively monitor the performance of floating assets on their moorings, for which machine learning based systems have particular advantages. An earlier investigation conducted in 2016 on a shallow water, single point moored FPSO employed host facility data from in-service field measurements before and after a single mooring line failure event. This paper presents how the same machine learning techniques were applied to a deep water, semi taut, spread moored system where there was no host facility data available, therefore requiring a calibrated hydrodynamic numerical model to be used as the basis for the training data set. The machine learning techniques applied to both real and synthetically generated data were successful in replicating the response of the original system, even with the latter subjected to different variations of artificial noise. Furthermore, utilizing a probability-based approach, it was demonstrated that replicating the response of the underlying system was a powerful technique for predicting changes in the mooring system.


2022 ◽  
pp. 209-232
Author(s):  
Xiang Li ◽  
Jingxi Liao ◽  
Tianchuan Gao

Machine learning is a broad field that contains multiple fields of discipline including mathematics, computer science, and data science. Some of the concepts, like deep neural networks, can be complicated and difficult to explain in several words. This chapter focuses on essential methods like classification from supervised learning, clustering, and dimensionality reduction that can be easily interpreted and explained in an acceptable way for beginners. In this chapter, data for Airbnb (Air Bed and Breakfast) listings in London are used as the source data to study the effect of each machine learning technique. By using the K-means clustering, principal component analysis (PCA), random forest, and other methods to help build classification models from the features, it is able to predict the classification results and provide some performance measurements to test the model.


Author(s):  
Omar Farooq ◽  
Parminder Singh

Introduction: The emergence of the concepts like Big Data, Data Science, Machine Learning (ML), and the Internet of Things (IoT) has added the potential of research in today's world. The continuous use of IoT devices, sensors, etc. that collect data continuously puts tremendous pressure on the existing IoT network. Materials and Methods: This resource-constrained IoT environment is flooded with data acquired from millions of IoT nodes deployed at the device level. The limited resources of the IoT Network have driven the researchers towards data Management. This paper focuses on data classification at the device level, edge/fog level, and cloud level using machine learning techniques. Results: The data coming from different devices is vast and is of variety. Therefore, it becomes essential to choose the right approach for classification and analysis. It will help optimize the data at the device edge/fog level to better the network's performance in the future. Conclusion: This paper presents data classification, machine learning approaches, and a proposed mathematical model for the IoT environment.


Author(s):  
Aakash Atul Alurkar ◽  
Sourabh Bharat Ranade ◽  
Shreeya Vijay Joshi ◽  
Siddhesh Sanjay Ranade ◽  
Gitanjali R. Shinde ◽  
...  

2018 ◽  
Vol 47 (6) ◽  
pp. 1081-1097 ◽  
Author(s):  
Jeremy Auerbach ◽  
Christopher Blackburn ◽  
Hayley Barton ◽  
Amanda Meng ◽  
Ellen Zegura

We estimate the cost and impact of a proposed anti-displacement program in the Westside of Atlanta (GA) with data science and machine learning techniques. This program intends to fully subsidize property tax increases for eligible residents of neighborhoods where there are two major urban renewal projects underway, a stadium and a multi-use trail. We first estimate household-level income eligibility for the program with data science and machine learning approaches applied to publicly available household-level data. We then forecast future property appreciation due to urban renewal projects using random forests with historic tax assessment data. Combining these projections with household-level eligibility, we estimate the costs of the program for different eligibility scenarios. We find that our household-level data and machine learning techniques result in fewer eligible homeowners but significantly larger program costs, due to higher property appreciation rates than the original analysis, which was based on census and city-level data. Our methods have limitations, namely incomplete data sets, the accuracy of representative income samples, the availability of characteristic training set data for the property tax appreciation model, and challenges in validating the model results. The eligibility estimates and property appreciation forecasts we generated were also incorporated into an interactive tool for residents to determine program eligibility and view their expected increases in home values. Community residents have been involved with this work and provided greater transparency, accountability, and impact of the proposed program. Data collected from residents can also correct and update the information, which would increase the accuracy of the program estimates and validate the modeling, leading to a novel application of community-driven data science.


Sign in / Sign up

Export Citation Format

Share Document