DIGITAL CYBER FORENSIC EMAIL ANALYSIS AND DETECTION BASED ON INTELLIGENT TECHNIQUES INVESTIGATION

2020 ◽  
Vol 3 (1) ◽  
pp. 11-25
Author(s):  
Sally Dakheel Hamdi ◽  
Abdulkareem M. Radhi

The Internet has become an open, public and widely used medium for transmitting data and exchanging messages, including among criminals, terrorists and others with illegal motives. It is also used for exchanging important data between military and financial institutions, and even ordinary citizens. E-mail is one of the most widely used means of exchanging information on the Internet. Email messages are digital evidence that courts in many countries and societies have come to rely upon in convictions, which prompts researchers to keep developing email analysis tools that use the latest technologies to find digital evidence in email messages and assist forensic experts in analyzing groups of emails. This work presents a technique for analyzing and classifying emails based on data processing and extraction, trimming and refinement, and clustering, then using the SWARM algorithm to improve performance, and finally adapting a support vector machine algorithm to classify the emails and obtain practical and accurate results. The framework also proposes a hybrid English lexical dictionary (SentiWordNet 3.0) for email forensic analysis; it contains all the sentiwords, both positive and negative, and can work with the machine learning algorithm. The proposed system is capable of learning in an environment with large and variable data. The system is tested on the publicly available Enron data set. An accuracy of 92% was obtained in the best case. The experiments were conducted on the Enron email corpus (May 7, 2015 version of the dataset).

2012 ◽  
Vol 268-270 ◽  
pp. 1844-1848
Author(s):  
Mu Hee Song

Due to the spread of personal computers and the internet, e-mail has become one of the most widely used means of communication. However, a massive amount of spam mail pollutes mailboxes every day, taking advantage of the ability to send mail to any number of random people through the internet. In this paper we introduce an efficient method of classifying e-mails using the SVM (Support Vector Machine) learning algorithm, which has recently shown high performance in document classification. The disposition of the words inside the e-mail documents is extracted, and classification performance is compared and examined as the DF value is varied to reduce the disposition space at the learning stage. To assess its performance, the SVM is compared with the Naïve Bayes classifier (which uses probability methods) and a vector model classifier in order to verify that the SVM learning algorithm performs better.
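As a rough illustration of the document-frequency (DF) thresholding mentioned in the abstract above, the following minimal Python sketch counts how many documents contain each term and keeps only the terms at or above a cutoff. The whitespace tokenizer, the function names, the `min_df` parameter, and the toy messages are all assumptions made for illustration; the abstract does not describe the paper's actual procedure at this level of detail.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for text in docs:
        # set() so that a term is counted at most once per document
        df.update(set(text.lower().split()))
    return df


def select_terms(docs, min_df):
    """Keep terms whose document frequency is at least min_df (hypothetical cutoff)."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def to_vector(text, vocabulary):
    """Binary bag-of-words vector over the reduced vocabulary."""
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]


# Toy messages, invented for this sketch.
docs = [
    "win free money now",
    "free offer win prize",
    "meeting agenda for monday",
    "monday project meeting notes",
]
vocab = select_terms(docs, min_df=2)
vectors = [to_vector(d, vocab) for d in docs]
```

Raising `min_df` shrinks the vocabulary, and with it the dimensionality of the vectors that a classifier such as an SVM would be trained on.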


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in the text when performing sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets, and SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and it can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural/social benefits like predicting/fighting human riots.


Large volumes of data are available and stored in various fields; this is called big data. Big data in healthcare comprises a huge number of clinical records for every patient, maintained in Electronic Health Records (EHR). More than 80% of clinical data is in unstructured format and stored in hundreds of forms. The challenge in storing and analyzing such data is handling large data sets efficiently and scalably. The Hadoop MapReduce framework is used to store and process any kind of big data quickly. It is not solely a storage system but also a platform for both data storage and processing. It is scalable and fault-tolerant. In addition, prediction on the data sets is handled by machine learning algorithms. This work focuses on the Extreme Learning Machine (ELM) algorithm, which is used for disease risk prediction by combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm handles the computing work well and achieves good results in both veracity and efficiency.


2019 ◽  
Vol 8 (07) ◽  
pp. 24680-24782
Author(s):  
Manisha Bagri ◽  
Neha Aggarwal

By 2020, around 25-50 billion devices are likely to be connected to the internet. This development gives rise to the Internet of Things (IoT). The interconnected devices can generate and share data over a network. Machine learning plays a key role in IoT in handling the vast amount of data. It gives IoT devices a brain to think with, which is often called intelligence. The data can be fed to machines for learning patterns; based on this training, the machines can make predictions about the future. This paper gives a brief explanation of IoT and a concise explanation of machine learning algorithms and their types. Support Vector Machine (SVM) is explained in detail, along with its merits and demerits. An algorithm is also proposed for weather prediction using SVM for IoT.


GEOMATICA ◽  
2021 ◽  
pp. 1-23
Author(s):  
Roholah Yazdan ◽  
Masood Varshosaz ◽  
Saied Pirasteh ◽  
Fabio Remondino

Automatic detection and recognition of traffic signs from images is an important topic in many applications. First, we segmented the images using a classification algorithm to delineate the areas where the signs are more likely to be found. In this regard, shadows, objects having similar colours, and extreme illumination changes can significantly affect the segmentation results. We propose a new shape-based algorithm to improve the accuracy of the segmentation. The algorithm works by incorporating the sign geometry to filter out the wrong pixels from the classification results. We performed several tests to compare the performance of our algorithm against that of popular techniques such as Support Vector Machine (SVM), K-Means, and K-Nearest Neighbours. In these tests, to overcome the unwanted illumination effects, the images are transformed into the following colour spaces: Hue, Saturation, and Intensity; YUV; normalized red green blue; and Gaussian. Among the traditional techniques used in this study, the best results were obtained with SVM applied to the images transformed into the Gaussian colour space. The comparison results also suggested that by adding the geometric constraints proposed in this study, the quality of sign image segmentation is improved by 10%–25%. We also compared the SVM classifier enhanced by incorporating the geometry of signs with a U-Shaped deep learning algorithm. Results suggested the performance of both techniques is very close. Perhaps the deep learning results could be improved if a more comprehensive data set is provided.
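One of the colour-space transforms named in the abstract above, normalized red green blue, can be sketched in a few lines of Python. This uses the common chromaticity definition r = R/(R+G+B) (and likewise for g and b), which is assumed here rather than taken from the paper; the function name and the convention of mapping a pure black pixel to zeros are also assumptions for illustration.

```python
def normalized_rgb(r, g, b):
    """Map an RGB pixel to chromaticity coordinates that sum to 1.

    Dividing by the channel sum removes overall brightness, which is why
    this kind of transform is used to reduce illumination effects.
    """
    total = r + g + b
    if total == 0:
        # Assumed convention: a black pixel has no defined chromaticity.
        return (0.0, 0.0, 0.0)
    return (r / total, g / total, b / total)


# The same colour at two brightness levels maps to the same coordinates.
dark = normalized_rgb(50, 100, 50)
bright = normalized_rgb(100, 200, 100)
```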


2021 ◽  
Vol 3 (3) ◽  
pp. 63-72
Author(s):  
Wanjun Zhao ◽  

Background: We aimed to establish a novel diagnostic model for kidney diseases by combining artificial intelligence with complete mass spectrum information from urinary proteomics. Methods: We enrolled 134 patients (IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as controls, with a total of 610,102 mass spectra from their urinary proteomic profiles. The training data set (80%) was used to create a diagnostic model using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). The diagnostic accuracy was evaluated using a confusion matrix with a test data set (20%). We also constructed receiver operating characteristic, Lorenz, and gain curves to evaluate the diagnostic model. Results: Compared with the RF, SVM, and ANNs, the modified XGBoost model, called Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the XGBoost diagnostic model was 96.03%. The area under the curve of the extreme gradient boosting (XGBoost) model was 0.952 (95% confidence interval, 0.9307–0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model. Conclusions: The KDClassifier achieved high accuracy and robustness and thus provides a potential tool for the classification of kidney diseases.
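Two of the evaluation measures named in the abstract above can be sketched in plain Python: accuracy from a binary confusion matrix, and a Kolmogorov-Smirnov (KS) value computed as the largest gap between the cumulative true-positive and false-positive rates as the score threshold is lowered. This is a minimal sketch under assumptions: it treats the problem as binary, the function names and toy labels and scores are invented, and it is not the study's evaluation code.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions in a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


def ks_statistic(labels, scores):
    """Largest gap between cumulative positive and negative rates.

    labels are 0/1; higher scores are assumed to indicate the positive class.
    Both classes must be present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Consume all samples that share the same score before measuring the gap.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, tp / n_pos - fp / n_neg)
        i = j
    return best


# Toy data, invented for this sketch.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_ks = ks_statistic(toy_labels, toy_scores)
```

A KS value of 1 means the scores separate the two classes perfectly, while a value near 0 means the two score distributions are nearly indistinguishable.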


Author(s):  
R. Gopinath ◽  
C. Santhosh Kumar ◽  
K. Vishnuprasad ◽  
K. I. Ramachandran

Support vector machine (SVM) is a popular machine learning algorithm used extensively in machine fault diagnosis. In this paper, linear, radial basis function (RBF), polynomial, and sigmoid kernels are evaluated for diagnosing inter-turn faults in a 3 kVA synchronous generator. From the preliminary results, it is observed that the performance of the baseline system is not satisfactory, since the statistical features are nonlinear and do not match the kernels used. In this work, the features are linearized to a higher dimensional space to improve the performance of the fault diagnosis system for a synchronous generator using two feature mapping techniques, sparse coding and locality constrained linear coding (LLC). Experiments and results show that LLC is superior to sparse coding for improving the performance of fault diagnosis of a synchronous generator. For the balanced data set, LLC improves the overall fault identification accuracy of the baseline RBF system by 22.56%, 18.43% and 17.05% for the R, Y and B phase faults respectively.
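For reference, the four kernel families named in the abstract above can be written out in their usual textbook forms. The sketch below is a plain-Python illustration; the parameter names (`gamma`, `coef0`, `degree`) and their default values are assumptions chosen for the example, not settings reported in the paper.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """k(x, y) = x . y"""
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    """k(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """k(x, y) = tanh(gamma * x . y + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)
```

Each function returns a similarity value for a pair of feature vectors; an SVM uses such values in place of explicit inner products in a higher dimensional space.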


2012 ◽  
Vol 461 ◽  
pp. 818-821
Author(s):  
Shi Hu Zhang

The problem of real estate prices is a current focus of public concern. The support vector machine is a new machine learning algorithm that, owing to its excellent learning performance and its unique advantages in small-sample recognition, is now used in many areas. Determining real estate prices is a complicated problem due to its non-linearity and the small quantity of training data. In this study, a support vector machine (SVM) is proposed to forecast real estate prices in China. The experimental results indicate that the SVM method can achieve greater accuracy than the grey model and the artificial neural network when training data are scarce. It was also found that the predictive ability of the SVM exceeded that of some traditional pattern recognition methods for the data set used here.


2014 ◽  
Vol 536-537 ◽  
pp. 578-582
Author(s):  
Shu Hui Chang ◽  
Gui Fa Teng ◽  
Jian Bin Ma

E-mail has become one of the most important applications on the Internet. At the same time, computer crimes involving e-mail are increasing rapidly. To help counter these crimes, this paper describes authorship identification methods for Chinese e-mail documents, which could provide evidence for computer forensics. E-mail form features were extracted to classify authorship. To classify the authors of Chinese e-mail, the SVM (support vector machine) algorithm was adopted to learn the authors' features. Experiments gave satisfactory results on a limited data set. The accuracy on the data set for four authors was 92.56%. The satisfactory results showed that the method is feasible to apply to computer forensics.


2006 ◽  
Vol 27 (5) ◽  
pp. 587-608 ◽  
Author(s):  
Noelle Chesley

This study analyzes a couple-level (N = 581), longitudinal data set of employees to provide evidence about technology use over time, the factors that predict use, and the potential for a spouse to influence an individual's use. Although longitudinal usage patterns suggest a trend toward adoption and use of e-mail, the Internet, cell phones, and pagers over time, this trend toward continuing use is stronger for some technologies (e-mail, the Internet) than for others (cell phones, pagers). Furthermore, correlates of use differ by gender and the type of technology used. Last, technology use tends to be an individual- rather than couple-level phenomenon, with one exception. In the case of cell phone or pager use, husbands’ past use influences wives’ use 2 years later.

