A Framework for the Forensic Investigation of Unstructured Email Relationship Data

2011 ◽  
Vol 3 (3) ◽  
pp. 1-18 ◽  
Author(s):  
John Haggerty ◽  
Alexander J. Karran ◽  
David J. Lamb ◽  
Mark Taylor

The continued reliance on email communications ensures that it remains a major source of evidence during a digital investigation. Emails comprise both structured and unstructured data. Structured data provides qualitative information to the forensics examiner and is typically viewed through existing tools. Unstructured data is more complex as it comprises information associated with social networks, such as relationships within the network, identification of key actors and power relations, and there are currently no standardised tools for its forensic analysis. This paper posits a framework for the forensic investigation of email data. In particular, it focuses on the triage and analysis of unstructured data to identify key actors and relationships within an email network. This paper demonstrates the applicability of the approach by applying relevant stages of the framework to the Enron email corpus. The paper illustrates the advantage of triaging this data to identify (and discount) actors and potential sources of further evidence. It then applies social network analysis techniques to key actors within the data set. This paper posits that visualisation of unstructured data can greatly aid the examiner in their analysis of evidence discovered during an investigation.
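The idea of identifying key actors in an email network can be illustrated with a minimal sketch. It assumes messages have already been reduced to (sender, recipients) pairs and ranks actors by degree centrality, which is one common social network analysis measure and not necessarily the one used in the paper. The function name, the sample addresses, and the data layout are hypothetical.

```python
def rank_actors(messages):
    """Rank actors by degree centrality in an undirected email graph.

    messages: iterable of (sender, [recipients]) pairs (assumed layout).
    Returns a list of (actor, centrality) sorted from most to least connected.
    """
    neighbours = {}
    for sender, recipients in messages:
        for rcpt in recipients:
            if rcpt == sender:
                continue  # ignore self-addressed mail
            neighbours.setdefault(sender, set()).add(rcpt)
            neighbours.setdefault(rcpt, set()).add(sender)

    n = len(neighbours)
    if n < 2:
        return []

    # Degree centrality: distinct contacts divided by the maximum possible (n - 1).
    scores = {actor: len(nb) / (n - 1) for actor, nb in neighbours.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample data, not taken from the Enron corpus.
SAMPLE_MESSAGES = [
    ("alice@example.com", ["bob@example.com", "carol@example.com"]),
    ("bob@example.com", ["alice@example.com"]),
    ("dave@example.com", ["alice@example.com"]),
    ("carol@example.com", ["bob@example.com"]),
]
```

Actors with low scores are candidates for being discounted during triage, while high scorers are candidates for closer analysis.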

2012 ◽  
Vol 2012 ◽  
pp. 1-9 ◽  
Author(s):  
Kai Jiang ◽  
Like Liu ◽  
Rong Xiao ◽  
Nenghai Yu

Recently, many local review websites such as Yelp have emerged, greatly facilitating everyday activities such as cuisine hunting. However, they fail to meet travelers' demands because travelers are more concerned with a city's local specialties than with its highly ranked restaurants. To solve this problem, this paper presents a local specialty mining algorithm, which utilizes both the structured data from local review websites and the unstructured user-generated content (UGC) from community Q&A websites and travelogues. The proposed algorithm extracts dish names from local review data to build a document for each city, and applies the tf-idf weighting algorithm to these documents to rank dishes. Dish-city correlations are calculated from unstructured UGC and combined with the tf-idf ranking score to discover local specialties. Finally, duplicates in the local specialty mining results are merged. A recommendation service is built to present local specialties to travelers, along with the specialties' associated restaurants, Q&A threads, and travelogues. Experiments on a large data set show that the proposed algorithm achieves good performance and that, compared to using local review data alone, leveraging unstructured UGC can substantially boost mining performance, especially in large cities.
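The tf-idf ranking step can be sketched as follows. This is a minimal illustration that treats each city as one document made of extracted dish names and uses a plain tf × log(N/df) weighting; the paper's exact weighting variant, the dish-city correlation step, and duplicate merging are not reproduced. The function name and the sample cities and dishes are hypothetical.

```python
import math
from collections import Counter


def tfidf_rank(city_docs):
    """Rank dishes per city with tf-idf.

    city_docs: dict mapping a city name to the list of dish mentions
    extracted from its reviews (assumed layout).
    Returns a dict mapping each city to [(dish, score), ...], best first.
    """
    n_cities = len(city_docs)

    # Document frequency: in how many city documents each dish appears.
    doc_freq = Counter()
    for dishes in city_docs.values():
        doc_freq.update(set(dishes))

    ranked = {}
    for city, dishes in city_docs.items():
        counts = Counter(dishes)
        total = sum(counts.values())
        scores = {
            dish: (count / total) * math.log(n_cities / doc_freq[dish])
            for dish, count in counts.items()
        }
        ranked[city] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked


# Hypothetical sample data.
CITY_DOCS = {
    "CityA": ["noodles", "noodles", "dumplings", "rice"],
    "CityB": ["rice", "rice", "hotpot", "dumplings"],
    "CityC": ["rice", "rice", "rice", "hotpot"],
}
```

A dish mentioned in every city (here, rice) receives a score of zero, which reflects the intuition that a local specialty should be distinctive to a city and not merely popular.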


Author(s):  
Sanjeev Kumar Punia ◽  
Manoj Kumar ◽  
Thompson Stephan ◽  
Ganesh Gopal Deverajan ◽  
Rizwan Patan

Broadly, three types of machine learning algorithms are used to discover correlations, hidden patterns, and other useful information from the different data sets known as big data. Today, Twitter, Facebook, Instagram, and many other social media networks are used to collect unstructured data. The conversion of unstructured data into structured data or meaningful information is a very tedious task. Different machine learning classification algorithms are used to convert unstructured data into structured data. In this paper, the authors first collect unstructured research data from a frequently used social media network (i.e., Twitter) by using a Twitter application program interface (API) stream. Secondly, they implement different machine learning classification algorithms (supervised, unsupervised, and reinforcement) like decision trees (DT), neural networks (NN), support vector machines (SVM), naive Bayes (NB), linear regression (LR), and k-nearest neighbor (K-NN) on the collected research data set. Finally, a comparison of the different machine learning classification algorithms is presented.


Author(s):  
Marcia Kuskin Shamo ◽  
Joachim Meyer ◽  
Daniel Gopher

The issue of optimal display design was re-examined under the hypothesis that the nature of the data set would influence the efficacy of displays. An experiment assessed the effect of data structure on the relative efficacy of line graphs and tables. Participants saw a sequence of graphic or tabular displays which presented data from functions. The displays showed either values taken from the simple form of a single sinusoid function (structured data conditions), or from the complex and seemingly unstructured form of five summed sinusoid functions (unstructured data conditions). After seeing four consecutive displays participants were asked to predict the behavior of the data set in the next display which they had not yet seen. They were required to predict either the behavior of specific points, or the direction of trends. While the performance of the point comparison task was not influenced by display format, graphs were found to have an advantage for the prediction of future trends. Graphs also led more frequently to the identification of structure than did tables, both for structured and unstructured conditions.
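The two kinds of data sets described above can be sketched in a few lines. This is an illustration only: the abstract does not give the frequencies, amplitudes, or phases used in the experiment, so the component values below are assumptions, as are the function names.

```python
import math


def single_sinusoid(n_points, freq=1.0, amp=1.0, phase=0.0):
    """Values of one sinusoid sampled at n_points positions (structured condition)."""
    return [
        amp * math.sin(2 * math.pi * freq * i / n_points + phase)
        for i in range(n_points)
    ]


def summed_sinusoids(n_points, components):
    """Sum of several sinusoids (seemingly unstructured condition).

    components: list of (freq, amp, phase) tuples.
    """
    return [
        sum(a * math.sin(2 * math.pi * f * i / n_points + p) for f, a, p in components)
        for i in range(n_points)
    ]


def trend_directions(values):
    """Direction of change between consecutive points: 'up', 'down' or 'flat'."""
    return [
        "up" if b > a else "down" if b < a else "flat"
        for a, b in zip(values, values[1:])
    ]


# Assumed parameters for five components; not taken from the study.
FIVE_COMPONENTS = [
    (1.0, 1.0, 0.0),
    (2.0, 0.5, 0.3),
    (3.0, 0.8, 1.1),
    (5.0, 0.4, 2.0),
    (7.0, 0.6, 0.7),
]
```

Plotting `single_sinusoid(32)` next to `summed_sinusoids(32, FIVE_COMPONENTS)` gives a feel for why the summed form looks unstructured even though it is fully deterministic.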


2021 ◽  
Vol 13 (2) ◽  
pp. 335-345
Author(s):  
R. Senthilkumar ◽  
B. RubanRaja ◽  
Monisha

A huge corpus of valuable information on customer experience is available in unstructured form in customer reviews on e-commerce websites. Multivariate data analysis techniques are effective in uncovering hidden patterns and segments in structured data. A major challenge is to convert the unstructured data into a structured form so that multivariate techniques can be applied. In this article, we provide a text-analysis-based approach coupled with multivariate techniques to uncover the sentiment of various features associated with different brands and to determine the brand positions and segments through perceptual mapping and cluster analysis.


Author(s):  
Nasibah Husna Mohd Kadir ◽  
Sharifah Aliman

On social media, product reviews contain text, emoticons, numbers and symbols, which makes text summarization difficult. Text analytics is one of the key techniques for exploring unstructured data. The purpose of this study is to handle unstructured data by sorting and summarizing review data through a Web-Based Text Analytics using R approach. According to a comparative table of Natural Language Processing (NLP) features across studies, the Web-Based Text Analytics using R approach can analyze unstructured data by using the data processing packages in R. It combines all the NLP features in the menu of the text analytics process, arranged in labeled steps, to make it easier for users to view the text summarization. This study uses health product reviews from Shaklee as the data set. The proposed approach shows acceptable performance in terms of system feature execution compared with the baseline model system.


Author(s):  
Nikhith Suvarna

In simple terms, anti-forensics can be described as the techniques used to counter forensic analysis done by forensic investigators. This paper mainly focuses on some of the most widely used anti-forensics techniques, along with the challenges the forensic investigator faces. There are many tools and techniques available that, when used properly, can be highly effective against forensic analysis techniques. Various tools assist you against various anti-forensics techniques such as elimination of evidence sources, data hiding, and trail obfuscation. These techniques are used mainly to make the investigation consume more time and money. Sensor Noise Camera Identification is a way to link a photo with the camera that took it, using a noise signature that is unique to every camera. KEYWORDS: Anti-Forensics (AF), Forensic Analysis, Anti-Forensic Techniques, Sensor Noise Camera Identification


2015 ◽  
Vol 49 (1) ◽  
pp. 91-114 ◽  
Author(s):  
Milorad Pantelija Stevic ◽  
Branko Milosavljevic ◽  
Branko Rade Perisic

Purpose – Current e-learning platforms are based on relational database management systems (RDBMS) and are well suited for handling structured data. However, e-learning solutions are also expected to handle unstructured data efficiently. The purpose of this paper is to show an alternative to current solutions for unstructured data management. Design/methodology/approach – The current repository-based solution for file management was compared to a MongoDB architecture according to their functionalities and characteristics. This included several categories: data integrity, hardware acquisition, processing files, availability, handling concurrent users, partition tolerance, disaster recovery, backup policies and scalability. Findings – This paper shows that it is possible to improve e-learning platform capabilities by implementing a hybrid database architecture that incorporates an RDBMS for handling structured data and the MongoDB database system for handling unstructured data. Research limitations/implications – The study shows an acceptable adoption of MongoDB inside a service-oriented architecture (SOA) for enhancing e-learning solutions. Practical implications – This research enables efficient file handling not only for e-learning systems, but also for any system where file handling is needed. Originality/value – It is expected that future single/joint e-learning initiatives will need to manage huge numbers of files and will require an effective file handling solution. A new architecture for file handling is offered in this paper: it differs from current solutions because it is less expensive, more efficient, more flexible and requires less administrative and development effort to build and maintain.


Author(s):  
Sheik Abdullah A. ◽  
Priyadharshini P.

The term Big Data refers to a large dataset that is available in different forms. In recent years, most organizations have generated vast amounts of data in different forms, which gives rise to the context of volume, variety, velocity, and veracity. Big Data on the volume aspect is based on data set maintenance. The data volume would usually be processed by a database but cannot be handled by a traditional database. Big Data is stored as structured, unstructured, and semi-structured data. Big Data is used for programming, data warehousing, computational frameworks, quantitative aptitude and statistics, and business knowledge. Regarding analytics in the Big Data sector, predictive analytics and social media analytics are widely used for determining the pattern or trend which is about to happen. This chapter mainly deals with the tools and techniques that correspond to big data analytics in various applications.


Big Data ◽  
2016 ◽  
pp. 1495-1518
Author(s):  
Mohammad Alaa Hussain Al-Hamami

Big Data is comprised systems, to remain competitive by techniques emerging due to Big Data. Big Data includes structured, semi-structured and unstructured data. Structured data are those data formatted for use in a database management system. Semi-structured and unstructured data include all types of unformatted data, including multimedia and social media content. Among practitioners and applied researchers, the reaction to data available through blogs, Twitter, Facebook, or other social media can be described as a “data rush” promising new insights about consumers' choices and behavior and many other issues. In the past, Big Data has been used only by very large organizations, governments and large enterprises that have the ability to create their own infrastructure for hosting and mining large amounts of data. This chapter will show the requirements for Big Data environments to be protected using the same rigorous security strategies applied to traditional database systems.


Author(s):  
Trupti Vishwambhar Kenekar ◽  
Ajay R. Dani

As Big Data is a group of structured, unstructured and semi-structured data collected from various sources, it is important both to mine it and to protect the privacy of individual data. Differential privacy is one of the best measures for providing a strong privacy guarantee. The differentially private frequent itemset mining using MapReduce proposed in the chapter requires less time for privately mining large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, and data privacy techniques and their applications to unstructured data. Analyses of experimental results on structured and unstructured data sets are also presented.
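To illustrate what a differentially private count looks like, here is a minimal sketch using the Laplace mechanism on single-item support counts. It is a simplification and not the chapter's algorithm: it runs on one machine instead of MapReduce, covers only single items, not full itemsets, and assumes a fixed, public item universe. Each transaction is truncated to at most `max_len` items so that adding or removing one transaction changes the counts by at most `max_len` in total, which sets the noise scale to `max_len / epsilon`. All names and the sample data are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_item_counts(transactions, universe, epsilon, max_len, seed=None):
    """Noisy support count for every item in a fixed, public universe.

    Assumption: neighbouring datasets differ by one transaction, and each
    transaction contributes to at most max_len counts after truncation.
    """
    if epsilon <= 0 or max_len < 1:
        raise ValueError("epsilon must be > 0 and max_len >= 1")

    rng = random.Random(seed)
    allowed = set(universe)
    counts = Counter()
    for transaction in transactions:
        # Deterministic truncation bounds the L1 sensitivity by max_len.
        items = sorted(set(transaction) & allowed)[:max_len]
        counts.update(items)

    scale = max_len / epsilon
    return {item: counts[item] + laplace_noise(scale, rng) for item in sorted(allowed)}


# Hypothetical sample data.
TRANSACTIONS = [
    ["bread", "milk"],
    ["bread", "eggs"],
    ["bread", "milk", "eggs"],
]
UNIVERSE = ["bread", "eggs", "milk"]
```

Items whose noisy count exceeds a support threshold would then be reported as frequent. A smaller `epsilon` gives stronger privacy but noisier counts, and the `seed` parameter is for reproducible demonstrations only and should not be fixed in real use.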

