Towards automated preprocessing of bulk data in digital forensic investigations using hash functions

2015 ◽  
Vol 57 (6) ◽  
Author(s):  
Harald Baier

Abstract: Handling bulk data (e.g. several terabytes of data) is an issue in contemporary digital forensics. Separating relevant data structures from irrelevant ones resembles finding a needle in a haystack. This article presents and assesses automatic hash-based techniques that preprocess the input data with the goal of facilitating the investigator's job. We discuss concepts such as blacklisting and whitelisting based on cryptographic hash functions and approximate matching, respectively. For two established process models, a lab investigation and an on-site investigation, we describe how to use these techniques jointly to obtain an automatic pointer to the needle.

2019 ◽  
Author(s):  
Vitor Hugo Moia ◽  
Frank Breitinger ◽  
Marco Aurélio Henriques

Finding similarity in digital forensics investigations can be assisted by Approximate Matching (AM) functions. These algorithms create small, compact representations of objects (similar to hashes) which can be compared to identify similarity. However, results are often biased by common blocks: data structures found in many different files regardless of content. In this paper, we evaluate the precision and recall of AM functions when common blocks are removed. In detail, we analyze how the similarity score changes and how this impacts different investigation scenarios. Results show that many irrelevant matches can be filtered out and that a new interpretation of the score allows better similarity detection.
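The effect of common-block removal can be illustrated with a toy approximate-matching scheme: fixed-size chunk hashing scored with Jaccard similarity. Real AM functions (e.g., ssdeep, sdhash, mrsh-v2) extract features very differently; this sketch only mimics the scoring idea:

```python
import hashlib

def chunk_hashes(data: bytes, size: int = 64) -> set:
    """Hash fixed-size chunks of the input. Real AM tools use
    content-defined features; this is a simplified stand-in."""
    return {hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

def similarity(a: bytes, b: bytes, common: set = frozenset()) -> float:
    """Jaccard similarity over chunk hashes, after discarding
    'common blocks' (chunks seen across many unrelated files)."""
    ha = chunk_hashes(a) - common
    hb = chunk_hashes(b) - common
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)
```

Subtracting the common-block set before scoring removes a shared header's contribution, so two files that overlap only in boilerplate no longer produce a misleading match.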


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Hasan Binjuraid ◽  
Mazura Mat Din

With the advancement of computer technologies, cybercrime has advanced too. In today's world, the technical knowledge required to attack a computer is lower than ever, thanks to advanced tools that do most of the work. Digital forensic investigations are crucial to solving this type of crime, and they must be done professionally. The computer registry plays a big part in a digital forensic investigation: it can reveal artifacts left behind by cybercrimes, the dates of the crimes on the computer system, and the user active at the time of the crime. In this research, interpretation of these artifacts is the main focus, with committees and jurors as the intended audience for the interpretations of the registries. Two types of cases are subject to investigation in this research. The first is the use of BitTorrent clients to download illegal or copyrighted content; three clients are chosen for this digital forensic investigation: uTorrent, Vuze and BitComet. The second is theft using USB storage devices, of which there are three types (Mass Storage Class, Picture Transfer Protocol and Media Transfer Protocol), each leaving different artifacts behind during insertion and removal. A web-based dashboard will be developed to help interpret the artifacts found in the registry of the computer system. A categorization process for each cybercrime case will be conducted to evaluate the severity of the case depending on the artifacts found during the digital forensics investigation. The research methodology consists of three phases. The first phase is information gathering, including literature review, requirements gathering and dataset gathering. The second phase is performing the digital forensics analysis, including planning, identification and reconnaissance. The last phase comprises result analysis and discussion.
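The severity-categorization step could, for example, weight the recovered registry artifacts and aggregate them into a coarse label for the dashboard. The artifact names, weights and thresholds below are hypothetical placeholders, not the scheme proposed in the research:

```python
# Illustrative only: artifact names and weights are hypothetical,
# not taken from the paper's actual categorization scheme.
SEVERITY_WEIGHTS = {
    "torrent_client_installed": 1,   # e.g. uTorrent/Vuze/BitComet keys present
    "torrent_download_history": 3,   # .torrent paths recovered from NTUSER.DAT
    "usb_msc_inserted": 1,           # Mass Storage Class device in USBSTOR
    "usb_last_removal": 2,           # removal timestamp near incident window
    "file_copy_trace": 3,            # shellbag/LNK evidence of copied files
}

def categorize(artifacts: list) -> str:
    """Aggregate artifact weights into a coarse severity label."""
    score = sum(SEVERITY_WEIGHTS.get(a, 0) for a in artifacts)
    if score >= 5:
        return "high"
    if score >= 2:
        return "medium"
    return "low"
```

A dashboard could then sort cases by this label, surfacing the registry evidence in an order a non-technical committee or juror can follow.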


2020 ◽  
Vol 21 (2) ◽  
pp. 257-290
Author(s):  
R.I. Ferguson ◽  
Karen Renaud ◽  
Sara Wilford ◽  
Alastair Irons

Purpose: Cyber-enabled crimes are on the increase, and law enforcement has had to expand many of its detection activities into the digital domain. As such, the field of digital forensics has become far more sophisticated over the years and is now able to uncover ever more evidence that can be used to support the prosecution of cyber criminals in a court of law. Governments, too, have embraced the ability to track suspicious individuals in the online world. Forensics investigators are driven to gather data exhaustively, being under pressure to provide law enforcement with sufficient evidence to secure a conviction. Yet there are concerns about the ethics and justice of untrammeled investigations on a number of levels. On an organizational level, unconstrained investigations could interfere with, and damage, the organization's right to control the disclosure of its intellectual capital. On an individual level, those being investigated could easily have their legal privacy rights violated by forensics investigations. On a societal level, there might be a sense of injustice at the perceived inequality of current practice in this domain. This paper argues the need for a practical, ethically grounded approach to digital forensic investigations, one that acknowledges and respects the privacy rights of individuals and the intellectual capital disclosure rights of organizations, as well as the needs of law enforcement. The paper derives a set of ethical guidelines and then maps these onto a forensics investigation framework. The framework is subjected to expert review in two stages, and refined after each stage. The paper concludes by proposing the refined, ethically grounded digital forensics investigation framework. The treatise is primarily UK based, but the concepts presented here have international relevance and applicability.
Design/methodology/approach: In this paper, the lens of justice theory is used to explore the tension between the needs of digital forensic investigations into cybercrimes on the one hand and, on the other, individuals' rights to privacy and organizations' rights to control intellectual capital disclosure.
Findings: The investigation revealed a potential inequality between the practices of digital forensics investigators and the rights of other stakeholders. That being so, the need for a more ethically informed approach to digital forensics investigations, as a remedy, is highlighted, and a framework is proposed to provide this.
Research limitations/implications: The proposed ethically informed framework for guiding digital forensics investigations suggests a way of re-establishing the equality of the stakeholders in this arena and ensuring that the potential for a sense of injustice is reduced.
Originality/value: Justice theory is used to highlight the difficulties in squaring the circle between the rights and expectations of all stakeholders in the digital forensics arena. The outcome is the forensics investigation guideline, PRECEpt (Privacy-Respecting EthiCal framEwork), which provides the basis for re-aligning the balance between the requirements and expectations of digital forensic investigators on the one hand and individual and organizational expectations and rights on the other.


Data ◽  
2021 ◽  
Vol 6 (8) ◽  
pp. 87
Author(s):  
Sara Ferreira ◽  
Mário Antunes ◽  
Manuel E. Correia

Deepfake and manipulated digital photos and videos are increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, with tampered multimedia content as the primary disseminating vehicle. Digital forensic analysis tools are widely used in criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need to employ efficient digital forensics techniques grounded on state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods to improve the automatic detection of manipulated multimedia content. However, such methods have not yet been massively incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of the datasets are crucial to benchmark ML models and to evaluate their appropriateness for real-world digital forensics applications. An example is the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos, which are part of state-of-the-art existing datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository, and the total number of photos and video frames is 40,588 and 12,400, respectively.
The dataset was validated and benchmarked with deep learning Convolutional Neural Network (CNN) and Support Vector Machine (SVM) methods; however, a plethora of other existing methods can be applied. Generally, the results show a better F1-score for CNN than for SVM, for both photo and video processing. CNN achieved an F1-score of 0.9968 and 0.8415 for photos and videos, respectively. For SVM, the results obtained with 5-fold cross-validation are 0.9953 and 0.7955, respectively, for photos and videos. A set of methods written in Python is available to researchers, namely to preprocess and extract the features from the original photo and video files and to build the training and testing sets. Additional methods are also available to convert the original PKL files into CSV and TXT, which gives ML researchers more flexibility to use the dataset with existing ML frameworks and tools.
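One plausible reading of "features extracted through a DFT" is a radially averaged log-magnitude spectrum, a feature commonly used in manipulated-image detection; the paper's exact feature definition may differ, and the function below is only a sketch:

```python
import numpy as np

def dft_features(gray: np.ndarray, n_features: int = 50) -> np.ndarray:
    """Radially averaged log-magnitude spectrum of a grayscale image.
    Illustrative sketch: not necessarily the paper's feature definition."""
    f = np.fft.fftshift(np.fft.fft2(gray))
    mag = np.log1p(np.abs(f))
    h, w = mag.shape
    yy, xx = np.indices(mag.shape)
    # Integer distance of every pixel from the spectrum's centre.
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    counts = np.bincount(r.ravel())
    # Average magnitude over rings of equal radius, then resample
    # to a fixed-length vector.
    radial = np.bincount(r.ravel(), weights=mag.ravel()) / np.maximum(counts, 1)
    idx = np.linspace(0, len(radial) - 1, n_features).astype(int)
    return radial[idx]
```

Each image thus collapses to a fixed-length numeric vector, which matches the kind of entry (label plus feature vector) the dataset stores.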


2019 ◽  
Vol 11 (7) ◽  
pp. 162 ◽  
Author(s):  
Nikolaos Serketzis ◽  
Vasilios Katos ◽  
Christos Ilioudis ◽  
Dimitrios Baltatzis ◽  
Georgios Pangalos

The increasing complexity of information technology and the proliferation of heterogeneous security devices that produce ever greater volumes of data, coupled with an ever-changing threat landscape, have an adverse impact on the efficiency of information security controls, digital forensics, and incident response approaches. Cyber Threat Intelligence (CTI) and forensic preparedness are the two parts of the so-called managed security services that defenders can employ to repel, mitigate or investigate security incidents. Despite their success, no known effort has combined these two approaches to enhance Digital Forensic Readiness (DFR) and thus decrease the time and cost of incident response and investigation. This paper builds upon and extends a DFR model that utilises actionable CTI to improve the maturity levels of DFR. The effectiveness and applicability of this model are evaluated through a series of experiments that employ malware-related network data simulating real-world attack scenarios. To this end, the model manages to identify the root causes of information security incidents with high accuracy (90.73%), precision (96.17%) and recall (93.61%), while significantly decreasing the volume of data digital forensic investigators need to examine. The contribution of this paper is twofold. First, it indicates that CTI can be employed by digital forensics processes. Second, it demonstrates and evaluates an efficient mechanism that enhances operational DFR.
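The data-volume reduction such a model achieves can be pictured as an IOC-matching filter over captured network records. The record fields and IOC sets below are illustrative assumptions, not the paper's actual model:

```python
# Hypothetical IOC-matching step: field names and the IOC format are
# illustrative, not taken from the paper's DFR model.
def filter_by_cti(flows: list, ioc_ips: set, ioc_domains: set) -> list:
    """Keep only flows touching a known indicator of compromise, so the
    investigator examines a fraction of the captured traffic."""
    return [f for f in flows
            if f.get("dst_ip") in ioc_ips or f.get("domain") in ioc_domains]
```

Only flows touching a known indicator survive, so investigators start from a much smaller, intelligence-prioritised subset of the evidence.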

