
Design of a Digital Forensics Evidence Reconstruction System for Complex and Obscure Fragmented File Carving

Vrizlynn L. L. Thing, Tong-Wei Chua, and Ming-Lee Cheong
Institute for Infocomm Research, Singapore
[email protected]

Abstract—Fragmented file carving is an important technique in Digital Forensics to recover files from their fragments in the absence of the file system allocation information. In this paper, we propose a system design for solving the fragmented file carving problem, taking into consideration the conditions of real-life fragmentation scenarios. We developed our evidence reconstruction and recovery system and carried out experiments to evaluate its capability in detecting and recovering obscured evidence. The results show that our system achieves higher efficiency and accuracy (e.g. 1.5 minutes for the reconstruction of each highly fragmented and deleted (obscured) image in its entirety, i.e. 100% recovery) than the commercial recovery system Adroit Photo Forensics (e.g. 2.8 minutes for a partial image recovery and 6.3 minutes ending in a failed recovery, respectively).

Keywords: Digital forensics, deleted evidence, fragmentation, file carving.

I. INTRODUCTION

THE increasing reliance on digital storage devices such as hard disks and solid state disks for storing important private data and highly confidential information has resulted in a greater need for efficient and accurate recovery of deleted files during digital forensic investigation. File carving is the technique used to recover such deleted files in the absence of file system allocation information. However, files are often fragmented due to low disk space, file deletion and modification. In a recent study [1], FAT was found to be the most popular file system, representing 79.6% of the file systems analyzed. Of the files tested on the FAT disks, 96.5% had between 2 and 20 fragments. This scenario of fragmented and subsequently deleted files presents a further challenge, requiring more advanced file carving techniques to reconstruct the files from the extracted data fragments. The reconstruction of objects from a collection of randomly mixed fragments is a common problem that arises in several areas, such as archaeology [2], [3], biology [4] and art restoration [5], [6]. In the area of fragmented file carving, research efforts are currently ongoing. One proposed approach is Bifragment Gap Carving (BGC) [7]. This technique searches for and recovers files fragmented into two fragments that contain identifiable headers and footers. A carving technique introducing mapping functions and discriminators was proposed

in [8], [9]. The mapping functions represent the various ways a file can be reconstructed, and the discriminators check the validity of the reconstructed file until the best one is obtained. The idea of using a graph theoretic approach to perform file carving was studied in [10]–[14]. In graph theoretic carving, the fragments are represented by the vertices of a graph, and the edges are assigned weights, which are values that indicate the likelihood that two fragments are adjacent in the original file. For example, in image files, a possible technique to evaluate the candidate weight between any two fragments is pixel matching, whereby each pixel value along the adjoining edge of one fragment is compared with the corresponding pixel value in the other fragment, and the matching scores along the edge are summed; the closer the values, the better the match [10]. Another technique is median edge detection, in which each pixel is predicted from the values of the pixels above, to the left, and diagonally up-left of it [15]. Using median edge detection, it is possible to compute the sum of the absolute differences between the predicted values in the adjoining fragment and the actual values. The carving is then based on obtaining the path of the graph with the best set of weights. We discuss the carving methods further in Section II on related work. In this paper, we propose a system design that takes into consideration realistic and complex fragmentation scenarios to perform evidence reconstruction and recovery. We then present the evaluation and experimental results, demonstrating the efficiency and accuracy of our developed system. Our comparison with the existing commercial system, Adroit Photo Forensics [16], shows that considering wider and more complex fragmentation scenarios enables the reconstruction and recovery of even obscure evidence.
This is achievable without compromising efficiency and accuracy, which is extremely beneficial to the digital forensic investigation process. The rest of the paper is organised as follows. In Section II, we describe and discuss the related existing work in fragmented file carving. In Section III, we propose and describe in detail the design of our carving system and its underlying algorithm. We present the experimental evaluations and analysis results of our system, and compare it with the existing commercial system [16], in Section IV. Future work is discussed in Section V. We conclude the paper in Section VI.
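The median-edge-detection weighting described above can be sketched as follows. This is an illustrative sketch, not the authors' or the cited paper's implementation: the grayscale row layout and the helper names (`med_predict`, `joint_weight`) are our assumptions.

```python
# Sketch of a median-edge-detection (MED) joint weight: each pixel is
# predicted from the pixels above, to the left, and diagonally up-left,
# and the joint weight is the negated sum of absolute prediction errors
# across the boundary row between two candidate fragments.

def med_predict(left, above, diag):
    """MED predictor, as used in lossless image coding (e.g. JPEG-LS)."""
    if diag >= max(left, above):
        return min(left, above)
    if diag <= min(left, above):
        return max(left, above)
    return left + above - diag

def joint_weight(frag_a, frag_b):
    """Score joining frag_b below frag_a (fragments as rows of grayscale
    pixels): lower prediction error along the boundary = higher weight."""
    last_row = frag_a[-1]     # bottom row of the upper fragment
    first_row = frag_b[0]     # top row of the candidate fragment
    error = 0
    for x in range(1, len(first_row)):
        pred = med_predict(first_row[x - 1], last_row[x], last_row[x - 1])
        error += abs(first_row[x] - pred)
    return -error             # higher weight = better match

# A smooth continuation scores better than a random one:
a = [[10, 12, 14, 16]]
good = [[11, 13, 15, 17]]
bad = [[200, 3, 90, 40]]
assert joint_weight(a, good) > joint_weight(a, bad)
```

In practice the comparison would run over full sector-sized pixel rows decoded from the JPEG data, but the scoring logic is the same.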


II. RELATED WORK

In fragmented file carving, the objective is to arrange a file back into its original structure and recover all the files from the raw data. The technique should not rely on file system information, which may not exist (e.g. deleted fragmented files, corrupted file systems). In this section, we discuss several existing techniques capable of performing fragmented file carving. Bifragment gap carving [7] was introduced as a fragmented file carving technique that assumed most fragmented files comprise only a header fragment and a footer fragment. It exhaustively searched all the combinations of blocks between an identified header and footer, while incrementally excluding blocks that result in unsuccessful decoding/validation of the file. A limitation of this method is that it can only support carving for files with two fragments. For files with more than two fragments, the complexity grows extremely large. In [8], the file fragments were “mapped” into a file by utilizing different mapping functions. A mapping function generator generated new mapping functions, which were tested by a discriminator. The goal was to derive a mapping function that minimizes the error rate in the discriminator. It is therefore important to construct a good discriminator to localize errors within the file, so that discontinuities can be determined accurately. If the discriminator fails to indicate the precise locations of the errors, all the permutations need to be generated, which can become intractable. In carving, the simplest (but tedious) approach would be to test each fragment against every other fragment to check how likely any two fragments are to form a valid joint match. Joints are then assigned weights, and these weights represent the likelihood that the two fragments are a correct match. Since the header can be easily identified, any edge joining the header is considered a single directional edge, while all other edges are bi-directional.
Therefore, if there are n fragments (excluding h headers), there will be a total of n(n-1+h) weights. The problem can thus be converted into a graph theoretic problem where the fragments are represented by the vertices and the weights are assigned to the edges. The goal is to find a file construction path which passes through each vertex exactly once and has the maximum sum of edge weights, given the starting vertex. In this case, the starting vertices correspond to the headers. Graph theoretic carving was implemented as a technique to reassemble fragmented files by constructing a k-vertex disjoint graph. Utilizing a matching metric, the reassembly was performed by finding an optimal ordering of the file blocks/sectors. Various greedy heuristic based file carving methods were described in [10]. The main drawback of these methods was that they performed a pre-computation of all the weights between every pair of fragments, which is computationally expensive. They also failed to take into consideration realistic fragmentation scenarios so as to obtain the optimal path efficiently. The authors' implementation of the greedy heuristic algorithms, which was developed into a commercial recovery system [16], addressed the above-mentioned problem by setting constraints on the fragment locations and sequence. However, the difficulty of detecting obscure evidence then arises,

as file systems do not set such constraints when fragmenting files. In [14], the authors assumed that all the fragments belonging to a file were known. This was achieved through the identification of the fragments of a file, based on the groups of fragments belonging to an image of the same scenery (i.e. edge pixel difference detection) or on context based modelling for document fragments [12]. The problem was also modelled in a graph theoretic form. A file construction path was therefore defined as one passing through all the vertices in the graph. The authors proposed an inequality-based method to extract the inequality portions of the equations for computation. It was suitable for relatively small numbers of fragments, n, and had a 100% success rate of constructing the files correctly. A tradeoff algorithm was also proposed to allow flexible control over the algorithm complexity while, at the same time, obtaining sufficiently good results for fragmented file carving. This algorithm achieved an 85% success rate of constructing the files correctly. However, these methods relied on first knowing and extracting the fragments belonging to a file, which may not be achievable in the case of encoded data, such as JPEG files.

III. PROGRESSIVE JOINT CARVER

In this section, we describe our proposed carving system, the “Progressive Joint Carver” (Pro-joint carver), for carrying out the fragmented file carving process. We assume we are presented with storage devices containing raw fragments of both deleted and active files that may or may not be arranged in their proper original sequence. The goal of our work is to arrange them back into their original state in as short a time as possible through an automated system and process. The Pro-joint carver takes into consideration the realistic characteristics of the fragmentation of files during storage (i.e.
based on the rationale that fragmentation causes overhead and is avoided by file systems unless necessary) and performs progressive steps of joint weight computation, comparison and fragment joining. We first describe the Pro-joint carver's process and the underlying carving algorithm, and then analyse the carver based on real data to demonstrate the correctness of the carving technique. The Pro-joint carver first extracts the header fragments from the raw data acquired from the storage device. The remaining fragments are then processed for progressive multilevel weight computation, path-based candidate sorting and reassembly as follows.

A. Completed Path Elimination

The first-level (n-h) joint weights are computed, sorted and stored (n: total fragments excluding header fragments; h: total header fragments). A comparison among the best-weight candidate fragments (in each path) is made to choose and confirm the joint in the path with the best weight. The sorted joint list for the previous fragment is then removed to conserve process memory. The confirmed fragment is also removed from the data set and from all the sorted lists. If the confirmed joint fragment is a footer fragment, the entire path is removed from the data set and the processing graph.
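The first-level computation and best-weight confirmation described above can be sketched as follows. This is an illustrative sketch of one confirmation round, not the authors' implementation; the data structures and the names `carve_step`, `paths` and `fragments` are our assumptions, and `weight` stands for any joint-weight function such as the MED score.

```python
# Sketch of a progressive confirmation round: rank the unconfirmed fragments
# against each path's current tail, confirm the single best joint across all
# paths, then drop the confirmed fragment from the candidate pool so no other
# path can reuse it.

def carve_step(paths, fragments, weight):
    """paths: {header_id: last confirmed fragment}; fragments: unconfirmed set.
    Returns the (header_id, fragment) joint confirmed in this round."""
    best = None
    for header, tail in paths.items():
        # Sorted candidate list for this path's current tail fragment.
        ranked = sorted(fragments, key=lambda f: weight(tail, f), reverse=True)
        if ranked:
            score = weight(tail, ranked[0])
            if best is None or score > best[2]:
                best = (header, ranked[0], score)
    if best is None:
        return None
    header, cand, _ = best
    fragments.discard(cand)   # remove from the data set and all sorted lists
    paths[header] = cand      # extend the path; repeat until a footer joins
    return (header, cand)

# Toy run with a numeric "weight" that favours consecutive fragment ids:
w = lambda a, b: -abs(b - a - 1)
paths = {0: 0}                # one header path, currently ending at fragment 0
frags = {3, 1, 2}
assert carve_step(paths, frags, w) == (0, 1)
assert carve_step(paths, frags, w) == (0, 2)
```

In the full scheme, a confirmed footer would additionally remove its entire path from the graph, and the weight computations for the next round would cover only the newly confirmed fragment.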


B. Used Fragments Elimination

The process is repeated for the non-completed paths. In the event that a confirmed best-weight joint fragment is not the footer fragment, the carver removes the sorted joint list for the previous fragment. The confirmed fragment is removed from the data set and all the sorted lists. The joint weights of the confirmed fragment to the remaining (n-h-f_c) fragments (f_c: confirmed fragments so far) are computed. The joint list for the confirmed fragment is sorted, and the carving process is repeated.

C. Duplicate Best-weight Candidate in Different Paths

Here, we consider the scenario where a candidate fragment appears as the best-weight candidate for more than one path. The joint with the best weight is used to select its candidate fragment as the confirmed fragment, while the other “current level” fragments remove and replace their best-weight candidate fragment with the next best match.

D. Similar Best-weight Candidates in a Single Path

In the event that more than one best-weight candidate appears in a single path, the carver forms split paths from the current fragment. Erroneous paths therefore occur, since there can only be one valid path. We perform automated error detection through a comparison of the final paths' weights and a file size check (i.e. prematurely exceeding the size limit). The incorrect path(s) are subsequently eliminated from the graph, and the confirmed fragments in the completed correct path are removed from the data set. The completed fragments are also removed from all the sorted joint lists.

E. Similar Best-weight Candidates Within a Single Path and Across Multiple Paths

In this section, we consider the case where more than one best-weight candidate appears in a single path, as well as across multiple paths. We assign two modifiers (α and β) to compute the final joint weight between two fragments. The α modifier (where α < 1, adjusted based on the occurrences of single-path best-weight candidates) considers the case where more than one best-weight candidate appears within the same path. The β modifier (where β < 1, adjusted based on the number of different paths in which a particular best-weight candidate appears) considers the case where a best-weight candidate happens to be the best match across multiple paths. The final weight of a joint is computed as (α * β * w_j), where w_j is the original joint weight.

F. Sub-Fragment Joint Verification Algorithm

In this section, we describe the underlying carving algorithm, which we designed to achieve fast and reliable detection of the fragment joints to support the reconstruction of evidentiary files from their fragments, by bypassing redundant decoding processes. Only the final sub-fragment (including residue bits) of each fragment is used in the joint verification and classification, resulting in the conservation of time and computational resources in the carving process. The verification algorithm captures the last fixed l components and the variable k bits of residue data in a fragment. The value l is used to adjust the joint weight computation accuracy. For example, in the application of N-gram models for sequence matching, a larger N corresponds to a larger l. The fragment is then classified based on its last sub-fragment. Fragments belonging to the same class need only be compared once to other candidate fragments, thereby eliminating redundant joint weight computations. For the joint weight computation, only the first l components of the candidate fragments are used. The best-weight candidate fragment is then selected, and the process is repeated for the other fragments.

IV. EXPERIMENTS AND EVALUATIONS


Fig. 1: Carving Progress of FAT, NTFS and UFS Files (percentage of remaining uncompleted files vs. average carving steps)

To evaluate the Pro-joint carver, we first looked into a study [1] conducted on over 300 hard drives acquired from the secondary market. The study indicated that most files contain a small number of fragments (e.g. 96.5% of the files in the most popular file system, FAT, had between 2 and 20 fragments). Based on the data presented in the study, we evaluated the performance of the Pro-joint carver and plotted the performance graph (Figure 1), showing the average percentage of uncompleted carved files as time progresses for different file systems. The time unit is the average number of carving steps. We observed that a high percentage of the files (i.e. ≈90%) are completely carved at the initial stages of the carving process, within 5, 22 and 3 carving steps for the FAT, NTFS and UFS file systems, respectively. As observed from the graph, the fragments are reassembled at a fast rate in the progressive manner of the Pro-joint carver, which takes into consideration realistic fragmentation scenarios. The completed fragments are then removed, resulting in significant savings in the storage and computational requirements, as well as in the incurred overhead and carving time. Compared to the existing graph theoretic carving methods, where all the weights are computed in the initial phase, resulting in a computational complexity of (n+h)(n+h-1)/2 and a sorting complexity of O(n² log n), the Pro-joint carver divides the computations into multiple levels, thereby systematically eliminating redundant computations and sorting. The


storage in the Pro-joint carver is also reduced, as only the sorted lists of the weights of the remaining joints to the last confirmed fragment in each construction path need to be stored. This amounts to significant memory conservation. The storage requirement also decreases dynamically as the paths are completed and as the fragments are confirmed. Weight computations of the other fragments to the header fragments are not required. The number of joint weight computations also decreases as the paths are completed and the fragments are confirmed. In addition, weight computations of the footer fragments to the other remaining fragments are not required, resulting in a further reduction in computational processing.

Fig. 4: Pro-joint Carver's Recovery of Birthday Party Image
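The storage reduction described above can be illustrated with a rough count. The full-graph figure follows the (n+h)(n+h-1)/2 expression given in the text; the progressive figure models the claim that only one sorted candidate list per active path is held at any time. This is our simplified accounting for illustration, not the authors' exact bookkeeping, and the function names are assumptions.

```python
# Rough illustration of peak storage: a full graph theoretic carver holds and
# sorts every pairwise joint weight up front, while the progressive carver
# keeps only one sorted candidate list per active path (at most n entries),
# and that footprint shrinks further as paths complete.

def full_graph_storage(n, h):
    """All pairwise weights among n fragments plus h headers."""
    total = n + h
    return total * (total - 1) // 2

def progressive_peak_storage(n, h):
    """One sorted list of at most n candidates per active path (upper bound)."""
    return h * n

# e.g. the 128-fragment image from the experiments, with 4 headers on disk:
print(full_graph_storage(128, 4))        # 8646 weights held and sorted up front
print(progressive_peak_storage(128, 4))  # at most 512, shrinking over time
```

Even this upper bound is an order of magnitude smaller, and the gap widens as confirmed fragments and completed paths drop out of the lists.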

V. FUTURE WORK

We plan to carry out more extensive experiments covering a wide range of fragment numbers, to investigate and enhance the performance of our carver. We also plan to enhance our carver by implementing additional features and support for other file formats.

Fig. 2: Adroit's Recovery of Crime Scene Image

VI. CONCLUSIONS

Fig. 3: Pro-joint Carver’s Recovery of Crime Scene Image

Fragmented file carving is very important in the area of Digital Forensics, allowing law enforcement investigators to acquire evidence even when it has been maliciously deleted by criminals. Advanced techniques to achieve efficient and accurate reconstruction and recovery of deleted evidence are necessary to prevent the loss of obscure evidence due to complex fragmentation scenarios (e.g. fragments spread across different locations on large storage devices, out-of-sequence fragments). In our work, we proposed and designed a carving system to speed up the carving process without compromising on the search regions and fragmentation scenarios. Based on the study of the fragmentation of data on existing hard drives, we evaluated the performance of our carver and showed that a very high percentage of files is completely carved within a few carving steps. We also implemented our carving system and compared our automated evidence reconstruction and recovery with that of the existing commercial system, Adroit Photo Forensics [16], which is capable of handling deleted and fragmented files. In the proof-of-concept experiments, we observed that the existing system took a longer time (2 mins 49 secs for the Crime Scene Image and 6 mins 23 secs for the Birthday Party Image) to process the carving and was not able to successfully recover the image files in the above-mentioned scenarios, while the Pro-joint carver successfully recovered the evidence accurately and consistently, within a shorter time of ≈1.5 mins for each file.

We developed the Pro-joint carver to conduct proof-of-concept experiments on reconstructing and recovering evidentiary JPEG image files from the acquired raw data, and compared the results with the commercial recovery system, Adroit Photo Forensics [16], which utilises the greedy heuristic algorithms. In the experiments, we aimed to reconstruct and recover two fragmented photos (i.e. one of a mocked-up crime scene and one of a birthday party celebration) which had been deleted. The crime scene image comprised 11 fragments, while the birthday party image comprised 128 fragments. Figures 2 to 4 show the results of the image evidence reconstruction and recovery by both systems. The time taken by the Pro-joint carver to recover each file was ≈1.5 mins. Adroit, on the other hand, only partially recovered the crime scene image (in 2 mins 49 secs), while no trace of the birthday party image could be found (after process completion taking 6 mins 23 secs).

REFERENCES

[1] Garfinkel S. "Carving contiguous and fragmented files with fast object validation". In Proceedings of the 2007 Digital Forensics Research Workshop, DFRWS, Pittsburgh, PA, August 2007.


[2] Sablatnig R. and Menard C. "On finding archaeological fragment assemblies using a bottom-up design". In Proc. of the 21st Workshop of the Austrian Association for Pattern Recognition, Hallstatt, Austria, Oldenburg, Wien, Muenchen, pages 203–207, 1997.
[3] Kampel M., Sablatnig R. and Costa E. "Classification of archaeological fragments using profile primitives". In Computer Vision, Computer Graphics and Photogrammetry - a Common Viewpoint, Proceedings of the 25th Workshop of the Austrian Association for Pattern Recognition (OAGM), pages 151–158, 2001.
[4] Stemmer W.P. "DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution". In Proc Natl Acad Sci U S A, October 25, 1994.
[5] Leitao H.C.G. and Stolfi J. "A multiscale method for the reassembly of two-dimensional fragmented objects". In IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, September 2002.
[6] Leitao H.C.G. and Stolfi J. "Automatic reassembly of irregular fragments". Univ. of Campinas, Tech. Rep. IC-98-06, 1998.
[7] Pal A., Sencar H.T. and Memon N. "Detecting file fragmentation point using sequential hypothesis testing". In Proceedings of the Eighth Annual DFRWS Conference, Digital Investigation, Volume 5, Supplement 1, pages S2–S13, September 2008.
[8] Cohen M.I. "Advanced JPEG carving". In Proceedings of the 1st International Conference on Forensic Applications and Techniques in Telecommunications, Information, and Multimedia and Workshop, Article No. 16, 2008.
[9] Cohen M.I. "Advanced carving techniques". In Digital Investigation, 4(Supplement 1):2–12, September 2007.
[10] Memon N. and Pal A. "Automated reassembly of file fragmented images using greedy algorithms". In IEEE Transactions on Image Processing, pages 385–393, February 2006.
[11] Pal A., Shanmugasundaram K. and Memon N. "Automated reassembly of fragmented images". Presented at ICASSP, 2003.
[12] Shanmugasundaram K. and Memon N. "Automatic reassembly of document fragments via context based statistical models". In Proceedings of the 19th Annual Computer Security Applications Conference, page 152, 2003.
[13] Shanmugasundaram K. and Memon N. "Automatic reassembly of document fragments via data compression". Presented at the 2nd Digital Forensics Research Workshop, Syracuse, July 2002.
[14] Ying H-M. and Thing V.L.L. "A novel inequality-based fragmented file carving technique". In International ICST Conference on Forensic Applications and Techniques in Telecommunications, Information and Multimedia, e-Forensics, November 2010.
[15] Martucci S.A. "Reversible compression of HDTV images using median adaptive prediction and arithmetic coding". In IEEE International Symposium on Circuits and Systems, pages 1310–1313, 1990.
[16] Adroit Photo Forensics. http://digital-assembly.com/products/adroit-photo-forensics/.
