- 4. STRING MATCHING ALGORITHMS
Therefore, employing a fast string matching algorithm is indispensable for minimizing the overhead of file carving applications.The objective of string matching algorithms is to find one or more occurrences of pattern in a text through the sliding window mechanism.
4.1 The Brute Force Algorithm
The brute force algorithm checks for the pattern by shifting the window by exactly one position with the time complexity O(m×n).
4.2 The Boyer-Moore Algorithm
The Boyer-Moore algorithm (Boyer & Moore, 1977) and the Knuth-Morris-Pratt algorithm (Knuth et al., 1977) are among the most widely
used single pattern matching algorithms, in which each pattern is searched within a given text separately. The Boyer-Moore algorithm is
considered as the most efficient string searching algorithm in both theory and practice,and it has become the standard for practical string searching.
4.3 The Knuth-Morris-Pratt Algorithm
Knuth-Morris-Pratt algorithm with the time complexity proportional to the sum of the lengths of pattern and text, O(m+n), which is independent of the alphabet size. The algorithm performs the string matching from left to right, and it needs a preprocessing phase to construct a partial-match table with the time complexity O(m).
4.4 The Karp-Rabin Algorithm
Since hashing is able to provide a simple method to avoid a quadratic number of character comparisons, Karp and Rabin (Karp & Rabin, 1987) propose an efficient randomized pattern matching algorithm that only checks if the window of text similar to the pattern through the hashing function.
4.5 The Horspool Algorithm
The Horspool algorithm (Horspool, 1980) is a simplified version of the Boyer-Moore algorithm (Boyer & Moore, 1977), which only utilizes the precomputed “bad-character shift” function for shifts in the window.
4.6 The Quick Search Algorithm
Similar to the Horspool algorithm (Horspool, 1980), the Quick Search algorithm (Sunday, 1990) is also a simplified version of the Boyer-Moore algorithm (Boyer & Moore, 1977), which only utilizes the precomputed “bad-character shift” function for shifts in the window.
4.7 The Shift-Or Algorithm
The main idea of the Shift-Or algorithm (R. Baeza-Yates & Gonnet, 1992) is to represent the search state as a number, and each search attempt performs a small number of arithmetic and logical operations
4.8 The Smith Algorithm
It utilizes the precomputed “bad-character shift” function for shifts in the window from the Horspool algorithm (Horspool, 1980) and the Quick Search algorithm (Sunday, 1990).
4.9 The Raita Algorithm
Raita (Raita, 1992) proposes a new implementation that makes use of the dependencies between successive symbols.
4.10 The Berry-Ravindran Algorithm
The Berry-Ravindran algorithm is a composite of the Quick Search algorithm (Sunday, 1990) and another variant of the Boyer-Moore algorithm (Boyer & Moore, 197
7), the Zhu-Takaoka algorithms
To provide comparisons between multiple string matching algorithms described from the perspective of forensic analysis
experimental testbed implemented with VMware Workstation and Ubuntu 12.04.3 based on the AMD64 architecture.
The virtual machine utilizes a single CPU core with 1GB of memory
two test images for Scalpel 2.0 to extract various file formats.
The first image a raw partition image of a 65 MB FAT32 file system,
the second image a raw partition image of a 129.4 MB EXT2 file system.
the file formats within the two images include doc, gif, jpg, mov, pdf, wav, and wmv
12 known header and footer patterns in the configuration file ”scalpel.conf"
http://commons.erau.edu/cgi/viewcontent.cgi?article=1318&context=adfsl
- B.Multiple Pattern String Matching Algorithms
one or more keywords (the so called “patterns”) from a finite set of keywords occur. Multi-pattern string matching arises in a number of applications including network intrusion detection, digital forensics, business analytics, and natural language processing
1) Aho-Corasick String Matching Algorithm: Aho-Corasick algorithm is one of the earliest multi-pattern exact matching algorithms.
2) Commentz – Walter String Matching Algorithm: Commentz-Walter algorithm combines the Boyer-Moore technique with the Aho-Corasick algorithm.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.227.7016&rep=rep1&type=pdf
- Multi-pattern string matching is widely used in applications such as network
We focus on Scalpel, a popular open source file recovery tool which performs file
carving using the Boyer-Moore string search algorithm to locate headers and footers in a
we propose fast in-place carving techniques of a popular digital forensics tool Scalpel as well as our
GPU adaptations of the Aho-Corasick and multipattern Boyer-Moore string matching algorithms for the two cases GPU-to-GPU and host-to-host.
FAST IN-PLACE FILE CARVING FOR DIGITAL FORENSICS
We show that the time required for file carving
may be reduced significantly by employing multi-pattern search algorithms such as the
multipattern Boyer-Moore and Aho-Corasick algorithms as well as asynchronous disk
reads and multithreading as typically supported on multicore commodity PC
3. We are exploring multi-pattern string matching algorithms on Graphics Processing
Units (GPUs) that comprise a large number of processors. The target applications are
digital forensics and intrusion detection. We have analyzed the performance of the
popular file-carving software Scalpel 1.6 and determined that this software spend
almost all of its time reading from disk and searching for headers and footers. The
time spent on the latter activity may be drastically reduced (by a factor of 17 when we
have 48 rules) by replacing Scalpel’s current search algorithm (Boyer Moore) by the
Aho-Corasick algorithm
http://etd.fcla.edu/UF/UFE0041979/zha_x.pdf
- File carving is a forensics technique that recovers files based merely on file structure and content and without any matching file sys-
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4806206&tag=1
- Block-Based Carving Any carving method (algorithm) that analyzes the input on block-by-block basis to determine if a block is part of a possible output file. This method assumes that each block can only be part of a single file (or embedded file).
Statistical Carving Any carving method (algorithm) that analyzes the input on characteristic or statistic for example, entropy) to determine if the input is part of a possible output file.
Header/Footer Carving A method for carving files out of raw data using a distinct header (start of file marker) and footer (end of file marker).
Header/Maximum (file) size Carving A method for carving files out of raw data using a distinct header (start of file marker) and a maximum (file) size. This approach works because many file formats (e.g. JPEG, MP3) do not care if additional junk is appended to the end of a valid file.
Header/Embedded Length Carving A method for carving files out of raw data using a distinct header and a file length (size) which is embedded in the file format File structure based Carving A method for carving files out of raw data using a certain level of knowledge of the internal structure of file types. Garfinkel called this approach "Semantic Carving" in his DFRWS2006 carving challenge submission, while Metz and Mora called the approach "Deep Carving." Semantic Carving A method for carving files based on a linguistic analysis of the file's content. For example, a semantic carver might conclude that six blocks of french in the middle of a long HTML file written in English is a fragment left from a previous allocated file, and not from the English-language HTML file.
Carving with Validation A method for carving files out of raw data where the carved files are validated using a file type specific validator.
Fragment Recovery Carving A carving method in which two or more fragments are reassembled to form the original file or object. Garfinkel previously called this approach "Split Carving."
Repackaging Carving A carving method that modifies the extracted data by adding new headers, footers, or other information so that it can be viewed with standard utilities. For example, Garfinkel's ZIP Carver looks for individual components of a ZIP file and repackages them with a new Central Directory so that they can be opened with a standard unzip utility.
http://www.cse.scu.edu/~tschwarz/COEN252_13/Labs/lab2.html
- Scalpel, a popular open source file recovery tool,performs file carving using the Boyer-Moore string search algorithm to locate headers and footers in a disk image.
well as asynchronous disk reads and multithreading as typically supported on multicore commodity PCs.
- BoyerMoore.java
- File Carving, or sometimes simply Carving, is the practice of searching an input for files or other kinds of objects based on content, rather than on metadata
- Tools:Data Recovery
- File carving is a well known computer forensics term used to describe the identification and extraction of file types from unallocated clusters using file signatures.
The object of carving is to identify and extract (carve) the file based on this signature information alone.
http://www.forensicexplorer.com/data-carve.php
- Data carving, or file carving is a process of reading files without reference to an file system.
It is based on the fact that most files start with a recognisable data signature
https://www.cnwrecovery.com/html/data_carving.html
No comments:
Post a Comment