Tuesday, October 24, 2017

file carving algorithms

  • 4. STRING MATCHING ALGORITHMS
Since   there   is   no   guarantee   that   file-system metadata exists to provide the location of each file within a file system, searching every byte of a raw disk    image    is    unavoidable   for    file    carving applications  to  identify  the  locations  of  structured files.
Therefore,  employing  a  fast  string  matching algorithm   is   indispensable   for   minimizing   the overhead    of    file    carving    applications.The  objective of  string  matching  algorithms  is  to  find one   or   more   occurrences   of   pattern   in   a   text through  the  sliding  window  mechanism.

4.1 The Brute Force Algorithm
The brute force algorithm checks for the pattern by shifting  the  window  by  exactly  one  position  with the  time  complexity O(m×n).

4.2 The Boyer-Moore Algorithm
The  Boyer-Moore  algorithm  (Boyer  &  Moore, 1977)    and    the    Knuth-Morris-Pratt    algorithm (Knuth  et  al.,  1977)  are  among  the  most  widely
used single  pattern matching algorithms,  in which each   pattern   is   searched   within a   given   text separately.     The     Boyer-Moore     algorithm     is
considered  as  the  most  efficient  string  searching algorithm  in  both  theory  and  practice,and  it  has become the standard for practical string searching.

4.3 The Knuth-Morris-Pratt Algorithm
Knuth-Morris-Pratt    algorithm    with    the    time complexity  proportional  to  the  sum  of  the  lengths of pattern and text, O(m+n), which is independent of  the  alphabet  size.  The  algorithm  performs  the string  matching  from  left  to  right,  and it needs a preprocessing  phase  to  construct  a  partial-match table  with  the  time  complexity  O(m).

4.4 The Karp-Rabin Algorithm
Since  hashing  is  able  to  provide  a  simple  method to    avoid a    quadratic    number    of    character comparisons,  Karp  and  Rabin  (Karp  &  Rabin, 1987)   propose   an   efficient   randomized   pattern matching algorithm that only checks if the window of  text  similar  to  the  pattern  through  the  hashing  function.

4.5 The Horspool Algorithm
The  Horspool  algorithm (Horspool,  1980)  is  a simplified  version  of  the  Boyer-Moore  algorithm (Boyer  &  Moore,  1977),  which only  utilizes  the precomputed  “bad-character  shift”  function  for shifts  in  the  window.

4.6 The Quick Search Algorithm
Similar   to   the   Horspool   algorithm   (Horspool, 1980), the Quick Search algorithm (Sunday, 1990) is  also  a  simplified  version  of  the  Boyer-Moore algorithm  (Boyer  &  Moore,  1977),  which only utilizes  the  precomputed  “bad-character  shift” function  for  shifts  in  the  window.

4.7 The Shift-Or Algorithm
The   main   idea   of   the   Shift-Or   algorithm   (R. Baeza-Yates  &  Gonnet,  1992)  is  to  represent  the search  state  as  a  number, and  each  search  attempt performs a small number of arithmetic and logical operations

4.8 The Smith Algorithm
It  utilizes the  precomputed  “bad-character shift” function  for  shifts  in  the  window from the Horspool algorithm (Horspool, 1980) and the  Quick  Search  algorithm  (Sunday,  1990).

4.9 The Raita Algorithm
Raita   (Raita,   1992)   proposes   a   new implementation that makes use of the dependencies  between  successive  symbols.

4.10 The Berry-Ravindran Algorithm
The  Berry-Ravindran  algorithm  is  a  composite  of  the  Quick Search   algorithm   (Sunday,   1990)   and   another variant  of  the  Boyer-Moore  algorithm  (Boyer  & Moore,   197
7),   the   Zhu-Takaoka   algorithms


To  provide comparisons between  multiple  string matching  algorithms  described from the perspective of forensic analysis

experimental  testbed  implemented with VMware Workstation  and  Ubuntu  12.04.3  based  on  the AMD64  architecture.
The  virtual  machine  utilizes a  single  CPU  core  with  1GB  of  memory
two  test  images  for  Scalpel 2.0 to extract various file formats.
The first image a  raw partition  image  of  a  65  MB  FAT32  file  system,
the second image  a raw partition image of a 129.4 MB  EXT2  file  system.
the  file  formats within  the  two  images  include  doc,  gif,  jpg,  mov, pdf,  wav,  and  wmv
12  known  header  and  footer patterns  in  the  configuration  file  ”scalpel.conf"


http://commons.erau.edu/cgi/viewcontent.cgi?article=1318&context=adfsl




  • B.Multiple Pattern String Matching Algorithms
Multiple  pattern  matching  is  an  important  problem  in  text  processing  and  is  commonly  used  to  locate  all  the  positions  of  an  input  string  (the  so  called  “text”)  where 
one or more keywords (the so called “patterns”) from a finite   set   of   keywords   occur.   Multi-pattern   string   matching  arises  in  a  number  of  applications  including  network  intrusion detection,  digital  forensics,  business  analytics, and natural language processing

1) Aho-Corasick   String   Matching   Algorithm:   Aho-Corasick  algorithm  is  one  of  the  earliest  multi-pattern  exact matching algorithms.
2) Commentz  –  Walter  String  Matching  Algorithm:    Commentz-Walter    algorithm    combines    the    Boyer-Moore  technique  with  the  Aho-Corasick  algorithm.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.227.7016&rep=rep1&type=pdf


  • Multi-pattern string matching is widely used in applications such as network
intrusion detection, digital forensics and full text search.

We focus on Scalpel, a popular open source file recovery tool which performs file
carving using the Boyer-Moore string search algorithm to locate headers and footers in a

we propose fast in-place carving techniques of a popular digital forensics tool Scalpel as well as our
GPU adaptations of the Aho-Corasick and multipattern Boyer-Moore string matching algorithms for the two cases GPU-to-GPU and host-to-host.

FAST IN-PLACE FILE CARVING FOR DIGITAL FORENSICS
We show that the time required for file carving
may be reduced significantly by employing multi-pattern search algorithms such as the
multipattern Boyer-Moore and Aho-Corasick algorithms as well as asynchronous disk
reads and multithreading as typically supported on multicore commodity PC


3. We are exploring multi-pattern string matching algorithms on Graphics Processing
Units (GPUs) that comprise a large number of processors. The target applications are
digital forensics and intrusion detection. We have analyzed the performance of the
popular file-carving software Scalpel 1.6 and determined that this software spend
almost all of its time reading from disk and searching for headers and footers. The
time spent on the latter activity may be drastically reduced (by a factor of 17 when we
have 48 rules) by replacing Scalpel’s current search algorithm (Boyer Moore) by the
Aho-Corasick algorithm
http://etd.fcla.edu/UF/UFE0041979/zha_x.pdf

  • File carving is a forensics technique that recovers files based merely on file structure and content and without any matching file sys-
tem  meta-data.
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4806206&tag=1


  • Block-Based Carving Any carving method (algorithm) that analyzes the input on block-by-block basis to determine if a block is part of a possible output file. This method assumes that each block can only be part of a single file (or embedded file).

Statistical Carving Any carving method (algorithm) that analyzes the input on characteristic or statistic for example, entropy) to determine if the input is part of a possible output file.

Header/Footer Carving A method for carving files out of raw data using a distinct header (start of file marker) and footer (end of file marker).

Header/Maximum (file) size Carving A method for carving files out of raw data using a distinct header (start of file marker) and a maximum (file) size. This approach works because many file formats (e.g. JPEG, MP3) do not care if additional junk is appended to the end of a valid file.

Header/Embedded Length Carving A method for carving files out of raw data using a distinct header and a file length (size) which is embedded in the file format File structure based Carving A method for carving files out of raw data using a certain level of knowledge of the internal structure of file types. Garfinkel called this approach "Semantic Carving" in his DFRWS2006 carving challenge submission, while Metz and Mora called the approach "Deep Carving." Semantic Carving A method for carving files based on a linguistic analysis of the file's content. For example, a semantic carver might conclude that six blocks of french in the middle of a long HTML file written in English is a fragment left from a previous allocated file, and not from the English-language HTML file.

Carving with Validation A method for carving files out of raw data where the carved files are validated using a file type specific validator.

Fragment Recovery Carving A carving method in which two or more fragments are reassembled to form the original file or object. Garfinkel previously called this approach "Split Carving."

Repackaging Carving A carving method that modifies the extracted data by adding new headers, footers, or other information so that it can be viewed with standard utilities. For example, Garfinkel's ZIP Carver looks for individual components of a ZIP file and repackages them with a new Central Directory so that they can be opened with a standard unzip utility.

http://www.cse.scu.edu/~tschwarz/COEN252_13/Labs/lab2.html

  • Scalpel,  a  popular  open  source  file  recovery  tool,performs  file  carving  using  the  Boyer-Moore  string  search  algorithm  to  locate  headers  and  footers  in  a  disk  image.
We show  that  the  time  required  for  file  carving  may  be  reduced significantly by employing multi-pattern search algorithms such as the multipattern Boyer-Moore and Aho-Corasick algorithms as
well as asynchronous disk reads and multithreading as typically supported  on  multicore  commodity  PCs.


  • BoyerMoore.java
https://algs4.cs.princeton.edu/53substring/BoyerMoore.java.html


  • File Carving, or sometimes simply Carving, is the practice of searching an input for files or other kinds of objects based on content, rather than on metadata
http://www.forensicswiki.org/wiki/File_Carving#File_Carving_challenges_and_test_images

  • Tools:Data Recovery
http://www.forensicswiki.org/wiki/Tools:Data_Recovery


  • File carving is a well known computer forensics term used to describe the identification and extraction of file types from unallocated clusters using file signatures.
A file signature, also commonly referred to as a magic number, is a constant numerical or text value used to identify a file format.
The object of carving is to identify and extract (carve) the file based on this signature information alone.

http://www.forensicexplorer.com/data-carve.php

  • Data carving, or file carving is a process of reading files without reference to an file system.
The technique can be applied to any type if disk that stores data on sector boundaries
It is based on the fact that most files start with a recognisable data signature

https://www.cnwrecovery.com/html/data_carving.html

No comments:

Post a Comment