Monday, November 9, 2009

Why Fuzzy Hashing is Really Cool

For years, computer forensic investigators have put a great deal of stock in the effectiveness of MD5 hashing. Now to quantify that statement, I mean specifically using MD5 hashes to identify known malicious files. The key word in that sentence is known, but let's take that one step further to add the word “unmodified” known files. One minor change to a file, and the MD5 hash is now completely different, rendering the investigators search totally ineffective. So, what's the answer? Easy, fuzzy hashing.

Hash comparisons are either a yes or a no – either the hash matches, or it doesn't. But, that does not mean that the files are not the same, it just means they are not exactly the same. I am going to use a simple example, that will illustrate exactly what I am talking about.

The photograph of Oklahoma State University wide receiver Dez Bryant below was taken from, “” on November 09, 2009.

Using MD5Deep, I took generated an MD5 hash for this picture:

b2cedc90072bacc43fdcc533ad4f24ad /home/cepogue/Pictures/DezBryant.jpg

Now, if you were an investigator, and you were going to search for that image of Dez based on the MD5 hash, you would only find it if the image were totally and completely identical to this original.

To show how easy it is to modify an image like this, I used Ghex to open the image and scrolled to the bottom of the content.

Note at offset 5879 (the last line), there are only four characters, which on the right translate to a blank space, a question mark, and two periods. Using Ghex, I am simply going to replace the blank space with a period.

Look at offset 5879 again in the figure above. I replaced the blank space with the period, changing that last line from "20 3F FF D9" to "2E 3F FF 2E". A very minor change. As you can see from the modified image of Dez below, there is no visible change to the image.

Again, using MD5deep, I calculated the MD5 hash of the image, and it is totally different from the first image.

Here is the unmodified image hash one more time:
b2cedc90072bacc43fdcc533ad4f24ad /home/cepogue/Pictures/DezBryant.jpg

Here is the modified image hash:
df3e3d942610781f9b9d0b41683c46db /home/cepogue/Pictures/DezBryant2.jpg

The hashes are not even close. So, if an investigator was performing a search for this image based on the MD5 hash, he would fail to find it.

So, if you are an investigator, you may be thinking, “Aw what?! So ALL of the hash comparisons I have been doing could have failed while the evidence was still present?”

The answer to question is, “Yes...if the images were modified in any way...yes they did.” But, there is hope, and that hope is called fuzzy hashing.

Since the one to one comparison of hash sets is obviously antiquated and inadequate, Jesse Kornblum of Mantech thought up a fantastic solution called fuzzy hashing. Using a tool called SSDEEP, you can generate hash values that can then be compared to other files producing a percentage in which the file matches other files!

Using SSDEEP, I generate an output file from the first image of Dez that looks like this:

ssdeep -b DezBryant.jpg

I simply redirected the output to a file named dez.hash.

Then, I use that file to compare to the second image of Dez:

root@Linux-Forensic1:/home/cepogue/Pictures# ssdeep -bm dez.hash DezBryant2.jpg
DezBryant2.jpg matches DezBryant.jpg (99)

As you can see from the output, these two images are 99% similar.

Using fuzzy hashing can efficiently and effectively help investigators to identify files that contain a high percentage of similarities. While the file may not be 100% exactly the same, as proven by my example, that does not necessarily mean that they are not the same image. This same theory can be used with really any type of file. An investigator can then take the files with the highest percentage of similarities and manually review those individual files.

SSDEEP is a free utility and can be downloaded from

Tuesday, November 3, 2009

Mount_EWF and Ubuntu 9.04

***Props to Steven Venter of Trustwave UK for putting this together. I used this today, with some minor modifications.***

So, I was faced with the need to mount a EWF image on my Ubuntu box so that I could use some of the TSK utilities on the image. Below, is how to get a tool called, "mount_ewf" working with Ubuntu 9.04.

So here's a quick update on getting EWF mounting capabilities installed on a new Ubuntu install [in this case the 32-bit version of Jaunty Jackalope Ubuntu 9.04]

The libewf software is now available at:

The files I downloaded were:
steve@jj:~/software/EWF$ ls -1
libewf-beta-20090506.tar.gz *** I changed this too...I did NOT grab this file***

== Install the required build dependencies
-- the
required Debian packages in Ubuntu are: zlib1g-dev libssl-dev uuid-dev
$ sudo apt-get install zlib1g-dev libssl-dev uuid-dev

== Create Debian (.deb) packages to install
Since the downloads are now standard source code format, I tried to create Debian (.deb) packages using the guidance here:

***This took me awhile to get working properly, as the "how to" is kind of vague.

First off, let's install the necessary tools:
# apt-get install autotools-dev fakeroot dh-make build-essential

Next, take the tarball you downloaded, in this case libewf-20080501.tar.gz
uncompress the tarball
tar -xzvf
cd into the newly created directory

Now, you are going to use the dh_make utility to make the debian control files
dh_make -f /path/to/tarball <-- this is important. You have got to tell tool the location of the original tarball...presumably, just down one directory. In my case, I dropped my tarball into /usr/local/bin (which is where I drop all of my install files).

Then select "S" for single binary.

Then run the following: (this has to be done as root)
# dpkg-buildpackage -rfakeroot

Step 1: Install required dependency packages:
$ sudo apt-get install autotools-dev fakeroot dh-make build-essential

Step 2: Copy the source code tarball to /tmp and extract the contents there steve@jj:~/software/EWF$ cp libewf-beta-20090506.tar.gz /tmp/
steve@jj:~/software/EWF$ cd /tmp/
steve@jj:/tmp$ tar -zxf libewf-beta-20090506.tar.gz
steve@jj:/tmp$ cd libewf-20090506/

Step 3a: No need to make the debian control files, since they are already there [in the debian/ sub-folder]

Step 3b: Build the debian package:
steve@jj:/tmp/libewf-20090506$ sudo dpkg-buildpackage -rfakeroot
** this ended with the following output:
signfile libewf_20090506-1.dsc
gpg: WARNING: unsafe ownership on configuration file `/home/steve/.gnupg/gpg.conf'
gpg: skipped "Joachim Metz ": secret key not available
gpg: [stdin]: clearsign failed: secret key not available

dpkg-genchanges >../libewf_20090506-1_amd64.changes
dpkg-genchanges: including full source code in upload
dpkg-buildpackage: full upload (original source is included)
dpkg-buildpackage: warning: Failed to sign .dsc and .changes file

Step 3c: List the newly created files:
steve@jj:/tmp/libewf-20090506$ cd ..
steve@jj:/tmp$ ls -ld libewf*
drwxr-xr-x 15 steve steve 4096 2009-05-08 18:41 libewf-20090506
-rw-r--r-- 1 root root 2262 2009-05-08 18:42 libewf_20090506-1_amd64.changes
-rw-r--r-- 1 root root 177340 2009-05-08 18:42 libewf_20090506-1_amd64.deb
-rw-r--r-- 1 root root 511 2009-05-08 18:40 libewf_20090506-1.diff.gz
-rw-r--r-- 1 root root 826 2009-05-08 18:40 libewf_20090506-1.dsc
-rw-r--r-- 1 root root 810174 2009-05-08 18:40 libewf_20090506.orig.tar.gz
-rw-r--r-- 1 steve steve 809523 2009-05-08 18:22 libewf-beta-20090506.tar.gz
-rw-r--r-- 1 root root 222562 2009-05-08 18:42 libewf-dev_20090506-1_amd64.deb
-rw-r--r-- 1 root root 195290 2009-05-08 18:42 libewf-tools_20090506-1_amd64.deb

== Install the newly created .deb packages:
steve@jj:/tmp$ sudo dpkg -i libewf*.deb
Selecting previously deselected package libewf.
(Reading database ... 109479 files and directories currently installed.)
Unpacking libewf (from libewf_20090506-1_amd64.deb) ...
Selecting previously deselected package libewf-dev.
Unpacking libewf-dev (from libewf-dev_20090506-1_amd64.deb) ...
Selecting previously deselected package libewf-tools.
Unpacking libewf-tools (from libewf-tools_20090506-1_amd64.deb) ...
Setting up libewf (20090506-1) ...

Setting up libewf-dev (20090506-1) ...
Setting up libewf-tools (20090506-1) ...
Processing triggers for man-db ...
Processing triggers for libc6 ...
ldconfig deferred processing now taking place

== To use the mount_ewf script, need to install python-fuse:
steve@jj:/tmp$ sudo apt-get install python-fuse

== Create a mount.ewf executable in the /sbin directory and grant it "execute" permissions:
steve@jj:/tmp$ cd
steve@jj:~$ cd software/EWF/
steve@jj:~/software/EWF$ cp /sbin/mount.ewf
cp: cannot create regular file `/sbin/mount.ewf': Permission denied
steve@jj:~/software/EWF$ sudo cp /sbin/mount.ewf
steve@jj:~/software/EWF$ sudo chmod +x /sbin/mount.ewf

== And that's it - ready to go:
steve@jj:~/software/EWF$ mount.ewf
Using libewf-20090506. Tested with libewf-20080501.
mount.ewf [options]

Note: This utility allows EWF files to be mounted as a filesystem containing a flat disk image. can be any segment of the EWF file. To be identified, all files need to be in the same directory, have the same root file name, and have the same first character of file extension. Alternatively, multiple filenames can be specified in different locations in the order to be reassembled.

ewf segment filename(s) required.

Once you get the tool installed, you can mount EWF images like this:

create a mount point...mkdir /mnt/suspect
mount.ewf -o ro badguyimage.E* /mnt/suspect

The raw image will now be mounted on /mnt/suspect, and you run your TSK tools against it. Nice! The kewl thing is that you can mount your external drive as RW, then mount the image as RO. This comes in handy if you are dumping unalloc with blkls, and you only have a 80 GB HDD in your Linux box (like me)...I use external drives for everything!