Monday, November 9, 2009

Why Fuzzy Hashing is Really Cool

For years, computer forensic investigators have put a great deal of stock in the effectiveness of MD5 hashing. To qualify that statement, I mean specifically using MD5 hashes to identify known malicious files. The key word in that sentence is known, but let's take that one step further and add the word “unmodified”: unmodified known files. One minor change to a file, and the MD5 hash is completely different, rendering the investigator's search totally ineffective. So, what's the answer? Easy, fuzzy hashing.

Hash comparisons are either a yes or a no: either the hash matches, or it doesn't. But a mismatch does not mean that the files are not the same; it just means they are not exactly the same. I am going to use a simple example that will illustrate exactly what I am talking about.

The photograph of Oklahoma State University wide receiver Dez Bryant below was taken from “http://media.photobucket.com/image/dez%20bryant/imandyduckworth/DezBryant.jpg” on November 09, 2009.


Using MD5Deep, I generated an MD5 hash for this picture:

b2cedc90072bacc43fdcc533ad4f24ad /home/cepogue/Pictures/DezBryant.jpg

Now, if you were an investigator, and you were going to search for that image of Dez based on the MD5 hash, you would only find it if the image were totally and completely identical to this original.

To show how easy it is to modify an image like this, I used Ghex to open the image and scrolled to the bottom of the content.


Note at offset 5879 (the last line), there are only four characters, which on the right translate to a blank space, a question mark, and two periods. Using Ghex, I am simply going to replace the blank space with a period.



Look at offset 5879 again in the figure above. I replaced the blank space with the period, changing that last line from "20 3F FF D9" to "2E 3F FF 2E". A very minor change. As you can see from the modified image of Dez below, there is no visible change to the image.


Again, using MD5Deep, I calculated the MD5 hash of the image, and it is totally different from the hash of the first image.

Here is the unmodified image hash one more time:
b2cedc90072bacc43fdcc533ad4f24ad /home/cepogue/Pictures/DezBryant.jpg

Here is the modified image hash:
df3e3d942610781f9b9d0b41683c46db /home/cepogue/Pictures/DezBryant2.jpg

The hashes are not even close. So, if an investigator were performing a search for this image based on the MD5 hash, he would fail to find it.
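The same effect can be reproduced without the image or a hex editor. What follows is only a minimal sketch in Python, assuming made-up sample bytes (not the actual contents of DezBryant.jpg) that happen to end in the same four bytes described above; the variable names are hypothetical.

```python
import hashlib

# Made-up stand-in data; NOT the real DezBryant.jpg. It only ends in the
# same four bytes (20 3F FF D9) that the hex editor showed on the last line.
original = bytes(range(256)) * 4 + b"\x20\x3f\xff\xd9"

# Replace the single space byte (0x20) with a period (0x2E).
modified = bytearray(original)
modified[-4] = 0x2E
modified = bytes(modified)

md5_original = hashlib.md5(original).hexdigest()
md5_modified = hashlib.md5(modified).hexdigest()

# Count how many of the 32 hex digits agree position by position.
matching_digits = sum(a == b for a, b in zip(md5_original, md5_modified))

print(md5_original)
print(md5_modified)
print(f"bytes changed: 1 of {len(original)}; "
      f"hex digits in common: {matching_digits} of 32")
```

Only one byte differs between the two inputs, yet the two digests have nothing useful in common, which is exactly why an exact-match hash search misses the modified file.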

So, if you are an investigator, you may be thinking, “Aw crap...now what?! So ALL of the hash comparisons I have been doing could have failed while the evidence was still present?”

The answer to that question is, “Yes...if the images were modified in any way...yes they did.” But, there is hope, and that hope is called fuzzy hashing.

Since the one-to-one comparison of hash sets is obviously antiquated and inadequate, Jesse Kornblum of ManTech thought up a fantastic solution called fuzzy hashing. Using a tool called SSDEEP, you can generate hash values that can then be compared against other files, producing a percentage that indicates how closely each file matches!

Using SSDEEP, I generate an output file from the first image of Dez that looks like this:

ssdeep -b DezBryant.jpg
ssdeep,1.0--blocksize:hash:hash,filename
384:HEOV6N0/xFXSw0x2K+PLfNDOPK2TYWImaMsYLB3q60tL5DwpXe9hZ4ksJWoTNpyY:HEI9Xg7+P9yImaNk3qrDwpXe9gf5xkIZ,"DezBryant.jpg"

I simply redirected the output to a file named dez.hash.

Then, I use that file to compare to the second image of Dez:

root@Linux-Forensic1:/home/cepogue/Pictures# ssdeep -bm dez.hash DezBryant2.jpg
DezBryant2.jpg matches DezBryant.jpg (99)

As you can see from the output, these two images are 99% similar.
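To get a feel for why hashing a file in pieces can produce a similarity score at all, here is a deliberately simplified sketch in Python. It is NOT the SSDEEP algorithm: SSDEEP uses context triggered piecewise hashing, where the block boundaries depend on the file's content, while this toy version simply hashes fixed-size blocks. The block size, function names, and sample data are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 64  # hypothetical block size chosen for this demo


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of the input separately."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def similarity(a, b):
    """Return the percentage of block positions whose hashes agree."""
    hashes_a, hashes_b = block_hashes(a), block_hashes(b)
    total = max(len(hashes_a), len(hashes_b))
    if total == 0:
        return 100  # two empty inputs are trivially identical
    same = sum(x == y for x, y in zip(hashes_a, hashes_b))
    return (100 * same) // total


# Made-up data: 6400 bytes, i.e. exactly 100 blocks of 64 bytes.
original = bytes(i % 251 for i in range(6400))

# One-byte edit near the end, in the spirit of the hex editor change above.
modified = bytearray(original)
modified[-4] = 0x2E
modified = bytes(modified)

print(similarity(original, original))
print(similarity(original, modified))
```

Here the one-byte edit only disturbs the last of the 100 blocks, so the other 99 block hashes still match. The weakness of fixed blocks is that inserting or deleting a byte shifts every later block, which is the problem content-based boundaries are meant to address.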

Fuzzy hashing can efficiently and effectively help investigators identify files that contain a high percentage of similarities. While two files may not be 100% exactly the same, as shown by my example, that does not necessarily mean that they are not the same image. This same theory can be used with really any type of file. An investigator can then take the files with the highest percentage of similarities and manually review those individual files.

SSDEEP is a free utility and can be downloaded from http://ssdeep.sourceforge.net/.

Tuesday, November 3, 2009

Mount_EWF and Ubuntu 9.04

***Props to Steven Venter of Trustwave UK for putting this together. I used this today, with some minor modifications.***


So, I was faced with the need to mount an EWF image on my Ubuntu box so that I could use some of the TSK utilities on the image. Below is how to get a tool called "mount_ewf" working with Ubuntu 9.04.

So here's a quick update on getting EWF mounting capabilities installed on a new Ubuntu install [in this case the 32-bit version of Jaunty Jackalope Ubuntu 9.04]

The libewf software is now available at:
http://sourceforge.net/projects/libewf/

The files I downloaded were:
steve@jj:~/software/EWF$ ls -1
disktype-libewf.patch
EWF_file_format.pdf
libewf-20080501.tar.gz
libewf-beta-20090506.tar.gz *** I changed this too...I did NOT grab this file***
mount_ewf-20080513.py


== Install the required build dependencies
-- the required Debian packages in Ubuntu are: zlib1g-dev libssl-dev uuid-dev
$ sudo apt-get install zlib1g-dev libssl-dev uuid-dev

== Create Debian (.deb) packages to install
Since the downloads are now standard source code format, I tried to create Debian (.deb) packages using the guidance here: http://www.quietearth.us/articles/2006/08/16/Building-deb-package-from-source

***This took me a while to get working properly, as the "how to" is kind of vague.***

First off, let's install the necessary tools:
# apt-get install autotools-dev fakeroot dh-make build-essential

Next, take the tarball you downloaded, in this case libewf-20080501.tar.gz, and uncompress it:
tar -xzvf libewf-20080501.tar.gz
Then cd into the newly created directory:
cd libewf-20080501

Now, you are going to use the dh_make utility to make the debian control files
dh_make -f /path/to/tarball <-- this is important. You have got to tell the tool the location of the original tarball...presumably, just one directory up. In my case, I dropped my tarball into /usr/local/bin (which is where I drop all of my install files).

Then select "S" for single binary.

Then run the following: (this has to be done as root)
# dpkg-buildpackage -rfakeroot

Step 1: Install required dependency packages:
$ sudo apt-get install autotools-dev fakeroot dh-make build-essential

Step 2: Copy the source code tarball to /tmp and extract the contents there
steve@jj:~/software/EWF$ cp libewf-beta-20090506.tar.gz /tmp/
steve@jj:~/software/EWF$ cd /tmp/
steve@jj:/tmp$ tar -zxf libewf-beta-20090506.tar.gz
steve@jj:/tmp$ cd libewf-20090506/
steve@jj:/tmp/libewf-20090506$

Step 3a: No need to make the debian control files, since they are already there [in the debian/ sub-folder]

Step 3b: Build the debian package:
steve@jj:/tmp/libewf-20090506$ sudo dpkg-buildpackage -rfakeroot
** this ended with the following output:
signfile libewf_20090506-1.dsc
gpg: WARNING: unsafe ownership on configuration file `/home/steve/.gnupg/gpg.conf'
gpg: skipped "Joachim Metz ": secret key not available
gpg: [stdin]: clearsign failed: secret key not available

dpkg-genchanges >../libewf_20090506-1_amd64.changes
dpkg-genchanges: including full source code in upload
dpkg-buildpackage: full upload (original source is included)
dpkg-buildpackage: warning: Failed to sign .dsc and .changes file
steve@jj:/tmp/libewf-20090506$

Step 3c: List the newly created files:
steve@jj:/tmp/libewf-20090506$ cd ..
steve@jj:/tmp$ ls -ld libewf*
drwxr-xr-x 15 steve steve 4096 2009-05-08 18:41 libewf-20090506
-rw-r--r-- 1 root root 2262 2009-05-08 18:42 libewf_20090506-1_amd64.changes
-rw-r--r-- 1 root root 177340 2009-05-08 18:42 libewf_20090506-1_amd64.deb
-rw-r--r-- 1 root root 511 2009-05-08 18:40 libewf_20090506-1.diff.gz
-rw-r--r-- 1 root root 826 2009-05-08 18:40 libewf_20090506-1.dsc
-rw-r--r-- 1 root root 810174 2009-05-08 18:40 libewf_20090506.orig.tar.gz
-rw-r--r-- 1 steve steve 809523 2009-05-08 18:22 libewf-beta-20090506.tar.gz
-rw-r--r-- 1 root root 222562 2009-05-08 18:42 libewf-dev_20090506-1_amd64.deb
-rw-r--r-- 1 root root 195290 2009-05-08 18:42 libewf-tools_20090506-1_amd64.deb

== Install the newly created .deb packages:
steve@jj:/tmp$ sudo dpkg -i libewf*.deb
Selecting previously deselected package libewf.
(Reading database ... 109479 files and directories currently installed.)
Unpacking libewf (from libewf_20090506-1_amd64.deb) ...
Selecting previously deselected package libewf-dev.
Unpacking libewf-dev (from libewf-dev_20090506-1_amd64.deb) ...
Selecting previously deselected package libewf-tools.
Unpacking libewf-tools (from libewf-tools_20090506-1_amd64.deb) ...
Setting up libewf (20090506-1) ...

Setting up libewf-dev (20090506-1) ...
Setting up libewf-tools (20090506-1) ...
Processing triggers for man-db ...
Processing triggers for libc6 ...
ldconfig deferred processing now taking place
steve@jj:/tmp$


== To use the mount_ewf script, need to install python-fuse:
steve@jj:/tmp$ sudo apt-get install python-fuse


== Create a mount.ewf executable in the /sbin directory and grant it "execute" permissions:
steve@jj:/tmp$ cd
steve@jj:~$ cd software/EWF/
steve@jj:~/software/EWF$ cp mount_ewf-20080513.py /sbin/mount.ewf
cp: cannot create regular file `/sbin/mount.ewf': Permission denied
steve@jj:~/software/EWF$ sudo cp mount_ewf-20080513.py /sbin/mount.ewf
steve@jj:~/software/EWF$ sudo chmod +x /sbin/mount.ewf


== And that's it - ready to go:
steve@jj:~/software/EWF$ mount.ewf
Using libewf-20090506. Tested with libewf-20080501.
Usage:
mount.ewf [options]

Note: This utility allows EWF files to be mounted as a filesystem containing a flat disk image. can be any segment of the EWF file. To be identified, all files need to be in the same directory, have the same root file name, and have the same first character of file extension. Alternatively, multiple filenames can be specified in different locations in the order to be reassembled.


ewf segment filename(s) required.
steve@jj:~/software/EWF$

Once you get the tool installed, you can mount EWF images like this:

Create a mount point, then mount the image read-only:
mkdir /mnt/suspect
mount.ewf -o ro badguyimage.E* /mnt/suspect

The raw image will now be mounted on /mnt/suspect, and you can run your TSK tools against it. Nice! The kewl thing is that you can mount your external drive as RW, then mount the image as RO. This comes in handy if you are dumping unalloc with blkls, and you only have an 80 GB HDD in your Linux box (like me)...I use external drives for everything!

Friday, October 9, 2009

SecTor 2009 - A Great Success


I just returned home from SecTor in Toronto, Ontario, Canada, and hats off to Brian Bourne and crew for putting on a great con!

With this only being the third year for the con, I honestly didn't quite know what to expect. However, after seeing what the volunteers (no paid staff) had put together, it was every bit as professional and organized as any con I have ever been to.

There were some outstanding talks this year, all of which will be available with full audio on the SecTor website under "Presentations". Some of the highlights for me were Roy Firestein's talk on "Crimeware", Jibran Ilyas and Nick Percoco's presentation of the "Malware Freakshow" (which is the same presentation that was given at DEFCON 17 this past year in Las Vegas), and Adam Laurie's (aka Major Malfunction) lunch keynote, "The Day in the Life of a Hacker". This is not to say that the other talks were not very good; these are simply the ones I enjoyed the most.

If you were not able to attend this year, I highly recommend it! Very very very good con!

Monday, September 14, 2009

Babel Fish

I read an article this morning on Forensic Focus from the UK-based company CY4OR detailing the emerging trend of technical data in the courtroom. The author of the article posed the question: should there be a higher level of technical expertise required on juries in cases involving computers? After reading the article twice, and thinking about it, my answer is no (at least where the US court system is concerned), and here is why.

In the US, juries are supposed to be comprised of "your peers". Now, while most folks in the US are technically aware, to think that they are "savvy" is a bit of a stretch. So finding a "peer" where most IT folks are concerned is going to be tough. Most people, even the very intelligent ones, are your stereotypical "end users". A good example is my pastor, Alex Himaya from the Church at Battle Creek. Alex has an MS and a PhD - a very educated and intelligent man. Well spoken, well traveled, and well respected both inside the Christian community and out. However, you put a computer in front of the man, and...well...he becomes an "end user". He can get around in Windows XP, he can do his work, but that's about it. Now that is not a slam in any way on Alex; however, it shows that even people who have PhDs are not necessarily any better with computers than your typical high school student (and in many cases the HS students are far better).

That puts folks like us, not just IT professionals, but computer forensic investigators, in the top 1% - 3% of computer users. We should be the upper tier of computer professionals; we should know both how the systems work and why they work that way. The technology should never be the limiting factor in our investigations. And if we run across something new or unfamiliar, we should be able to research it and figure it out in a very short period of time (like minutes to hours).

Being a corporate investigator, over the years my customers have ranged from CEOs of fortune five companies to single location restaurant owners. I have delivered forensic reports to customers that have degrees in IT and have a pretty good understanding of what I am saying as well as people who know as much about computer science as Dunder Mifflin's Michael Scott! So what's the key to delivering a comprehensive yet understandable report? Your mom!

You think I'm joking...I'm not...your mom is the key! When you write your reports, do it in a manner that your mother could understand (if your mom is not available, any non-technical person you trust will suffice - unless of course your mom is a computer expert of some sort...then my example is blown and you will have to pick somebody else to help you with your report writing). Explain something that is technically difficult plainly and without being condescending.

For example, I have recently written a white paper on the top 10 reasons level 4 merchants are compromised. My target audience is small business owners whose primary concern is not computer security or PCI compliance, rather providing dry cleaning services, burrito plates, discount clothing, etc. In my white paper I break down technical concepts like egress filtering, secure data wiping, and port identification in a manner that my mother (no lie...I used her to help me write my paper) could easily understand. I used common terms and word pictures to illustrate technically advanced concepts clearly without making the reader feel st00p1t.

My forensic reports, like most, are broken down into sections - as I'm sure yours are (if they aren't, they should be). Doing this will enable you to address several different audiences in the same report. Your executives are likely just interested in the high level information - what happened, how, and how they can fix it. Technical or security staff members may be interested in the specifics of what happened...ports, malware, theft, exfiltration, etc. Make sure you address each different audience in a clear and concise manner. I know I have said this before, but it won't hurt to say it again...DON'T be verbose for the sake of being verbose. Clear, concise, to the point and move on.

Now...how does this all tie together in a courtroom? Well, you are the SME. You have the technical knowledge and the jury does not. The key is not to throw technical terms and abbreviations at them to the point where they just tune you out and start wondering what's for lunch. Use common terms and analogies that they can easily understand. Jesus was a master at this! You don't have to be a Christian to appreciate how Jesus used the everyday to explain the things of Heaven to his disciples. In much the same way, you are doing the same thing. If you have to get froggy and break it down with some techie love, then fine, but make that your fall back, not your first option. Remember, your job on the stand is to get the jury to understand why the evidence you are presenting is relevant to the case, and how it proves whether something happened or didn't happen, not to show them how smart you are.

Be the Babel Fish!

Wednesday, September 9, 2009

Autopilot?

A recent post on Forensic Focus got me thinking. Basically, someone asked if there was another tool like Harlan Carvey's RegRipper that could be used to validate their findings. After talking with Harlan at some length about this, we pretty much came to the same conclusion that there are a lot of folks out there who are stuck in the old school, running on auto pilot.

Let's get something straight from the get go here, I am totally for output validation when and where necessary. Since certain tools do things in certain ways, it may be important to use another tool that comes to that same result in a different way to validate that the first tool is not doing something jankity.

Case in point was the gig I had in which I was asked to determine if some office documents had been tampered with. Some tools use metadata to display chronological information while others use the OLE data. Some tools can extract chronological data without having to mount the image, others require the image to be mounted. The point here is that the tools do things in a slightly different way.

RegRipper parses registry hives. There was a funny post where a chap stated that RegRipper is not a registry viewer, so you can't mount the hives and have a "look around". While this is a true statement, I thought it was indicative of the "old school" of forensics. What are you going to look around for? Are you going to perform "Registry Analysis" with NO IDEA what you are looking for, why, or which keys do what? This is where the term "Auto Pilot" comes in. So many folks simply have blind reliance on their tools to do the work for them. They have no idea what the tool does, how it does it, and where the output is generated from. They just load, fire, and report...this tool did this...how many other tools can I get to do the same thing? Maybe by using 17 tools to take an MD5 hash, people will think I am really smart and KNOW that my MD5 hash is a good and proper MD5 hash!

What I am getting at here is that you should have a basic understanding of what the tools you are using actually do, and how they actually do that thing. I am no coder, so I could not pull apart regripper and tell you which lines do what, but I CAN read. Harlan has done a great job with documenting how regripper works and even allows you to write your own plugins! If you took about 30 mins and reviewed the documentation, you would know that regripper simply parses the data from the registry hives into a readable format. It takes the more complex keys (like those that are Rot13'd) and translates them into plain English. That's it...no smoke, no mirrors, no voodoo magic. If you want to validate your findings, get a hex editor and do it by hand.
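To show just how little magic is involved in the Rot13 part, here is a small sketch in Python. It is not regripper's code (regripper is a separate tool with its own plugins); it only illustrates the decoding step, assuming a made-up value name written in the style of the Rot13-encoded UserAssist entries. The function name is hypothetical.

```python
import codecs


def decode_rot13(value_name):
    """Rotate each letter 13 places; digits and punctuation are untouched."""
    return codecs.decode(value_name, "rot_13")


# Made-up example in the style of a Rot13'd registry value name.
encoded = "HRZR_EHACNGU:P:\\Jvaqbjf\\abgrcnq.rkr"
decoded = decode_rot13(encoded)

print(decoded)

# Rot13 is its own inverse, so applying it twice returns the original.
print(decode_rot13(decoded) == encoded)
```

Checking a tool's output by hand can be as simple as this: pull the raw value name out with a hex editor, rotate the letters, and compare it with what the tool reported.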

There are a couple of takeaways here. First, understand your tools. Have at least a basic understanding of what they do and how they do it. Then you can make an educated decision if you need another tool to validate your findings. Second, don't be on auto pilot. Don't simply run a tool and then state in your report that "Tool BLAH showed me BLAH." Instead, state what you are looking for, why you are looking for it, and THEN state what the findings were.

Remember YOU are the subject matter expert. Your case findings should be repeatable if another investigator took the same data and used the same tools. If you document your goals clearly, and the steps you took which brought you to your conclusions, you should never have a need to defend your tools.


Wednesday, September 2, 2009

SecTor 2009


I just found out last night that my paper on "Sniper Forensics" has been accepted to SecTor 2009! It will be a talk that shows the advantages of taking a focused approach to forensic investigations, including faster, more accurate results, which means happier customers.

Also, Jibran Ilyas and Nick Percoco (fellow Trustwave teammates from the SpiderLabs) will be giving their "Malware Freakshow" presentation from DEFCON. It's awesome to have THREE speakers from Trustwave at one event like this!

If you are in CA, or just want to go to another security conference this year - I hope to see you there!

Tuesday, September 1, 2009

Plan the Work, Work the Plan

I have heard of investigators (and unfortunately, witnessed a few myself) who will simply go into a case without really knowing what they are looking for. They don't clarify expectations with the customer, don't think about what it is that they are trying to find, and end up just "looking for bad guy stuff". Can I just share with you what a monumentally horrible idea that is? If you don't know what it is that you are looking for, how will you ever know when you find it? This is why it is so critical to create an investigation plan BEFORE you start poking around in your data.

Creating an investigation plan is one of, if not the most important steps an investigator can take in preparation for a new case. It allows you to clearly outline what your objectives are and provides a framework for the direction of the entire case. All too often this critical step is skipped in the interest of time. What some folks don't realize is that by not having a comprehensive investigation plan, they are actually increasing the amount of time their case is likely to take.

The first question that needs to be asked at the onset of any case is, "What are my objectives?" What are the goals of the case? What information does the customer want? What questions do they want answered? Once you have the specific items the customer wants to have addressed, reiterate them to ensure that there has not been a breakdown in communication somewhere.

"I am hearing that you want me to try and determine, X, Y, and Z. Is that correct?"

I know it may sound a bit juvenile, but really, everything hinges off the customer's expectations. So at the risk of misinterpreting those expectations, and failing to deliver what the customer has paid for, it is a necessary step. Ensure that both parties are "on the same sheet of music", so that when you deliver your final report you can state, "Hey...you asked me to find A, B, and C....HERE is A, B, and C".

This is where corporate investigators differ from our brethren in the law enforcement community...to a certain extent. We have a clear set of goals that our customers have paid for. They have the expectation that they will get answers to those questions. The SOW is signed, and we get to work and get them their answers in the time allotted by the contract.
In the law enforcement world, there are no timeframes and often no clear direction of what the goals are. Recently, I learned that most local, state, and federal agencies that deal with cyber-crimes are pushing out cases in anywhere from six months to three years! In that time, they may stumble upon three or four criminal activities perpetrated by the owner of the suspect system. They look under every rock, they search every crevice. They have the luxury of time (for the most part)...we do not.

Once the goals for our investigations have been established, we can apply the Alexiou Principle to further clarify our actions.

The Alexiou Principle states:
1. What question are you trying to answer?
2. What data do you need to answer that question?
3. How do you extract that data?
4. What does that data tell you?

Your questions need to be as specific as possible. You cannot simply say things like, "I want to find all signs of bad guy stuff", or "I want to find everything that this guy did wrong." Some good examples of well worded questions are:

1. How did the intruder gain access to the customer's network?
2. What mechanism did the intruder use to gather customer data?
3. How did the intruder get the stolen data off the customer's network?

These can be answered clearly, with one sentence each.

1. The intruder gained access to the customer system by using a weak pcAnywhere password.
2. The intruder used a packet sniffer to detect and compile track data in transit.
3. The intruder used FTP to send files containing the stolen track data to his server.

There will obviously be much greater detail surrounding each question; however, this is a good example of how you can be very precise in your answers. Don't take two paragraphs to say what you can say just as well in two sentences. Most customers are not interested in verbosity; they just want to know what happened, and how.

Once you have your questions outlined, you can begin to search for the data that will provide you the answers. For example, if one of your questions is, "How did the intruder gain access to the customer's network" you are going to look in places that contain data about system access. You are NOT going to scan the machine for viruses, look for pornography, or check for rootkits. Why not? Because they have nothing to do with system access. You WOULD check in event logs, application logs (like pcAnywhere, or LogMeIn), firewall logs, ntuser.dat files, and the system and software registry hives.
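If it helps to see the plan written down, the four questions of the Alexiou Principle map naturally onto a small record per customer question. The sketch below, in Python, is just one possible way to organize it; the class name, field names, and the "not yet decided" placeholders are my own assumptions, while the questions, data sources, and the one-sentence answer are taken from the examples in this post.

```python
from dataclasses import dataclass, field


@dataclass
class PlanItem:
    """One customer question worked through the Alexiou Principle."""
    question: str                                      # 1. What question are you trying to answer?
    data_sources: list = field(default_factory=list)   # 2. What data do you need?
    extraction: str = ""                               # 3. How do you extract that data?
    findings: str = ""                                 # 4. What does that data tell you?

    def is_answered(self):
        return bool(self.findings)


plan = [
    PlanItem(
        question="How did the intruder gain access to the customer's network?",
        data_sources=["event logs", "pcAnywhere logs", "firewall logs",
                      "ntuser.dat", "system and software registry hives"],
        findings="The intruder gained access by using a weak pcAnywhere password.",
    ),
    PlanItem(question="What mechanism did the intruder use to gather customer data?"),
    PlanItem(question="How did the intruder get the stolen data off the customer's network?"),
]

# Anything without findings is still open; nothing outside the plan is listed.
open_questions = [item.question for item in plan if not item.is_answered()]
for q in open_questions:
    print("OPEN:", q)
```

The point of writing it down this way is that every data source you touch is tied to a question the customer actually asked, and the open questions tell you where the remaining hours should go.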

With as much data as there is in volatile memory, RAM dumps, and on system images, it's very easy to get overwhelmed - something referred to as "analysis paralysis". You have theories buzzing around in your head: "What if the attacker did this? What if he did that?" Don't fall victim to that kind of thinking. Keep your hypothesis tied to the data. Let the data guide the direction of your case. Don't try to force the data to fit your ideas about the case.

We only have a limited time to deliver our final reports that clearly and concisely meet the customer's expectations. We do not have the luxury of time, and cannot possibly find everything that may be "wrong" with customer systems. We have been hired to answer questions...that's it. So answer them thoroughly, and in a manner that the customer can easily understand. If you stumble across something they have not asked (or paid) for, then bonus...include it in the report as an additional finding, but don't go looking for extras.

I have heard customers at the conclusion of a case state, "Why did you do X? I didn't ask you to do X. I asked you to do Y and Z! I want all of the money I spent on you finding X refunded to me. It was not in the contract, and I am not paying for it!" Also, I have been on the other side of that conversation, in which a customer told me, "Why didn't you find Z? I wanted you to figure out Z!". To which I replied, "Hey...remember the SOW conversation we had, where we outlined the goals of the investigation? Remember, you agreed to all of those items, and we put them in a contract...that you signed? You asked me to figure out A, B, and C...which I did...very clearly. If you want Z, that's fine...I will find Z, but we will need to add hours to the SOW." They didn't have any rebuttal because I MADE SURE to cover the expectations before sending over the SOW.

Develop your investigation plan based on what the customer wants. Restate their goals to them to ensure there have not been any miscommunications. Apply the Alexiou Principle to each of the goals, and get working - the clock is ticking.