Monday, May 10, 2010

Command Line Goodness Part III

Now...I am still going to be working with my local web logs for the sake of protecting NDAs and whatnot (that's for Harlan). So, I go to my Mozilla profile and run strings against my places.sqlite...which, in case you don't know, the "strings" command displays all of the printable character sequences in a file. To see the difference, run the "cat" command on the same file (provided you are using Firefox) and see what happens (HINT...DON'T do this at 0630 when your wife and kids are still sleeping!). Then, use strings against the same file...ahhhh...much better.
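
If you just want a quick peek before dumping the whole file, you can pipe strings into a pager like less (or more, if less isn't on your box). And if your build of strings supports it (the GNU version does), the -n option sets the minimum run of printable characters it will report...the default is 4, so bumping it up cuts out a lot of the short junk strings:

>strings places.sqlite | less
>strings -n 8 places.sqlite | less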

So I redirect the output to a file I simply called, "example_web_history.txt"...my command looks like this...

>strings places.sqlite > example_web_history.txt

Now that I have my working file, I decide that I only want to see hits on the keyword "Facebook". So, using grep, I can do this in one of two ways...

>cat example_web_history.txt | grep -i facebook
or
>grep -i facebook example_web_history.txt

Now, I redirect to an output file called, "face_out.txt" and I have ONLY hits with the word "Facebook" somewhere in the URL.
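
(In case you are wondering, the -i flag just makes the match case-insensitive, so "Facebook", "facebook", and "FACEBOOK" all count as hits.) The redirect works just like before, so the whole command would look something like this:

>grep -i facebook example_web_history.txt > face_out.txt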

In this post, I am going to show you how to use the "cut" command, so while this example is completely benign, it does illustrate how useful this command can be.

OK, so in the screenshot below, you have some information about my Facebook habits...ohhh...how interesting. But let's say for the sake of the post, that this is some important data that is surrounded by garble. Let's say I JUST want the numbers that come between the two equals "=" signs.

The first step is some data reduction. In this example, I only want the lines with the word "photo" in them...which a quick line count shows me is 615 lines...still way too much data!

(C:\Documents and Settings\User\Application Data\Mozilla\Firefox\Profiles\fdbj8kp8.default>cat face_out.txt | grep photo | wc -l)

So now, I want to suck out my numbers that come between those equals signs.



I would use cut like this...

>cat face_out.txt | grep photo | cut -d= -f2

The -d is for "delimiter" and indicates what I want to use as my identifying mark of where to split each line. Next, I tell it that the delimiter I want to use is the equals "=" sign, and finally I tell it with -f2 to print field number 2. cut will literally CUT each line into fields at every occurrence of my chosen delimiter...field 1 is everything before the first "=", field 2 is everything between the first and second "=", and so on. So now I redirect my output to a file called, "just_nums.txt", but I still have some work to do to get JUST the numbers.
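
If the field numbering ever gets confusing, you can sanity check it with a throwaway string (this one is completely made up, it is not from my actual output). Here field 1 would be "fbid", field 2 is "12345", and field 3 would be "junk", so this should spit back just 12345:

>echo "fbid=12345=junk" | cut -d= -f2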

I see that I have a bunch of lines with the word "home" in them...so to get rid of those, I can use grep with the -v option, which tells it to show me every line that does NOT contain my search term...it's called an inverted match. I can repeat this as many times as I like...

>cat just_nums.txt | grep -v home | grep -v feed | grep -v dopp | grep -v photo

My resulting output is only 262 lines, and it is JUST numbers...nice data reduction, and I didn't have to monkey around with a spreadsheet or eyeball anything. As you can see, cut is a pretty kewl command with a lot of options. The key to being able to use it in your cases is KNOWING what you are going after. The more specific you can be, the more efficiently you can extract your data.
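
One shortcut, if your grep supports the -E option for extended regular expressions (GNU grep does): you can stack all of those exclusions into a single inverted match instead of chaining greps...same result, less typing. Tacking wc -l on the end just gives you the line count so you can check you get the same number:

>cat just_nums.txt | grep -E -v "home|feed|dopp|photo" | wc -l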

Pretty neat huh...
