Log Analysis Shell Script

Mada_Milty · Nov 22, 2006

Good Day,

I'm running a Squid (v 2.6) proxy server on Ubuntu 6.06. This app generates a client access log in the format:
Code:
time elapsed--remotehost--code/status--bytes--method--URL--rfc931--peerstatus/peerhost type
where dashes separate fields. I've also attached this log (access.log) for convenience. What I need to do is create a simple shell script that will analyze this log and produce a summary showing, first, how long clients spent online. I'm currently researching how to do this (it's been a while since I've used Linux's file manipulation commands), but while I do so, I thought I would open a thread for advice and recommendations. I have a vague recollection of the sed and awk commands... am I looking in the right direction?

Thanks everyone...
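
For reference, a minimal Perl sketch of the kind of summary asked for above. It assumes the fields in the real file are separated by whitespace (as in the sample line posted further down the thread) and that the second field is the time in milliseconds Squid spent on each request. Summing it gives time spent serving each client, which is only a rough stand-in for "time online". The sub name elapsed_per_client is hypothetical, and the second and third sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: total the "elapsed" field per client address.
# Assumes whitespace-separated fields: time elapsed remotehost ...
sub elapsed_per_client {
    my @lines = @_;
    my %total_ms;
    for my $line (@lines) {
        # split ' ' skips leading whitespace and splits on runs of whitespace
        my ($time, $elapsed, $client) = split ' ', $line;
        next unless defined $client && $elapsed =~ /^\d+$/;   # skip malformed lines
        $total_ms{$client} += $elapsed;
    }
    return %total_ms;
}

# First line is from the thread; the other two are invented for illustration
my @sample = (
    '1164117990.680  11380 192.168.0.123 TCP_MISS/200 24160 GET http://www.asus.com/ - DIRECT/216.148.234.177 text/html',
    '1164117995.120    620 192.168.0.123 TCP_HIT/200 512 GET http://www.asus.com/logo.gif - NONE/- image/gif',
    '1164118001.004    250 192.168.0.45 TCP_MISS/200 1024 GET http://example.com/ - DIRECT/192.0.2.10 text/html',
);

my %total = elapsed_per_client(@sample);

# One line per client, elapsed time converted from milliseconds to seconds
printf "%-15s %8.1f s\n", $_, $total{$_} / 1000 for sort keys %total;
```

To run it against the real file, the lines could be read from a filehandle instead of the @sample array.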

TimW · Nov 22, 2006

http://manageengine.adventnet.com/products/firewall/firewall-reports.html

Not what you want ....still looking ...

http://dmoz.org/Computers/Software/Internet/Site_Management/Log_Analysis/Freeware_and_Open_Source/

http://www.safesquid.com/html/portal.php?page=54

goldfish · Nov 24, 2006

If it were me I'd go for Perl.

What sort of processing do you want to apply to it? Just tracking user session lengths? That would be reasonably simple to do in Perl, as opposed to a bash script, which I reckon would be quite complex.

Mada_Milty · Nov 27, 2006

goldfish said:

If it were me I'd go for Perl.

What sort of processing do you want to apply to it? Just tracking user session lengths? That would be reasonably simple to do in Perl, as opposed to a bash script, which I reckon would be quite complex.

That's what I was thinking, but hoping to avoid. It's been a good while since I've worked with Perl, and even then, I hardly touched it.... looks like back to the textbooks for me! I'm glad it's so similar to C++; I'm pretty decent with that, still....

goldfish · Nov 27, 2006

Apart from the fact that you can do regular expressions SO much easier

Mada_Milty · Nov 28, 2006

Okay, I've found my textbook sadly lacking!

I've learned how to open a file and loop over all the lines... whoop-de-do!

There's nothing here on pattern matching (which is what I really need to be able to extract the pertinent information from this file), so... does anyone have any good references on Perl? (currently looking at www.perl.com) I don't suppose I can embed regular shell commands?

Mada_Milty · Nov 28, 2006

Hmmmm! Here's a good one!

http://perldoc.perl.org/

goldfish · Nov 28, 2006

I would tend to agree with that sentiment

There are plenty of Perl books out there - a "perl-monger" friend of mine wrote one

Mada_Milty · Nov 29, 2006

Code:
1164117990.680  11380 192.168.0.123 TCP_MISS/200 24160 GET http://www.asus.com/ - DIRECT/216.148.234.177 text/html
Any recommendations on how to extract the URL from these lines? It's variable length, but there is always a " -" at the end of it.

I'm trying some combination of the index and substr functions, but I'm having no luck...

goldfish · Nov 29, 2006

Regular expressions my friend

Let's see what we can do...
Code:
if ($string =~ /(http:\/\/.+?) -/) {
    print $1;
}
Give that a try
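
For anyone who wants to run this on its own, here is a small self-contained sketch that applies the same pattern to the sample line posted earlier in the thread. The sub name extract_url is hypothetical, and m{...} is used as the delimiter only so the slashes need no escaping. It assumes every URL of interest starts with http://, so other entries (CONNECT requests, for example) would simply not match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the URL from one log line, or undef if none is found.
sub extract_url {
    my ($line) = @_;
    # Same pattern as in the post, written with m{...} to avoid escaping the slashes
    return $line =~ m{(http://.+?) -} ? $1 : undef;
}

# The sample line posted earlier in the thread
my $string = '1164117990.680  11380 192.168.0.123 TCP_MISS/200 24160 GET http://www.asus.com/ - DIRECT/216.148.234.177 text/html';

my $url = extract_url($string);
print "$url\n" if defined $url;   # prints http://www.asus.com/
```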

Mada_Milty · Nov 29, 2006

Okay, sorry that I'm so new to this... correct me if I'm wrong here.

I'm trying to figure out this pattern you're trying to match...

$string is obvious, it's the current line of the file I'm reading... I'm just using the default $_

I see you have
"http:" - That much is clear to me
\/\/ - two forward slashes escaped by backslashes to get "http://"
. - concatenation operator so we can add to this string
+? - not too sure about this one... what's this do? Wildcard for any number of characters?

Next question: why is this part in brackets?

then you have the dash, and finally the pattern delimiter.

If this evaluates as true, then it's just going to print the match? (of course, I can add my own statements...)

goldfish · Nov 30, 2006

Ok, let me explain this for you

In a regex, . isn't the concatenation operator. It means "any character, excluding newlines". The + is a quantifier, which means match (whatever was before) one or more times. So .+ means match any character one or more times. But by default this will look for a greedy match, i.e. it will match as many characters as possible. The ? after it stops this from happening by making the match lazy: it takes as few characters as possible, so it stops at the first point where the rest of the pattern (in this case the " -") can match.

The brackets capture the part of the match you want. Otherwise you'd be using $&, which would give you the entire match, from http to the - . (Note that $_ is the whole input line, not the match.) We want just the URL itself, so the brackets will load http://your-matched-url.com/ into $1.

So instead of getting:
Code:
http://your-url.com/ - 
You'll get
Code:
http://your-url.com/
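Also it should be noted that putting the match in the if statement means you only use $1 inside the block, when the match has actually succeeded, which makes things a bit easier. If a match fails, $1 is not reset and still holds whatever the last successful match left in it, which can get confusing.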
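
To see the greedy/lazy difference and the effect of the brackets side by side, here is a rough sketch. The input string is made up and deliberately contains two " -" sequences so the two patterns give different results. It also prints the whole matched text (Perl's $&) next to the capture in $1.

```perl
use strict;
use warnings;

# Made-up input with two " -" sequences so the difference is visible
my $text = 'GET http://example.com/a - http://example.com/b - end';

my ($greedy) = $text =~ m{(http://.+) -};    # .+  grabs as many characters as possible
my ($lazy)   = $text =~ m{(http://.+?) -};   # .+? grabs as few characters as possible

print "greedy: $greedy\n";   # http://example.com/a - http://example.com/b
print "lazy:   $lazy\n";     # http://example.com/a

if ($text =~ m{(http://.+?) -}) {
    print "whole match: [$&]\n";   # [http://example.com/a -]
    print "capture:     [$1]\n";   # [http://example.com/a]
}
```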

In your case you might add $1 into an array or list, rather than printing it.

And also if you're going to run multiple regexes on the current line, it would be a very good idea to load $_ into a new variable. Sounds a bit silly, but $_ is a global that many built-in functions and loop constructs set implicitly, so code that references $_ directly can start behaving oddly when something else overwrites it.
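
Putting the last two suggestions together, a minimal sketch might look like the one below: each line is read into a named variable ($line) rather than $_, and each captured URL is pushed onto an array. To keep it self-contained it reads from an in-memory string instead of the real access.log. The first log line is the one from the thread, and the other two are made up.

```perl
use strict;
use warnings;

# Stand-in for the real log file; lines two and three are invented
my $data = <<'END';
1164117990.680  11380 192.168.0.123 TCP_MISS/200 24160 GET http://www.asus.com/ - DIRECT/216.148.234.177 text/html
1164117995.120    620 192.168.0.123 TCP_HIT/200 512 GET http://www.asus.com/logo.gif - NONE/- image/gif
1164118001.004    250 192.168.0.45 TCP_MISS/200 39 CONNECT example.com:443 - DIRECT/192.0.2.10 -
END

# Open a read handle on the string (an in-memory file)
open my $fh, '<', \$data or die "Cannot open in-memory log: $!";

my @urls;
while (my $line = <$fh>) {        # named variable instead of the implicit $_
    chomp $line;
    if ($line =~ m{(http://.+?) -}) {
        push @urls, $1;           # only reached when the match succeeded
    }
}
close $fh;

print "$_\n" for @urls;
```

The third line has no http:// URL, so it is skipped rather than picking up a stale $1 from the previous line.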
