Focal Curve

Fun With Server Logs

In addition to being a vital part of management and maintenance of a website, examining server activity logs provides me with oodles of geeky entertainment. It appeals to the inner statistician I try so hard to suppress. I like finding clues and tracking someone’s path through the site, and sometimes trying to sort out who they may be and what the hell they’re doing reading my blog. It’s kind of like those psychological profile tests where they show you a picture and ask you to make up a story about it. Erm. Not that I’ve ever been profiled, of course. I hate fava beans.

But alas, my beloved ShortStat hasn’t worked since I upgraded to WordPress 1.5. This is evidently my own fault, as others are using Strayhorn+ShortStat without a hitch. I’m sure it’s just something with where I’m adding the include line in my template, but I’ll deal with it some other time (eagerly awaiting the next release). Meanwhile, I’m using Owen Winkler’s BAStats plug-in and quite liking it. While ShortStat gives a great at-a-glance overview of recent activity, BAStats lets you drill down for more details, e.g., associating an IP with a user-agent and referrer as well as the page said IP viewed, thus building a simple profile of that particular visit. It’s educational fun for the whole family.
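For the curious, that kind of per-visit profile can be roughed out from raw logs too. This is only a minimal sketch: it assumes Apache's "combined" log format rather than whatever BAStats does internally, and the `build_profiles` name and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: lines follow Apache's "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def build_profiles(lines):
    """Group hits by IP, collecting user-agents, referrers and pages viewed."""
    profiles = defaultdict(
        lambda: {"agents": set(), "referrers": set(), "pages": []}
    )
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        visit = profiles[match["ip"]]
        visit["agents"].add(match["agent"])
        # A referrer of "-" means the client did not send one.
        if match["referrer"] not in ("", "-"):
            visit["referrers"].add(match["referrer"])
        visit["pages"].append(match["path"])
    return dict(profiles)


# Hypothetical sample hits (documentation IP range, invented agent string).
SAMPLE = [
    '203.0.113.7 - - [10/Mar/2005:09:15:02 -0500] "GET /archives/ HTTP/1.1" '
    '200 5120 "http://www.example.com/links" "ExampleBrowser/1.0"',
    '203.0.113.7 - - [10/Mar/2005:09:16:40 -0500] "GET /about/ HTTP/1.1" '
    '200 2048 "-" "ExampleBrowser/1.0"',
]

if __name__ == "__main__":
    for ip, visit in build_profiles(SAMPLE).items():
        print(ip, sorted(visit["agents"]), sorted(visit["referrers"]), visit["pages"])
```

Keeping the pages as a list (rather than a set) preserves the order of the hits, which is what makes it possible to follow someone's path through the site.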

Bots

As my human readership is rather small, a high percentage of my daily traffic comes from crawlers and bots. Google indexes me daily, Yahoo, Jeeves and MSN come semi-daily, and there’s a flurry of activity from various blog-related bots every time I publish a new post (I ping Ping-O-Matic and they all come runnin’). I noticed a whole lot of hits (several per day) from something called “BecomeBot” which included a URL, http://www.become.com/site_owners.html. That page states:

Become.com uses automated crawling technology to identify shopping-related web sites for inclusion in our index. We are particularly interested in sites containing shopping-related research information such as buying guides, reviews, articles, specifications, forums, etc.

My site is not shopping-related, so Become was wasting their time and my bandwidth crawling my site so often. I finally added a line to my robots.txt file telling them not to bother, though it seems some naughty bots are spoofing Become’s agent string. I’ll be on the lookout for those.
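For anyone wanting to do the same, here is roughly what such a rule looks like, plus a quick way to check it with Python's standard `urllib.robotparser`. The rule shown is an assumption about a typical setup, not a copy of my actual file, and `can_crawl` is just a hypothetical helper name. Bear in mind robots.txt is purely advisory, so it does nothing against bots that merely spoof the agent string.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: shut out BecomeBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com stands in for the real site.
    for agent in ("BecomeBot", "Googlebot"):
        print(agent, can_crawl(agent, "http://example.com/some-post/"))
```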

Another prolific bot is Bloglines, which grabs my feeds several times a day, whether I’ve updated or not, and is kind enough to pass along the number of subscribers to a particular feed in its user-agent string. I’m up to 5, across all feeds combined, and only two of them are me!
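Pulling that number out of the logs is a one-regex job. A minimal sketch follows; the sample agent string is my assumption about the general shape Bloglines uses, and `subscriber_count` is a made-up helper name.

```python
import re

# Matches "5 subscribers" or "1 subscriber" anywhere in the agent string.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def subscriber_count(user_agent):
    """Return the subscriber count advertised in a user-agent, or None."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


# Assumed example of the agent string format; treat it as illustrative.
SAMPLE_AGENT = "Bloglines/3.1 (http://www.bloglines.com; 5 subscribers)"

if __name__ == "__main__":
    print(subscriber_count(SAMPLE_AGENT))
```

Since each feed URL gets its own count, a combined total means keeping the latest count seen per feed and adding those up, rather than summing every hit.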

Feeds

There are several folks subscribing to my feeds with their own clients, which honestly stuns me. I’ve seen a slowly growing frequency of agent strings from NetNewsWire, FeedDemon, Sage, et al. A few weeks back I briefly tried publishing summaries only (if you’re a subscriber you may remember that). While I obviously want people to visit my site and ooh and aah over the complex design that I passionately labored over for several hard minutes, syndication is here to stay and I decided it’s the content that really matters. Hence I quickly went back to full text, and I continue to include inline images with absolute URLs (I’m annoyed by broken images in feeds). I’ve toyed with the notion of publishing feeds for individual categories, but my posting is so infrequent I just don’t think that would accomplish anything.

Searches

The most popular search phrase leading people here is “Guerra Communications” and variations thereof, due to my semi-infamous series of posts about them. Those posts also get tons of search hits for many of the throw-away domains showing up in the ads, thanks to their being mentioned in the comments. People also sometimes find their way here by Googling my name. While that doesn’t necessarily mean they’re searching for me, I gotta say it does make me a wee bit paranoid and flattered at the same time. It’s amusing that my site outranks craigcook.com (to which I have zero connection, for the record).

Some of the more interesting search phrases from recent history:

  • robots dangerous yet helpful
  • “wipe the smile off his face”
  • cheesy, very cheesy
  • scumbag definition
  • tumble chef
  • comic book conservatism
  • flashy films with poor stories
  • out of your mind performance spice girls
  • Dr. Smith’s insults of the robot
  • LOOK AT THIS GEEK

Countries

While most of my traffic comes from the US, my single most dedicated reader gets his or her bandwidth from Pipex.com, a UK ISP. It would seem my honorary position as Britpack mascot has gained me a modest following “across the pond”, judging from the range of UK IPs hitting me fairly regularly (‘allo, chaps). Someone in Australia quietly checks my feed almost hourly (good onya, mate). A visitor from the Netherlands drops in a few times a week (groeten, vriend). There’s also someone in Mexico who spends a good bit of time perusing my little corner of the web here (hola, compañero). I can’t tell who these people are, whether they’re some of the folks I met at SXSW or complete strangers who came across my site on their own (if you know who you are, drop a comment). Either way, welcome to you all and thanks for reading.

3 Comments on 'Fun With Server Logs'

  1. KJ said:

    Hey Craig, Canada’s watching you, too. Be scared. Be verrrrry scared!!!

  2. Craig said:

    Canada is not to be trusted.

  3. Ken Burton said:

    I like what you are doing here. Google gave me an address on your site when I did a search on the admorecash thing.
    It was nice to find the truth! I will return.
    THANKS; Ken
