[TriLUG] Tracking File Downloads

Ken MacKenzie ken at mack-z.com
Sun Jan 25 10:43:25 EST 2015


My site architecture is:

CentOS 6.4 (I think)
nginx
mysql
Drupal 7 (installed from repos)

This site serves a podcast, which is a new endeavor for me.  In the process
I have discovered that tracking podcast listeners is rather difficult.
Podcatchers fetch the audio file directly and never run JavaScript, which
makes Google Analytics rather useless.

So what I have set up is goaccess and a script that parses the nginx logs
and feeds them through goaccess to build HTML reports that I can access
through Drupal.

But here is the other thing: I want to filter out failed downloads,
crawlers, and podcatchers that are just scanning for new episodes but not
downloading them.  In theory this is as close as I can get to determining
"listens".

So my grep string:

grep '\.mp3' combined.log | grep ' 200 ' | grep -v '"HEAD ' | goaccess -a > report.html

OK, that is a pseudo version.
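
A chain of plain greps can match in the wrong place: "200" may show up in
the byte count or the timestamp, and "HEAD" may show up in a URL or user
agent.  A field-based filter is one way to tighten that up.  What follows is
only a rough sketch, assuming the default nginx "combined" log format with
no spaces in the user or request path; the function name
filter_mp3_downloads is made up for illustration.

```shell
# Keep only successful GET requests for .mp3 files.
# Assumes the default nginx "combined" format, where whitespace-separated
# field 6 is the quoted method, field 7 the path, and field 9 the status.
filter_mp3_downloads() {
    awk '$6 == "\"GET" && $9 == 200 && $7 ~ /\.mp3(\?|$)/'
}

# Hypothetical usage, mirroring the pipeline above:
# filter_mp3_downloads < combined.log | goaccess -a > report.html
```

Matching on the method field means HEAD requests drop out without a
separate grep -v step.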

Does that seem sensible?  I eliminated HEAD requests from the report because
I noticed the podcatchers use them to confirm the file's presence when
parsing the feed.

Thoughts?

