[TriLUG] OT: Need a thinkpad power supply

Aaron S. Joyner aaron at joyner.ws
Wed Aug 24 11:07:15 EDT 2005


This is exactly the kind of solution I was looking for.  :)  Timothy
doesn't seem to want the quarter, though (as evidenced by his lack of
links), so I'll just send 13 cents on over to Lee, who came up with the
shortest post the soonest, and 12 cents to David for the longest.  The
comparison for the shortest between the two candidates comes up
something like this:
lines  words  chars
5      12      61 = Cygwin
3      15      83 = Pavlov's dog

The Cygwin post wins in everything but line count.  Since "shortest" is
an incomplete description (shortest vertically when printed, fewest
bytes, fewest words, fewest bytes after maximum gzip compression - aka
least "information", etc.), I'd go with the default of whoever wins the
wc character count.  For those not familiar with the above output, see
man wc.
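
For anyone without wc handy, the three columns can be approximated in a
few lines of Python.  This is only a rough sketch: the function name
wc_counts is made up for illustration, and it reports bytes of UTF-8
text, which matches the character count for plain ASCII posts.

```python
def wc_counts(text):
    """Return (lines, words, bytes), roughly as plain `wc` reports them."""
    data = text.encode("utf-8")
    lines = data.count(b"\n")   # wc counts newline characters, not "rows"
    words = len(data.split())   # runs of non-whitespace
    return lines, words, len(data)


if __name__ == "__main__":
    print(wc_counts("one two\nthree\n"))   # (2, 3, 14)
```

Whichever post has the smallest third number wins under the rule above.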

Now for the dissection of Tim's post, for the curious.  I welcome his
commentary or additional comments on how many revs of that perl one
liner he went through before he got it counting right.  :)

Timothy A. Chagnon wrote:

>...Nasty fun with perl, just counting lines to get a good guess:
>$ mkdir joyner
>$ wget -nd -nH -P joyner -r -l 1 -A gz \
>-X Week-of http://www.trilug.org/pipermail/trilug/
>  
>
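Get the files linked from the TriLUG archives, recursively down one
level (-r -l 1), keeping only those which end in gz (-A gz) and skipping
the Week-of directories (-X Week-of), and store them flat, with no host
or directory hierarchy, on the local disk in "joyner" (-nd -nH -P joyner).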

>$ gunzip -c joyner/*gz >trilug.txt
>  
>
Decompress them all into a file called trilug.txt (thereby creating a
single text file with all the posts to trilug, ever).
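
The same concatenation step could be sketched in Python, for the
curious.  This is an illustration rather than what Tim ran: the helper
name concat_gz is hypothetical, and it sorts file names itself, which
may not match the order your shell's glob expansion would use.

```python
import glob
import gzip


def concat_gz(pattern, out_path):
    """Decompress every file matching `pattern` into one output file,
    in the spirit of `gunzip -c files... >out`."""
    with open(out_path, "wb") as out:
        # Sorted by file name, which is not necessarily chronological.
        for name in sorted(glob.glob(pattern)):
            with gzip.open(name, "rb") as src:
                out.write(src.read())


if __name__ == "__main__":
    concat_gz("joyner/*gz", "trilug.txt")
```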

>$ perl -n -e 'if( /^From: / ){ if($count){print "$count\n";$count=0}
>if(/joyner/){$joyner=1;}else{$joyner=0}  }else{if($joyner){ if(/^Date: /||/^Subject:/){print;} if(!/^>/&&/[a-zA-Z]/){$count++;}} }' trilug.txt |perl -n -e 'chomp; if(/^Date/) {$d=$_;}else{if(/^Sub/){$s=$_;}else{print "$_ $d $s\n";}}'|sort -n -k1
>  
>
To deconstruct this, it helps to break it down from a one-liner into
properly indented code, which would be commented something like this:

if( /^From: / ){ # If it's the start of a message...
   if($count)   { print "$count\n"; $count=0; } # treat it as the end of the
                                                 # previous message: print and
                                                 # reset the count
   if(/joyner/) { $joyner=1; } # if the From line contains "joyner", mark it
   else         { $joyner=0; } # otherwise, clear that mark
}
else{ # if this is a line in a message...
   if($joyner) { # marked as written by me...
      if(/^Date: /||/^Subject:/) { print; }    # print the date and subject headers
      if(!/^>/&&/[a-zA-Z]/)      { $count++; } # and count every unquoted line that
                                                # contains a letter (headers included)
   }
}
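
For those who read Python more easily than Perl, roughly the same logic
looks like this.  It is a sketch, not Tim's code: the name first_pass
and the "who" parameter are made up for illustration, and it returns a
list instead of printing.  Like the one-liner, it only emits a count
when the next From: line shows up, so the count for the very last
message in the file is never emitted.

```python
import re


def first_pass(lines, who="joyner"):
    """Emit Date:/Subject: headers and a line count for each matching post."""
    out = []
    count = 0
    mine = False
    for line in lines:
        if line.startswith("From: "):          # start of a new message
            if count:                          # close out the previous one
                out.append(str(count))
                count = 0
            mine = who in line                 # mark (or clear) the author flag
        elif mine:
            if line.startswith("Date: ") or line.startswith("Subject:"):
                out.append(line.rstrip("\n"))
            # Unquoted lines containing a letter; the headers match this too.
            if not line.startswith(">") and re.search(r"[a-zA-Z]", line):
                count += 1
    return out
```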

This ends the first script.  He runs that script across the trilug.txt
file, which produces output that's just a Date: line, a Subject: line,
and a line count for each post.  He then runs this second script, using
the previous script's output as its input:

chomp; # clear off the newline character
if(/^Date/) { # If it's the date line
   $d=$_; # stick the line in a var $d
}
else{
   if(/^Sub/){ # if it's a subject line
      $s=$_; # stick it in the var $s
   }
   else{
      print "$_ $d $s\n"; # print the count, date, and subject lines on one line
   }
}
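
The second script can be sketched the same way in Python.  Again, this
is an illustration under assumptions: second_pass is a hypothetical
name, and it collects the joined lines in a list rather than printing
them.

```python
def second_pass(lines):
    """Join each count with the Date: and Subject: lines seen before it."""
    out = []
    date = subject = ""
    for line in lines:
        line = line.rstrip("\n")        # the equivalent of chomp
        if line.startswith("Date"):
            date = line                 # remember the most recent date
        elif line.startswith("Sub"):
            subject = line              # remember the most recent subject
        else:
            out.append(f"{line} {date} {subject}")
    return out


if __name__ == "__main__":
    for row in second_pass(["Date: Wed Aug 24", "Subject: test", "3"]):
        print(row)   # 3 Date: Wed Aug 24 Subject: test
```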

He then takes the output of that and runs it through sort -n -k1, in
order to ... well... I'll leave that up to the reader.

So who wants to point out potential points for optimization of his code?  Tim - care to comment on / condense / clean up anything?  :)

Aaron S. Joyner



