[TriLUG] command line assistance, please

Kevin Hunter hunteke at earlham.edu
Tue Nov 27 17:11:34 EST 2007


At 4:04p -0500 on 27 Nov 2007, James C. Jones wrote:
> Thanks Alan. I tried yours but since I am now not sure of the email 
> domain name, I didn't get any more hits than with the earlier command
> line.

Since you don't know any characteristics that positively identify it,
you can methodically use criteria that positively identify not-it files.

There are multiple ways to do this, but whichever way you choose, use
the *nix philosophy of specialization and piping.  Here is a workflow
that might work for you.

Depending on just how large a search space you have, these command lines
may start to be useful to you:

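$ export GREP_OPTIONS="--color=always"
   # highlight what was found, and keep it in the pipeline
   # (recent GNU grep releases ignore GREP_OPTIONS; there, an alias
   # such as alias grep='grep --color=always' does the same job)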

$ export LESS="-R"
   # tell less to show color (escape sequences)

$ find . -type f | \
                # list all files in search space
  egrep -v '\.(png|gif|jpg)$' | \
                # remove some known "not-its"; \. matches a literal dot
  xargs cat | \
                # conCATenate files' contents
  sort | \
                # sort the output so that ...
  uniq | \
                # ... uniq can remove dup lines
  less          # let you view contents at your leisure
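
If any file names in the search space contain spaces, plain xargs will
split them into separate arguments.  A minimal sketch of a variant that
avoids this, assuming a find and xargs that support -print0 and -0 (GNU
and BSD both do); the demo directory and its files are made-up sample
data, and the image filter is done by find itself rather than egrep:

```shell
# Sketch only: demo_dir and its files are hypothetical sample data.
demo_dir=$(mktemp -d)
printf 'alpha\nbeta\nalpha\n' > "$demo_dir/my notes.txt"
printf 'beta\ngamma\n'        > "$demo_dir/todo.txt"
printf 'not text\n'           > "$demo_dir/photo.png"

# -print0 and -0 pass file names NUL-separated, so the space in
# "my notes.txt" does not break the name into two arguments.
# sort -u does the work of sort | uniq in one step.
unique_lines=$(
  find "$demo_dir" -type f ! -name '*.png' ! -name '*.gif' ! -name '*.jpg' -print0 |
    xargs -0 cat |
    sort -u
)
printf '%s\n' "$unique_lines"

rm -rf "$demo_dir"
```

In real use you would drop the variable and pipe straight into less, as
in the pipeline above.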

Once you begin to see patterns in files that definitely aren't "it", you
can munge this pipeline in any way you need.  For instance: "Huh, my file
will for sure have an '@' character in it," so:

$ grep -r @ * | \
             # list every line containing '@', prefixed by its file name
  sort | \
             # sort the output so that ...
  uniq | \
             # ... uniq can remove dup lines
  less       # let you view contents at your leisure
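
When the matching lines themselves are too noisy, it can help to see
only which files are still candidates.  A small sketch using grep's -l
option; the demo directory and file names are made up for illustration:

```shell
# Sketch only: demo_dir and its files are hypothetical sample data.
demo_dir=$(mktemp -d)
printf 'contact: someone@example.org\n' > "$demo_dir/letter.txt"
printf 'no address here\n'              > "$demo_dir/recipe.txt"

# -r recurses; -l prints each matching file's name once instead of
# the matching lines.  --color=never keeps escape sequences out of
# the names in case color has been forced on elsewhere.
candidates=$(cd "$demo_dir" && grep -rl --color=never '@' . | sort)
printf '%s\n' "$candidates"

rm -rf "$demo_dir"
```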

From there you might realize that of the current result set, your file
won't have 'john', 'teresa', or 'beef' in it:

$ grep -r @ * | \
        # list every line containing '@', prefixed by its file name
  egrep -iv "john|teresa|beef" | \
        # egrep == grep -E == use extended regular expressions
        # pipe inside quotes means "or"
        # -i says don't be case sensitive
        # -v says to remove matching lines
  sort | \
              # sort the output so that ...
  uniq | \
              # ... uniq can remove dup lines
  less        # let you view contents at your leisure

From there you can keep adding things to the egrep line that clearly
aren't what you want.
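
If the list of "not-it" words grows long, one option is to keep it in a
file and have grep read it with -f.  This is only a sketch of that idea;
the file names and addresses below are invented sample data:

```shell
# Sketch only: demo_dir and its files are hypothetical sample data.
demo_dir=$(mktemp -d)
printf 'john@example.org\nTeresa@example.org\nmystery@example.org\n' \
  > "$demo_dir/addresses.txt"

# One "not-it" pattern per line; append to this file as you spot more.
printf '%s\n' john teresa beef > "$demo_dir/not-it.patterns"

# -f reads the patterns from a file; -i ignores case and -v removes
# matching lines, just as on the egrep line.
remaining=$(grep -i -v -f "$demo_dir/not-it.patterns" "$demo_dir/addresses.txt")
printf '%s\n' "$remaining"

rm -rf "$demo_dir"
```

Each new exclusion is then a one-line append to the pattern file rather
than an edit to a growing quoted string.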

The power of up-enter should not be underestimated.

HTH,

Kevin

P.S. It's easier to keep a conversation with multiple points going if
you respond inline rather than top-posting.


