[TriLUG] Search Engine question

Phillip Rhodes mindcrime at cpphacker.co.uk
Sun Oct 15 21:28:26 EDT 2006


WA Brown wrote:
> Is there a "google" quality search engine out there? Seems to me if
> someone was to start a search engine that was "google" quality and
> did not keep records, there would be a lot of use for this.
>
You can certainly run your own Internet search engine locally if you'd
like.  Heck, writing a basic web spider / search engine in Java or Perl
can be done in a day (or less) using available libraries.  But,
unfortunately, it's a big hop from a minimal search engine to something
like Google.  The first big gap between Google and something you run
locally is your ability to crawl and index pages.  For that you need
oodles of bandwidth, gobs of storage space, and quite a bit of CPU
time.  And even if you managed to index as much of the web as Google or
Yahoo, you would still have the problem Aaron mentioned: ranking the
results.  A naive ranking (say, counting how often the query terms
appear on each page) is easy; something like PageRank, which scores
pages using the link structure of the whole crawled web, is much harder.
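
To make the "minimal search engine" vs. "ranking" distinction concrete,
here is a toy sketch, not how Google or Nutch actually work.  It assumes
a hypothetical in-memory WEB dictionary standing in for real HTTP
fetching, a breadth-first crawl, an inverted index with naive
term-count scoring, and PageRank computed by simple power iteration with
the commonly used damping factor of 0.85.  Multiplying the term score by
PageRank is just one illustrative way to combine the two, chosen here
for brevity.

```python
from collections import Counter, defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# Page "d" stuffs keywords, but no other page links to it.
WEB = {
    "a": ("linux search engine howto", ["b", "c"]),
    "b": ("search engine ranking with pagerank", ["c"]),
    "c": ("trilug linux users group", ["a"]),
    "d": ("search search search engine spam", ["c"]),
}


def crawl(seeds, web):
    """Breadth-first crawl from the seed URLs, visiting each page once."""
    seen, queue, pages = set(seeds), deque(seeds), {}
    while queue:
        url = queue.popleft()
        pages[url] = web[url]  # a real spider would fetch over HTTP here
        for link in web[url][1]:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Inverted index: term -> {url: number of times the term occurs}."""
    index = defaultdict(dict)
    for url, (text, _links) in pages.items():
        for term, count in Counter(text.split()).items():
            index[term][url] = count
    return index


def term_scores(index, query):
    """Naive relevance: sum of raw query-term counts per page."""
    scores = Counter()
    for term in query.split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return scores


def naive_search(index, query):
    """Rank matching pages by naive term counts alone."""
    scores = term_scores(index, query)
    return sorted(scores, key=scores.get, reverse=True)


def pagerank(pages, damping=0.85, iterations=50):
    """PageRank by power iteration over the crawled link graph."""
    n = len(pages)
    rank = {url: 1.0 / n for url in pages}
    for _ in range(iterations):
        new = {url: (1.0 - damping) / n for url in pages}
        for url, (_text, links) in pages.items():
            targets = [t for t in links if t in pages]
            # Pages with no crawled out-links spread their rank evenly.
            targets = targets or list(pages)
            share = damping * rank[url] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


def ranked_search(index, rank, query):
    """Weight the naive score by PageRank (one simple way to combine them)."""
    scores = term_scores(index, query)
    return sorted(scores, key=lambda url: scores[url] * rank[url], reverse=True)


if __name__ == "__main__":
    pages = crawl(["a", "d"], WEB)
    index = build_index(pages)
    rank = pagerank(pages)
    print(naive_search(index, "search engine"))         # ['d', 'a', 'b']
    print(ranked_search(index, rank, "search engine"))  # ['a', 'b', 'd']
```

Naive counting puts the keyword-stuffed page "d" first; weighting by
PageRank drops it to last, because nothing links to it.  The hard part
is doing the crawl and the PageRank iteration over billions of pages,
which is where the bandwidth, storage, and CPU costs come in.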

Realistically, you aren't likely to build a Google / Yahoo caliber
search engine in your own environment unless you are building a
commercial search engine intended to compete in that space and have
massive funding.  :-(

But if you're interested in taking a stab at building your own search 
engine, see <http://lucene.apache.org/nutch/>, 
<http://en.wikipedia.org/wiki/Nutch> and/or
<http://lucene.apache.org/hadoop/about.html>.

TTYL,

Phil


