[TriLUG] intranet search engine recommendations

Jon Carnes jonc at nc.rr.com
Fri May 10 15:43:26 EDT 2002


I use HTDig and it's what runs on TriLUG, but we do run the full re-index 
every night.  (I'm sure there is a way around that...).  I've also run Namazu.
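One possible way around the nightly full re-index is to record when the last run finished and select only the files modified after that time. The Python below is a minimal sketch of that idea. The helper names (`modified_since`, `load_last_run`, `save_last_run`, `run_incremental`) and the stamp-file approach are hypothetical illustrations. This is not something HTDig or Namazu is claimed to provide. It also assumes the indexer can accept a list of files to (re)index.

```python
import os
import time
from typing import Iterator, List


def modified_since(root: str, since: float) -> Iterator[str]:
    """Yield paths under `root` whose modification time is newer than `since`.

    `since` is a POSIX timestamp, e.g. when the previous indexing run started.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > since:
                    yield path
            except OSError:
                # File vanished or became unreadable between listing and stat; skip it.
                continue


def load_last_run(stamp_file: str) -> float:
    """Return the saved time of the previous run, or 0.0 (index everything) if none."""
    try:
        with open(stamp_file) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        return 0.0


def save_last_run(stamp_file: str, when: float) -> None:
    """Persist the run time; repr() keeps full float precision for the round trip."""
    with open(stamp_file, "w") as f:
        f.write(repr(when))


def run_incremental(root: str, stamp_file: str) -> List[str]:
    """Return the files changed since the last run and record this run's start time.

    The start time is taken *before* scanning, so a file edited mid-scan is
    picked up again on the next run instead of being missed.
    """
    started = time.time()
    changed = sorted(modified_since(root, load_last_run(stamp_file)))
    save_last_run(stamp_file, started)
    return changed  # hand these to an indexer that accepts a file list (assumption)
```

With 50 of 25000 files changed in a day, `run_incremental` returns only those 50 paths. The directory walk still touches every file, but stat-ing a file is far cheaper than reparsing its contents. Keep the stamp file outside the indexed tree, or it will list itself on every run.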

Namazu seemed to do a much better job with Excel and Word docs, but it was 
also much more complex to set up.  Plus, the docs are translated into English, 
and that makes for some head-scratching while you are doing the setup.

Mandrake defaults to using Medusa.  I don't know how far along that is in 
development, but if Mandrake uses it, it must be promising.

Jon
 --- Original Message: Friday 10 May 2002 03:28 pm ---
> First, congratulations to the new board and thanks to those who served for
> the past year.
>
> Second, I'd like to ask for the group's recommendations on an intranet
> search (engine|tool) which runs on Linux and is suitable for a small to
> midsize intranet.  I've been experimenting with htdig (distributed with Red
> Hat Linux) but have run into some apparent limitations:
>
> 1)  Based on the most current information I could find, htdig cannot update
> an index for only modified files.  For example, if 50 of 25000 files are
> modified in the course of a day, I'd like to be able to update the index
> for only the modified files.  With htdig, I would have to reparse and
> reindex all 25000 files just to get the 50 updates.
>
> 2)  htdig (and/or its external parsers) seem to have a very large memory
> footprint for xls, doc, and pdf files over a few MB in size.  Setting the
> max_doc_size to a small number (e.g. 500K) would cause most of our
> documents to be omitted from indexing.
>
> Any recommendations?  I'm especially interested in anything that allows
> indices to be updated on modified files without reindexing unchanged
> files.  I've looked at Google's product, but it is quite costly.
>
> Thanks,
> Geoff
>
>
> _______________________________________________
> TriLUG mailing list
>     http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ:
>     http://www.trilug.org/~lovelace/faq/TriLUG-faq.html


