[TriLUG] Automatic Spell Checking in Linux

Michael Ansel michael.ansel at gmail.com
Mon Jul 21 12:22:43 EDT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hope I'm not responding after the fact, but here's something to
remember: Aspell (and most other spell checkers, for that matter) is
written to correct common *typing* mistakes, not *recognition*
mistakes. For example, it's (probably) not going to realize that
1nternet (the digit one instead of a capital I) should have been
recognized as Internet. I'm not exactly sure how you could fix that
(rewrite the aspell dictionary, maybe?), but it's definitely something
you'll want to consider if things aren't working out correctly. Good
luck!

Michael

Brian Henning wrote:
> Hi Michael,
>
> Check out aspell:
>
> http://aspell.net
>
> It has a "suggestion" feature with various modes of operation; I'm
> not sure how you might turn it into a fully-automated process but
> there's probably an option to make it automatically choose the most
> likely replacement, and then pipe the output to a file.
>
> ~Brian
>
> -----Original Message-----
> From: trilug-bounces at trilug.org [mailto:trilug-bounces at trilug.org]
> On Behalf Of Michael Ham
> Sent: Wednesday, July 16, 2008 10:02 AM
> To: trilug at trilug.org
> Subject: [TriLUG] Automatic Spell Checking in Linux
>
> Hey guys,
>
> My project at work is about to start OCRing thousands of pages of
> stuff and I would like some way to run an input file of messed up
> OCR stuff and have it spell check and return the most likely
> spellings automatically.
>
> Think of it like Google where you type "Lnus Torvald" and it says
> "Did you mean Linus Torvald?"
>
> So basically, does anyone know of a program that will perform all
> of that automatically for a text file and then output a new large
> chunk of text with the changes made?
>
> Any help would be appreciated!
>
> Thanks, Michael Ham
>
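
On Brian's idea of automatically taking the most likely replacement: a
minimal sketch, assuming aspell is installed and run in its
ispell-compatible pipe mode (aspell -a), where each misspelled word comes
back as a line of the form "& word count offset: sug1, sug2, ...". The
helper names are made up for illustration, and blindly trusting the first
suggestion is a design choice you would want to spot-check on real OCR
output.

```python
import subprocess

def first_suggestion(line):
    """Parse one line of pipe-mode output; return (word, replacement) or None."""
    if not line.startswith("& "):
        # "*" means correct, "#" means no suggestions, "@(#)" is the banner.
        return None
    head, _, suggestions = line.partition(": ")
    word = head.split()[1]
    return word, suggestions.split(", ")[0]

def suggest_all(text, lang="en"):
    """Run `aspell -a` over text; map each misspelled word to its top suggestion."""
    # Prefix every line with "^" so nothing is read as a pipe-mode command.
    safe = "".join("^" + line + "\n" for line in text.splitlines())
    result = subprocess.run(
        ["aspell", "-a", "--lang=" + lang],
        input=safe, capture_output=True, text=True, check=True,
    )
    pairs = (first_suggestion(line) for line in result.stdout.splitlines())
    return dict(pair for pair in pairs if pair is not None)
```

Applying the returned mapping back to the text (for example with a
word-by-word replace) and writing the result to a new file would complete
the loop.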

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEcBAEBAgAGBQJIhLfSAAoJEHlxmnp6j2qxbpUIAKVZCjKgk8BBvlx7/dp6vhWD
Z/4k4W8CbZ+AsORw6IcKe1jtS7AmIO/Qgd33h0kfUplUCByh78erFIW3Jy9pSEhD
zg8Dgt68wCQigvfQrEQOXT9cAGz6u76OK4YNwcCroAHZYgWnTWiHQvzrFqRSLzO6
RNQTk1s4CIBMRC39LuRiQiA6DRqTRO6s+ZkvLW/qsY8WE+xaN3EwfvOEo9X4O3FJ
N56hb8FpmCtVKoV+EGn41M1pAnSGT9kGgSL4J0IF9QI8owePTBSglAaGL6BsbFdg
MQNpZCON05OXoJF5PBG+/KpNRYzBlQDPE9iBNBymdCN+8nT4qOZeLO8TvpM0WYs=
=bShM
-----END PGP SIGNATURE-----



