Tool for identifying languages

Jeff Kinz jkinz at kinz.org
Tue Jan 17 12:57:54 EST 2006


On Tue, Jan 17, 2006 at 10:23:17AM -0500, Christopher Schmidt wrote:
> On Tue, Jan 17, 2006 at 09:34:06AM -0500, Jeff Kinz wrote:
> > Does anyone know of a tool that can determine which language a
> > chunk of text is written in? (Assume a few hundred words)
> 
> http://languid.cantbedone.org/
> http://languid.cantbedone.org/Language-Guess.tgz

Wow.  Unbelievable.  Thank you Chris.
> 
> -- 
> Christopher Schmidt
> Web Developer


Why I'm "wowed":

This tool appears to use statistical analysis of how often particular
three-character strings (trigrams) appear in the text. Whitespace counts
as one of the characters. Very nice, and thanks again to Chris.
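For anyone curious, here is a minimal Python sketch of that counting step. It collects every overlapping three-character string and keeps whitespace as a character, so word boundaries show up in trigrams like " th" or "he ". Lowercasing, collapsing runs of whitespace to one space, and padding the ends with a space are my own assumptions. They are not necessarily what the languid tool does.

```python
from collections import Counter
import re


def trigram_counts(text: str) -> Counter:
    """Count overlapping three-character strings in `text`.

    Assumed normalization (not taken from the languid tool): lowercase,
    collapse whitespace runs to one space, pad both ends with a space so
    the first and last words also produce boundary trigrams.
    """
    normalized = " " + re.sub(r"\s+", " ", text.lower()).strip() + " "
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


counts = trigram_counts("The cat sat on the mat.")
print(counts.most_common(3))
```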

Here are a few random lines from the English "strings" file:
t t                     45
 be                     46
ld                      47
e a                     48
rs                      49
 wa                     50
ut                      51
ve                      52
ll                      53




-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

"The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding." - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco



More information about the Discuss mailing list