Saturday, October 20, 2007

Linguaphile progress

I've added a couple of languages I've always wanted in Linguaphile: Vietnamese and Hebrew. So far both have dictionaries that are too tiny to be useful and no grammar rules at all, but each presents its own specific challenges for Linguaphile.

I finally found out how to get the Linguaphile test page on SourceForge to keep a log. It now logs the IP address, the source and destination languages, and all the words it failed to translate, in order of frequency. Now I can track how much the test page is used, which languages are the most popular, and which words are most needed in the dictionary.
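To illustrate the "in order of frequency" part, here is a minimal sketch in Python. It is not Linguaphile's actual logging code: the function name failed_word_report is hypothetical, and folding words to lowercase before counting is my assumption.

```python
from collections import Counter


def failed_word_report(failed_words):
    """Return (word, count) pairs, most frequent first.

    Hypothetical helper, not part of Linguaphile. Words are lowercased
    before counting (an assumption); ties are broken alphabetically so
    the output is stable.
    """
    counts = Counter(word.lower() for word in failed_words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Feeding it every untranslated word collected from the log gives a list whose top entries are the words most worth adding to the dictionary.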

One problem for Linguaphile is finding source text to translate. To address this need I have created a new tool which grabs random pages from Wikipedia and converts them to plain text. It takes two parameters: a language and a number of pages. This will also be useful for word frequencies and other text analysis such as Markov chains.
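A tool like this could be sketched in Python roughly as follows. This is not the tool described above: it assumes the present-day MediaWiki API with its plain-text extracts (generator=random, prop=extracts, explaintext), and the function names and the User-Agent string are hypothetical. It asks for one page per request to keep the sketch simple.

```python
import json
import urllib.parse
import urllib.request


def random_page_url(lang):
    """Build an API URL asking one Wikipedia for one random article as plain text."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "random",
        "grnnamespace": "0",  # main namespace, i.e. articles only
        "grnlimit": "1",
        "prop": "extracts",
        "explaintext": "1",   # plain text instead of HTML
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urllib.parse.urlencode(params)}"


def extract_texts(response):
    """Pull the plain-text extracts out of a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    return [page.get("extract", "") for page in pages.values()]


def fetch_random_pages(lang, count):
    """Fetch `count` random pages from the `lang` Wikipedia (needs network access)."""
    texts = []
    for _ in range(count):
        request = urllib.request.Request(
            random_page_url(lang),
            headers={"User-Agent": "linguaphile-sketch/0.1"},  # hypothetical name
        )
        with urllib.request.urlopen(request) as reply:
            texts.extend(extract_texts(json.load(reply)))
    return texts
```

Calling fetch_random_pages("vi", 5) would then return up to five plain-text pages from the Vietnamese Wikipedia, ready to feed into a translator or a word counter.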

