Saturday, October 20, 2007

Linguaphile progress

I've added two languages I've always wanted in Linguaphile: Vietnamese and Hebrew. For now both have dictionaries too small to be useful and no grammar rules at all, but each poses its own challenges for Linguaphile.

I finally worked out how to get the Linguaphile test page on SourceForge to keep a log. It now records the IP address, the source and destination languages, and all the words it failed to translate, in order of frequency. Now I can track how much the test page is used, which languages are most popular, and which words the dictionary needs most.
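To make the logging idea concrete, here is a minimal sketch in Python. It is not the code running on the test page: the tab-separated line format and the names `format_log_entry` and `summarize` are assumptions made up for illustration. It writes one line per request with the failed words ranked by frequency, then tallies language pairs and missing words across the whole log.

```python
from collections import Counter
from datetime import datetime, timezone


def format_log_entry(ip, source_lang, dest_lang, failed_words, when=None):
    """Build one tab-separated log line (hypothetical format).

    Failed words are lowercased, counted, and listed most frequent first.
    """
    when = when or datetime.now(timezone.utc)
    counts = Counter(word.lower() for word in failed_words)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    words = ",".join(f"{word}:{n}" for word, n in ranked)
    return "\t".join([when.isoformat(), ip, source_lang, dest_lang, words])


def summarize(lines):
    """Tally language pairs and untranslated words over many log lines."""
    pairs = Counter()
    missing = Counter()
    for line in lines:
        _, _, src, dst, words = line.rstrip("\n").split("\t")
        pairs[(src, dst)] += 1
        for item in filter(None, words.split(",")):
            word, _, n = item.rpartition(":")
            missing[word] += int(n)
    return pairs, missing
```

Calling `summarize` on the lines of the log file and then `missing.most_common(20)` would list the words most worth adding to the dictionary, and `pairs.most_common()` the most requested language pairs.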

One problem for Linguaphile is finding text to translate. To address this I have created a new tool which grabs random pages from Wikipedia and converts them to plain text. It takes two parameters: the language and the number of pages. The output will also be useful for word frequencies and other text analysis such as Markov chains.
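A tool like this could be sketched roughly as follows in Python. This is an illustration under assumptions, not the actual tool: it assumes random pages are fetched through each wiki's `Special:Random` page and that stripping HTML tags is enough to count as "plain text". The names `html_to_text`, `random_page_url` and `fetch_random_pages`, and the User-Agent string, are hypothetical.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping script and style content."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "td", "th",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")  # keep adjacent blocks from fusing

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse all whitespace to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def random_page_url(lang):
    """URL that redirects to a random article on the given language's wiki."""
    return f"https://{lang}.wikipedia.org/wiki/Special:Random"


def fetch_random_pages(lang, count):
    """Download `count` random pages and return them as plain-text strings."""
    texts = []
    for _ in range(count):
        request = urllib.request.Request(
            random_page_url(lang),
            headers={"User-Agent": "random-text-sketch/0.1"},  # hypothetical
        )
        with urllib.request.urlopen(request) as response:
            html = response.read().decode("utf-8", errors="replace")
        texts.append(html_to_text(html))
    return texts
```

For example, `fetch_random_pages("vi", 5)` would return five Vietnamese pages as strings. Because this sketch converts the whole page, the text still includes menus and other navigation; a real tool would probably want to keep only the article body.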

Friday, October 19, 2007

Wiktionary & Linguaphile

Well, I didn't get very far with the Mexico trip, and I didn't even mention the next trip to Central Europe and Vietnam...

Now I'm back in Sydney working and saving for the next trip and doing some programming in my spare time.

My two main projects are the English Wiktionary and my own machine translation project, Linguaphile.

What I want to achieve now is to use the community-built dictionary data from Wiktionary as the translation database for Linguaphile.