Saturday, October 20, 2007

Linguaphile progress

I've added a couple of languages I've always wanted in Linguaphile: Vietnamese and Hebrew. So far both have dictionaries that are too tiny to be useful and no grammar at all, but each presents its own specific challenges for Linguaphile.

I finally found out how to get the Linguaphile test page on SourceForge to keep a log. It now logs the IP address, the source and destination languages, and all the words it failed to translate, in order of frequency. Now I can track how much the test page is used, which languages are the most popular, and which words are most needed in the dictionary.
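
To give an idea of what can be pulled out of such a log, here is a minimal sketch in Python. It is not the actual Linguaphile logging code: the entry layout (a dict with `ip`, `source`, `dest` and `failed_words` keys) and the function name `summarize_log` are assumptions made up for the example.

```python
from collections import Counter

def summarize_log(entries):
    """Tally language pairs and untranslated words, most frequent first.

    Each entry is assumed (hypothetically) to be a dict with the keys
    "ip", "source", "dest" and "failed_words".
    """
    pair_counts = Counter()
    word_counts = Counter()
    for entry in entries:
        pair_counts[(entry["source"], entry["dest"])] += 1
        # Lowercase so "Hostel" and "hostel" count as the same missing word.
        word_counts.update(word.lower() for word in entry["failed_words"])

    # Sort by descending count, then alphabetically so ties are stable.
    def by_frequency(item):
        return (-item[1], item[0])

    pairs = sorted(pair_counts.items(), key=by_frequency)
    words = sorted(word_counts.items(), key=by_frequency)
    return pairs, words
```

The first list answers "which languages are the most popular" and the top of the second list answers "which words are most needed in the dictionary".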

One problem for Linguaphile is finding text to translate from. To address this need I have created a new tool which grabs random pages from Wikipedia and converts them to plain text. It takes two parameters: a language and a number of pages. This will also be useful for word frequencies and other text analysis such as Markov chains.
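
For anyone wanting to try something similar, here is a rough sketch in Python of one way such a tool could work today. It is not the tool described above. It assumes the MediaWiki API's `list=random` query and the `prop=extracts` plain-text extracts that Wikipedia currently offers, and the function names and the User-Agent string are invented for the example.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical identifier; Wikipedia asks clients to send a descriptive one.
USER_AGENT = "random-text-sketch/0.1 (example only)"

def api_url(lang, **params):
    """Build a MediaWiki API URL for the Wikipedia with this language code."""
    query = urllib.parse.urlencode({"action": "query", "format": "json", **params})
    return f"https://{lang}.wikipedia.org/w/api.php?{query}"

def fetch_json(url):
    """Download a URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def extract_text(pages_response):
    """Pull the non-empty plain-text extracts out of a prop=extracts response."""
    pages = pages_response.get("query", {}).get("pages", {})
    return [page["extract"] for page in pages.values() if page.get("extract")]

def random_pages(lang, count):
    """Return the plain text of up to `count` random articles."""
    # Namespace 0 restricts the random pick to ordinary articles.
    listing = fetch_json(api_url(lang, list="random", rnnamespace=0, rnlimit=count))
    texts = []
    for item in listing["query"]["random"]:
        # One request per title: whole-page extracts come back one page at a time.
        response = fetch_json(
            api_url(lang, prop="extracts", explaintext=1, titles=item["title"])
        )
        texts.extend(extract_text(response))
    return texts
```

Calling `random_pages("vi", 5)` should return a list of up to five Vietnamese article texts, which can then be split into words for frequency counts.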

Friday, October 19, 2007

Wiktionary & Linguaphile

Well, I didn't get very far with the Mexico trip and I didn't even mention the next trip to Central Europe and Vietnam...

Now I'm back in Sydney, working and saving for the next trip, and doing some programming in my spare time.

My two main projects are the English Wiktionary and my own machine translation project, Linguaphile.

What I want to achieve now is to use the community-built dictionary data from Wiktionary for the translation database in Linguaphile.

Sunday, March 11, 2007


  1. Contrary to the persistent myth, Japan is cheap! My hostel in Tokyo is cheaper than the hostel I work at in Sydney. Coffee-flavoured milk is 1/3 the price I pay in Sydney!
  2. It can be very difficult to find an ATM that accepts foreign cards or is open late at night.
  3. The order of books on the shelves of a bookshop is entirely inscrutable even if you know your kana and a few kanji.