It was written in PHP and references three XML files:
- basic.xml - Ogden's basic 850 words,
 +international English words (nuclear, macaroni)
 +simple compound words (bedroom, eyeball)
 +proper nouns and acronyms (DiMaggio, Rudolf, PMS)
The words on this list (17,900 items) are left untranslated.
- translate.xml - single-word substitution.
Very. Context. Sensitive.
I have taken a lot of translations from the Internet Dictionary Project's list of 1,600.
My list is around 2,000 items.
You can edit this list as a wiki here.
- idiom.xml - multiple-word translation ('back and forth' -> 'backwards and forwards', 'a cash cow' -> 'an easy way to make money').
As far as I know, this is the only developer-friendly idiom substitution list. I'm making it by hand, usually on the bus.
It has around 400 items.
Yup, it's pretty brute force. The only heuristic I've used is for plurals.
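To make the brute-force idea concrete, here is a minimal sketch in Python (not the project's PHP). Everything in it is an assumption for illustration: the three dictionaries are tiny in-memory stand-ins for the XML files, the 'purchase' and 'automobile' entries are made up, and the plural rule (strip a trailing "s", look up the singular, add the "s" back) is just one guess at what a plural heuristic could look like.

```python
import re

# Hypothetical in-memory stand-ins for basic.xml, translate.xml and idiom.xml.
BASIC = {"the", "a", "and", "he", "to", "two", "bedroom"}  # left untranslated
TRANSLATE = {"purchase": "buy", "automobile": "car"}       # made-up entries
IDIOMS = {
    "back and forth": "backwards and forwards",
    "a cash cow": "an easy way to make money",
}


def replace_idioms(text):
    """Swap multi-word idioms first, longest idiom first."""
    for idiom in sorted(IDIOMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(idiom) + r"\b"
        # A function is used as the replacement so it is inserted literally.
        text = re.sub(pattern, lambda m: IDIOMS[idiom], text, flags=re.IGNORECASE)
    return text


def translate_word(word):
    """Single-word substitution with a naive plural fallback."""
    lower = word.lower()
    if lower in BASIC:
        return word
    if lower in TRANSLATE:
        return TRANSLATE[lower]
    # Plural heuristic (assumed): try the singular, then re-add the "s".
    if lower.endswith("s") and lower[:-1] in TRANSLATE:
        return TRANSLATE[lower[:-1]] + "s"
    return word  # unknown words pass through unchanged


def simplify(text):
    text = replace_idioms(text)
    return re.sub(r"[A-Za-z']+", lambda m: translate_word(m.group(0)), text)


print(simplify("He walked back and forth to purchase two automobiles."))
```

Note that the idiom pass runs first, so words produced by an idiom replacement also go through the single-word pass. That is harmless with these entries but could double-translate with a larger list.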
As I am not a strong programmer (or SQL'er, omg), this project is, if anything, a function of the collaborative power of the internet.
There is a lot of work left to do, on both the data and the code.
It has no sentence-level analysis. I have a large list of words that would be translated if they weren't ambiguous, like 'bass', which could be translated properly if the rest of the sentence were searched for musical or fish words.
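The 'bass' idea could be sketched roughly like this, in Python rather than the project's PHP. The cue-word lists and the `guess_sense` function are hypothetical, invented only for illustration. The sketch counts how many cue words for each sense appear in the sentence and refuses to guess when nothing matches or when two senses tie, so the word would stay untranslated in those cases.

```python
import re

# Hypothetical cue words per sense; a real list would be much larger.
SENSE_CUES = {
    "bass": {
        "music": {"guitar", "music", "band", "played", "song"},
        "fish": {"fish", "lake", "river", "caught", "fishing"},
    },
}


def guess_sense(word, sentence):
    """Return the sense whose cue words occur most often in the sentence.

    Returns None when the word is unknown, no cue word matches,
    or two senses tie (better to leave the word alone than guess).
    """
    senses = SENSE_CUES.get(word.lower())
    if not senses:
        return None
    context = set(re.findall(r"[a-z']+", sentence.lower()))
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(guess_sense("bass", "He played the bass in a rock band."))
print(guess_sense("bass", "We caught a bass in the lake."))
print(guess_sense("bass", "I like bass."))
```

The translator would then need one replacement per sense in its word list, and would only substitute when `guess_sense` returns something other than None.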
Email me.
I'm throwing around three ideas:
- Word sense disambiguation -- popping homonyms off WordNet. Is 'novel' a book or something new? If someone made a (free) disambiguator that is sufficiently careful, it would multiply the power of this program.
- Dependent clause repair -- with POS tagging. I've been told that run-on sentences are what's tricky in English.
- Manual undo collection -- users click a translated term and it reverts to its original form. This data could be collected to improve problem translations.
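For the manual undo idea, one possible way to turn clicks into something useful is to track how often each substitution is shown and how often it gets reverted, then flag the pairs with a high undo rate. The sketch below is in Python (not the project's PHP), and the `UndoLog` class, its method names, and the thresholds are all assumptions made up for illustration.

```python
from collections import Counter


class UndoLog:
    """Counts how often each (original, replacement) pair is shown and undone."""

    def __init__(self):
        self.shown = Counter()
        self.undone = Counter()

    def record_shown(self, original, replacement):
        self.shown[(original, replacement)] += 1

    def record_undo(self, original, replacement):
        self.undone[(original, replacement)] += 1

    def problem_translations(self, min_shown=5, min_rate=0.5):
        """Return pairs undone at least min_rate of the time, worst first.

        Pairs shown fewer than min_shown times are skipped so that
        one stray click does not flag a translation.
        """
        flagged = []
        for pair, shown in self.shown.items():
            rate = self.undone[pair] / shown
            if shown >= min_shown and rate >= min_rate:
                flagged.append((pair, rate))
        return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up usage data: 'novel' -> 'book' is undone often, 'purchase' -> 'buy' rarely.
log = UndoLog()
for _ in range(10):
    log.record_shown("novel", "book")
    log.record_shown("purchase", "buy")
for _ in range(6):
    log.record_undo("novel", "book")
log.record_undo("purchase", "buy")

print(log.problem_translations())
```

Pairs that come out of `problem_translations` would be candidates for moving to the ambiguous-word list or for rewording by hand.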