hyperpape wrote: I know Google Translate works statistically, by analyzing large corpora, but does anyone know more about how it works? Does it have some sort of underlying/universal grammar that it translates the text into and then uses that to translate it into the target language? (If it did, you'd have the amusing possibility of trying to translate English into English.)
Last time I heard, they were using vanilla phrase-based translation, but with huge language models.
In those models there is no notion of grammar at all. You just have a dictionary of sentence fragments (a phrase table), in which punctuation such as "." or "," is treated just like any grammatical word or noun, plus a "language model" that gives the probability that a given sentence is good in the target language. You then run a minimization algorithm to find the least costly translation.
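To make those pieces concrete, here is a minimal sketch of phrase-based decoding, assuming a hypothetical toy French-to-English phrase table and a hypothetical bigram language model (all probabilities are invented, and the names `PHRASE_TABLE`, `BIGRAMS` and `decode` are mine, not from any real system). The cost of a candidate is the negative log of the translation probability plus the negative log of the language-model probability, and a small dynamic program finds the cheapest one. It only works left to right, with no reordering between phrases; real decoders weight many more features and use beam search over far larger tables.

```python
import math
from functools import lru_cache

# Hypothetical toy phrase table: source phrase -> [(target phrase, P(target | source))].
PHRASE_TABLE = {
    ("la",): [("the", 0.7), ("it", 0.3)],
    ("maison",): [("house", 0.8), ("home", 0.2)],
    ("bleue",): [("blue", 0.9)],
    ("la", "maison"): [("the house", 0.6), ("home", 0.4)],
    ("maison", "bleue"): [("blue house", 0.7), ("house blue", 0.3)],
}

# Hypothetical toy bigram language model P(word | previous word) for English.
BIGRAMS = {
    ("<s>", "the"): 0.5, ("<s>", "home"): 0.1, ("<s>", "it"): 0.05,
    ("the", "house"): 0.4, ("the", "blue"): 0.3, ("blue", "house"): 0.6,
    ("house", "blue"): 0.01, ("home", "blue"): 0.05,
    ("house", "</s>"): 0.5, ("blue", "</s>"): 0.1,
}
UNSEEN = 1e-4  # probability assigned to any bigram not listed above


def lm_cost(prev, words):
    """Negative log LM probability of `words` following `prev`; also returns the last word."""
    cost = 0.0
    for word in words:
        cost -= math.log(BIGRAMS.get((prev, word), UNSEEN))
        prev = word
    return cost, prev


def decode(sentence):
    """Cheapest monotone (left-to-right, no phrase reordering) translation and its cost."""
    src = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def best(i, prev):
        # Cheapest way to translate src[i:] given the previous English word `prev`.
        if i == len(src):
            return -math.log(BIGRAMS.get((prev, "</s>"), UNSEEN)), ()
        best_cost, best_words = math.inf, ()
        for j in range(i + 1, len(src) + 1):
            for target, p in PHRASE_TABLE.get(src[i:j], []):
                words = tuple(target.split())
                lm, last = lm_cost(prev, words)
                rest_cost, rest_words = best(j, last)
                total = -math.log(p) + lm + rest_cost
                if total < best_cost:
                    best_cost, best_words = total, words + rest_words
        return best_cost, best_words

    cost, words = best(0, "<s>")
    return " ".join(words), cost


print(decode("la maison bleue"))  # best candidate: "the blue house"
```

Note that "blue house" wins even though the search never reorders phrases: the word swap comes from a single phrase-table entry, and the language model punishes the literal "house blue".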
An excellent site for explanations is http://statmt.org/
They might be doing more advanced stuff now, but at the start, at least, it was just the method described on this site, with enormous computing power.
Those algorithms are notoriously bad at moving words far within a sentence (because, to reduce computation, you would typically hard-code a limit on the number of positions a translated word can move). That is a big problem with German, for example, where you need to push the verb to the end, so they might do something extra to fix that.
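A small sketch of that hard reordering limit, assuming the usual phrase-based definition of distortion as the jump |start_i - end_{i-1} - 1| between consecutively translated source spans. The word positions below are hand-picked English-to-German alignments, not output from any system, and the limit of 6 is just an example value.

```python
def jumps(spans):
    """Distortion of each step: |start_i - end_{i-1} - 1|, with end_{-1} = -1.

    `spans` lists the (start, end) source positions in the order they are translated.
    """
    prev_end = -1
    result = []
    for start, end in spans:
        result.append(abs(start - prev_end - 1))
        prev_end = end
    return result


def allowed(spans, limit):
    """True if no single jump exceeds the hard distortion limit."""
    return all(jump <= limit for jump in jumps(spans))


def words(order):
    """Treat every source word as its own one-word span."""
    return [(i, i) for i in order]


# English "I have read the book" -> German "Ich habe das Buch gelesen":
# source positions 0=I 1=have 2=read 3=the 4=book, visited in German order.
short = words([0, 1, 3, 4, 2])

# "I have read the very long and boring book": the jump back to "read" is 7.
long_ = words([0, 1, 3, 4, 5, 6, 7, 8, 2])

print(jumps(short), allowed(short, 6))  # [0, 0, 1, 0, 3] True
print(jumps(long_), allowed(long_, 6))  # [0, 0, 1, 0, 0, 0, 0, 0, 7] False
```

The longer the object, the further the participle has to travel, so a decoder with this limit simply cannot produce the correct German order. Phrase-based systems typically also add a distortion penalty to the score, so even moves inside the limit cost something.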
As for English-to-English translation, we actually did that to correct our rule-based models:
You have a non-statistical engine that gives you an imperfect translation, and you train a statistical model to translate from "machine-translated" English into more natural, human English. It works well when you want to translate text from a narrow domain with little bilingual corpus available: the rule-based translation does the grunt work with domain-specific dictionaries, and you polish the language statistically.
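A deliberately crude sketch of that post-editing step, assuming hypothetical (rule-based output, human rewrite) sentence pairs from a narrow technical domain. A real setup would train a full statistical translation model on such pairs; this toy (`train_post_editor` and `post_edit` are made-up names) only counts word-for-word substitutions and skips any pair whose lengths differ, so it cannot learn reorderings or insertions.

```python
from collections import Counter, defaultdict

# Hypothetical (rule-based MT output, human rewrite) pairs from a narrow technical domain.
PAIRS = [
    ("the actual pressure is elevated", "the current pressure is high"),
    ("verify the actual temperature", "check the current temperature"),
    ("the temperature is elevated", "the temperature is high"),
    ("the level of oil is elevated", "the oil level is high"),  # lengths differ: skipped
]


def train_post_editor(pairs):
    """Map each MT word to its most frequent human replacement (word-for-word pairs only)."""
    counts = defaultdict(Counter)
    for mt, human in pairs:
        mt_words, human_words = mt.split(), human.split()
        if len(mt_words) != len(human_words):
            continue  # this toy cannot align pairs of different length
        for a, b in zip(mt_words, human_words):
            counts[a][b] += 1
    return {a: c.most_common(1)[0][0] for a, c in counts.items()}


def post_edit(model, sentence):
    """Rewrite machine-translated English word by word; unknown words pass through."""
    return " ".join(model.get(w, w) for w in sentence.split())


model = train_post_editor(PAIRS)
print(post_edit(model, "verify the actual pressure"))  # check the current pressure
```

Even this toy shows the division of labour: the rule-based engine guarantees the domain terms, and the learned layer only fixes recurring unnatural word choices such as "actual" for "current".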
The "universal grammar" is still a theorist's dream, as far as I understand. Some research tries to statistically construct a syntax tree in the source language and then transform it into the target language, but I don't think Google does that (not sure, though).
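To illustrate the tree-based idea, here is a minimal sketch of syntactic transfer, assuming an already-parsed source sentence and one hand-written reordering rule (`REORDER`, `LEXICON` and the tuple tree encoding are hypothetical). The rule moves the verb after its object, which is right for a German subordinate clause but would be wrong in a main clause; research systems of this kind try to learn such rules, and the conditions on them, statistically from parsed parallel data.

```python
# A tree is a leaf string or a tuple (label, child, child, ...).
SENTENCE = (
    "SBAR",
    ("C", "that"),
    ("S",
     ("NP", ("PRP", "I")),
     ("VP", ("V", "read"), ("NP", ("DT", "the"), ("N", "book")))),
)

# Hypothetical transfer rule: (parent label, child labels) -> new child order.
# VP -> V NP becomes VP -> NP V (verb-final, as in a German subordinate clause).
REORDER = {("VP", ("V", "NP")): (1, 0)}

# Hypothetical word-for-word English -> German lexicon.
LEXICON = {"that": "dass", "I": "ich", "the": "das", "book": "Buch", "read": "lese"}


def label(tree):
    """Node label of a subtree, or the word itself for a leaf."""
    return tree[0] if isinstance(tree, tuple) else tree


def transfer(tree):
    """Apply REORDER bottom-up to every node of the tree."""
    if isinstance(tree, str):
        return tree
    head, *children = tree
    children = [transfer(child) for child in children]
    order = REORDER.get((head, tuple(label(c) for c in children)))
    if order is not None:
        children = [children[i] for i in order]
    return (head, *children)


def leaves(tree):
    """Words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def translate(tree):
    return " ".join(LEXICON.get(w, w) for w in leaves(transfer(tree)))


print(" ".join(leaves(transfer(SENTENCE))))  # that I the book read
print(translate(SENTENCE))                   # dass ich das Buch lese
```

Because the rule operates on the VP node rather than on word positions, the verb moves past an object of any length, which is exactly what the distance-limited phrase-based search struggles with.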
In theory, there is no difference between theory and practice. In practice, there is.