natural language processing blog: finite state methods
(Can you tell, by the recent frequency of posts, that I'm trying not to work on getting ready for classes next week?)

[This post is based partially on some conversations with Kevin Duh, though not in the finite state models formalism.]

The finite state machine approach to NLP is very appealing (I mean both string and tree automata) because you get to build little things in isolation and then chain them together in cool ways. Kevin Knight has a great slide about how to put these things together that I can't seem to find right now, but trust me that it's awesome, especially when he explains it to you :).

The other cool thing about them is that, because you get to build them in isolation, you can use different data sets, which means data sets with different assumptions about the existence of "labels", to build each part. For instance, to do speech-to-speech transliteration from English to Japanese, you might build a component system like:

English speech --A--> English phonemes --B--> Japanese phonemes --C--> Japanese speech --D--> Japanese speech LM

You'll need a language model (D) for Japanese speech, which can be trained just on acoustic Japanese signals; then parallel Japanese speech/phonemes (for C), parallel English speech/phonemes (for A), and parallel English phonemes/Japanese phonemes (for B). [Plus, of course, if you're missing any of these, EM comes to your rescue!]

Let's take a simpler example, though the point I want to make applies to long chains, too. Suppose I want to just do translation from French to English. I build an English language model (off of monolingual English text) and then an English-to-French transducer. Now compare English sentences generated in two ways:
- Drawing an English sentence from the language model p(e).
- Picking a French sentence at random from GigaFrench, and drawing an English sentence from p(e|f), where p(e|f) is the composition of the English LM and the transducer.
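To make the chaining idea concrete, here is a minimal sketch in plain Python (not a real FST toolkit; the symbols and probabilities are entirely made up) of composing two probabilistic symbol mappings, the way adjacent components in the cascade above get glued together:

```python
def compose(a, b):
    """Compose two probabilistic mappings represented as nested dicts:
    (a o b)(z | x) = sum over y of a(y | x) * b(z | y)."""
    out = {}
    for x, ys in a.items():
        out[x] = {}
        for y, p_xy in ys.items():
            for z, p_yz in b.get(y, {}).items():
                out[x][z] = out[x].get(z, 0.0) + p_xy * p_yz
    return out

# Toy stand-in for component A: English letters -> English phonemes (invented).
eng2phon = {"l": {"L": 1.0}, "a": {"AE": 0.7, "AA": 0.3}}
# Toy stand-in for component B: English phonemes -> Japanese phonemes (invented).
phon2jap = {"L": {"r": 1.0}, "AE": {"a": 1.0}, "AA": {"a": 0.6, "o": 0.4}}

ab = compose(eng2phon, phon2jap)
# "a" maps to Japanese "a" with prob 0.7*1.0 + 0.3*0.6 = 0.88,
# and to "o" with prob 0.3*0.4 = 0.12.
```

Real toolkits compose full weighted transducers over strings, not single symbols, but the sum-over-intermediate-outputs structure is the same.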
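The difference between these two sampling procedures can be sketched with toy numbers (all sentences and probabilities below are invented for illustration). The composed model p(e|f) is just the LM p(e) rescored by the transducer probability p(f|e) and renormalized:

```python
# Toy English LM p(e) and toy transducer/channel p(f|e); all values invented.
p_e = {"the cat": 0.5, "the dog": 0.3, "a cat": 0.2}
p_f_given_e = {  # probability of one fixed French sentence f under each e
    "the cat": 0.8,
    "the dog": 0.0,
    "a cat": 0.4,
}

# Composition: unnormalized posterior p(e | f) is proportional to p(e) * p(f | e).
scores = {e: p_e[e] * p_f_given_e[e] for e in p_e}
z = sum(scores.values())
p_e_given_f = {e: s / z for e, s in scores.items()}

# Sampling from p(e) alone prefers "the cat" only mildly (0.5);
# conditioning on f rules out "the dog" entirely and sharpens
# the preference for "the cat" (0.4 / 0.48, about 0.83).
```

So a sample from p(e) is just "English-looking" text, while a sample from p(e|f) is English-looking text that the transducer can also explain as a rendering of the observed French sentence.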