Georgia Tech at EMNLP 2013

EMNLP is one of my favorite conferences, so I’m very pleased that Georgia Tech’s Computational Linguistics Lab will have two papers to present.

Yi Yang and I have written a paper that formalizes unsupervised text normalization in a log-linear model, which allows arbitrary (local) features to be combined with a target language model. The model is trained in a maximum-likelihood framework, marginalizing over possible normalizations using a novel sequential Monte Carlo training scheme. Longtime readers may find some irony in my writing a paper about social media normalization, but if we want to understand systematic orthographic variation, such as (TD)-deletion, then an accurate normalization system is a very useful tool to have in the shed. http://www.cc.gatech.edu/~jeisenst/papers/yang-emnlp-2013.pdf
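
To give a flavor of the training scheme, here is a minimal sequential Monte Carlo sketch in Python. Everything in it (the candidate list, the feature function, the toy bigram language model) is an illustrative stand-in rather than the paper's implementation: each particle proposes a normalization of the current token from the local log-linear factor and is then reweighted by the target language model, so that averaging the particle weights estimates the marginal likelihood of the observed sentence.

```python
import math
import random
from collections import defaultdict

# Toy candidate generator, feature function, and bigram LM. These are all
# illustrative stand-ins: in practice, candidates would come from string-edit
# rules and the LM would be trained on clean text.
CANDS = {"u": ["u", "you"], "2day": ["2day", "today"]}

def candidates(word):
    return CANDS.get(word, [word])

def features(word, cand):
    # local features pairing the observed word with a candidate normalization
    return [("identical", word == cand), ("pair", word, cand)]

def bigram_lm(prev, word):
    probs = {("<s>", "you"): 0.4, ("you", "today"): 0.3}
    return probs.get((prev, word), 0.01)

def smc_log_likelihood(words, theta, n_particles=200):
    """Estimate log p(words), marginalizing over normalizations by
    sequential Monte Carlo: candidates are proposed from the local
    log-linear factor and reweighted by the target language model."""
    particles = ["<s>"] * n_particles  # each particle: previous normalized word
    log_z = 0.0
    for word in words:
        cands = candidates(word)
        logits = [sum(theta[f] for f in features(word, c)) for c in cands]
        lmax = max(logits)
        probs = [math.exp(l - lmax) for l in logits]
        log_local_z = lmax + math.log(sum(probs))  # local partition function
        proposals, log_weights = [], []
        for prev in particles:
            cand = random.choices(cands, weights=probs)[0]
            # importance weight = local partition function * LM probability
            log_weights.append(log_local_z + math.log(bigram_lm(prev, cand)))
            proposals.append(cand)
        # fold the mean particle weight into the estimate, then resample
        wmax = max(log_weights)
        ws = [math.exp(w - wmax) for w in log_weights]
        log_z += wmax + math.log(sum(ws) / n_particles)
        particles = random.choices(proposals, weights=ws, k=n_particles)
    return log_z

theta = defaultdict(float, {("identical", True): 1.0})
print(smc_log_likelihood(["u", "2day"], theta))
```

In the paper, maximum-likelihood training then amounts to adjusting the feature weights to increase this marginal likelihood; the sketch stops at estimating the likelihood itself.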

Yangfeng Ji and I have obtained very strong results on paraphrase detection, beating the prior state of the art on the well-studied MSR Paraphrase Corpus by 3% in absolute accuracy. We build a distributional representation of sentence semantics, which we combine with traditional fine-grained features. Yangfeng’s key insight in this paper is to use supervised information in computing the distributional representation itself, reweighting words according to their discriminability. http://www.cc.gatech.edu/~jeisenst/papers/ji-emnlp-2013.pdf
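
For intuition about that reweighting idea, here is a toy Python sketch. The weighting formula and the data are illustrative, not the paper's exact method: words whose presence on only one side of a sentence pair separates paraphrases from non-paraphrases receive high weight, via a KL-divergence-style score computed from the labeled pairs.

```python
import math
from collections import Counter, defaultdict

# Toy labeled pairs: (sentence_a, sentence_b, is_paraphrase). Illustrative
# data only; a real run would use the MSR Paraphrase Corpus.
pairs = [
    ("the cat sat", "a cat was sitting", 1),
    ("the cat sat", "stocks fell sharply", 0),
]

def discriminative_weights(pairs, smoothing=0.05):
    """Weight each word by how well its appearing on only one side of a
    pair predicts the paraphrase label -- a KL-divergence-style score in
    the spirit of the paper's supervised reweighting."""
    # counts[word] = [neg, pos]: times the word appears on exactly one side
    counts = defaultdict(lambda: [smoothing, smoothing])
    totals = [2 * smoothing, 2 * smoothing]
    for a, b, y in pairs:
        side_a, side_b = set(a.split()), set(b.split())
        totals[y] += 1
        for w in side_a ^ side_b:  # words on one side but not the other
            counts[w][y] += 1
    weights = {}
    for w, (n_neg, n_pos) in counts.items():
        p = n_pos / totals[1]  # P(one-sided | paraphrase)
        q = n_neg / totals[0]  # P(one-sided | non-paraphrase)
        # KL divergence between the two Bernoulli distributions
        weights[w] = abs(p * math.log(p / q)
                         + (1 - p) * math.log((1 - p) / (1 - q)))
    return weights

def weighted_bow(sentence, weights):
    # bag-of-words vector with discriminative weights folded in
    vec = Counter(sentence.split())
    return {w: c * weights.get(w, 1.0) for w, c in vec.items()}

w = discriminative_weights(pairs)
print(weighted_bow("the cat sat", w))
```

In the paper itself, this kind of supervised weighting is applied before building the latent distributional representation, which is then combined with the fine-grained features in the final classifier.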
