Georgia Tech at EMNLP 2013

EMNLP is one of my favorite conferences, so I’m very pleased that Georgia Tech’s Computational Linguistics Lab will have two papers to present.

Yi Yang and I have written a paper that formalizes unsupervised text normalization in a log-linear model, which allows arbitrary (local) features to be combined with a target language model. The model is trained in a maximum-likelihood framework, marginalizing over possible normalizations using a novel sequential Monte Carlo training scheme. Longtime readers may find some irony in me writing a paper about social media normalization, but if we want to understand systematic orthographic variation — such as (TD)-deletion — then an accurate normalization system is a very useful tool to have in the shed.

Yangfeng Ji and I have obtained very strong results on paraphrase detection, beating the prior state-of-the-art on the well-studied MSR Paraphrase Corpus by 3% raw accuracy. We build a distributional representation for sentence semantics, which we combine with traditional fine-grained features. Yangfeng’s key insight in this paper is to also use supervised information to compute the distributional representation itself, by reweighting the words according to their discriminability.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: